质数入门:质数判定、质数生成、质因数分解

因为了解到有限域,想进一步了解质数,简单做一个笔记,方便日后深入了解。

1. 概述

质数(prime,也叫素数)定义很简单:在大于1的自然数中,除了1和自身外,无法被其他自然数整除的数。但研究质数并不容易。质数的重要性在于[2]

  • 是所有数的基础(Primes are building blocks of all numbers.)
  • 有特殊的性质,比如判断一个数是否质数很困难,可用于密码学

值得注意的是,1不是质数,这样的规定是为了跟算术基本原理(fundamental theorem of arithmetic)自洽。算术基本原理的内容是任何大于1的自然数均可写成质数的乘积(存在性),并且唯一(不考虑顺序的话)。如果1是质数的话,那么就会破坏算术基本原理的唯一性(想想3x13x1x1)。证明见维基百科的算术基本定理

有无穷多个质数。有多种证明方法,包括欧几里得在《几何原本》给出的反证法,欧拉利用黎曼函数证明,哈里·弗斯滕伯格使用拓扑学证明。

2. 素数判定

素数判定(primality tests),判断一个整数是否为质数。最直观的方法,根据定义,逐一判断从2到\(n-1\)是否是\(n\)的因数divisor(只要有,\(n\)就是合数),但这种方法效率最低,需要\(\mathcal{O}(n)\)。

事实上,只需要检查\([2, \left \lfloor{\sqrt{n}}\right \rfloor]\)就可以了。这是因为,如果\(n\)有一个大于\(\sqrt{n}\)的因数\(q\),那么有\(n = p \cdot q\)(这些数的大小关系:\(1, p, \sqrt{n}, q, n\)),这说明\(p\)也是\(n\)的一个因数,那么检查到\(p\)的时候,就可以确认\(n\)不是质数了。这种方法叫做试除法(Trial division)。

还可以进一步提升效率。比如,整数的末位是0(分解等式一定存在2x5);大于2的数,只需要检查奇数就可以了,因为偶数必有一个因子是2;再者,把质数事先存起来,方便查找,以下是质数的前100个。

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 
73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 
157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 
239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 
331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 
421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503,
509, 521, 523, 541]

除了试除法,还有很多更复杂的判定方法,如下:

3. 质数生成

Prime-generating

In computational number theory, a variety of algorithms make it possible to generate prime numbers efficiently. These are used in various applications, for example hashingpublic-key cryptography, and search of prime factors in large number.

筛选法(prime sieve)是一类快速找到质数的算法。最简单的是埃拉托斯特尼筛法(sieve of Eratosthenes,公元前250s就提出了),其思路很简单:从最小的质数2开始,把所有2的倍数标记为合数,接着下一个未被标记的数3,再将3的倍数全部标记为合数,如此遍历下去,就可以得到一系列质数。下图很直观说明了筛选法的原理:

Sieve_of_Eratosthenes_animation
Fig. 1: Algorithm steps for the sieve of Eratosthenes (image from Wikipedia).

筛选法有多种变种,如下:

4. 质数的分布

目前,并不知道质数的确切分布,不过可以估计。数学家找到一些函数来估计素数个数(称为素数计数函数\(\pi{n}\)),其中最简单的是:

$$
\pi(n) \sim \frac{n}{\ln(n)}
$$

举例如下:

>>> import math
>>> 1000 / math.log(1000)                   # pi(n),1000以内的质数约有144个
144.76482730108395

>>> import sympy
>>> len(list(sympy.primerange(1, 1000)))    # 实际上,1000以内有168个质数
168

这些描述质数大致分布情况的函数被称为素数定理(Prime number theorem, PNT)。

The prime number theorem (PNT) describes the asymptotic distribution of the prime numbers among the positive integers. It formalizes the intuitive idea that primes become less common as they become larger by precisely quantifying the rate at which this occurs.

素数定理可以给出第n个素数\(p_n\)的渐近估计:

$$
p_n \sim n \ln n
$$

举例如下:

>>> import math
>>> 1000 * math.log(1000)   # 第1000个素数大概在6907附近
6907.755278982137

>>> import sympy
>>> sympy.prime(1000)       # 第1000个素数是7919
7919

5. 质因数分解

质因数分解(prime factorization,也叫整数分解Integer factorization)将一个整数分解成一系列质数的乘积(算术基本原理保证了分解的唯一性)。最直观的方法是将整数反复除以质数(从2开始,3, 5, 7, 11, …),直到商为质数,这种方法叫短除法(short division)。

除此之外,还有很多整数分解Integer factorization的方法,如下:

SymPy提供两个接口返回整数分解的结果,

使用举例如下:

>>> import sympy

>>> sympy.primefactors(1000)    # Return a sorted list of n’s prime factors, ignoring multiplicity 
[2, 5]

>>> sympy.factorint(1000)       # Returns a dict containing the prime factors of n as keys and their respective multiplicities as values.
{2: 3, 5: 3}

>>> pow(2, 3) * pow(5, 3)
1000

6. 程序库

6.1 SymPy

SymPy是一个符号计算的Python库,其目标是成为一个功能全面的计算机代数系统。摘抄介绍如下[4]

SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python.

SymPy includes features ranging from basic symbolic arithmetic to calculus, algebra, discrete mathematics and quantum physics. It is capable of formatting the result of the computations as LaTeX code.

SymPy提供一些质数操作的接口,如下。

isprime(n)              # Test if n is a prime number (True) or not (False).

primerange(a, b)        # Generate a list of all prime numbers in the range [a, b).
randprime(a, b)         # Return a random prime number in the range [a, b).
primepi(n)              # Return the number of prime numbers less than or equal to n.

prime(nth)              # Return the nth prime, with the primes indexed as prime(1) = 2. The nth prime is approximately n*log(n) and can never be larger than 2**n.
prevprime(n, ith=1)     # Return the largest prime smaller than n
nextprime(n)            # Return the ith prime greater than n

sieve.primerange(a, b)  # Generate all prime numbers in the range [a, b), implemented as a dynamically growing sieve of Eratosthenes. 

使用举例如下:

>>> import sympy
>>> 
>>> sympy.isprime(5)
True
>>> list(sympy.primerange(0, 100))
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
>>> sympy.randprime(0, 100)
41
>>> sympy.prime(3)
5
>>> sympy.prevprime(50)
47
>>> sympy.nextprime(50)
53
>>> list(sympy.sieve.primerange(0, 100))
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

详情见SymPy documentation: number theory functions

6.2 SageMath

SageMath is builds on top of many existing open-source packages: NumPySciPymatplotlibSymPyMaximaGAPFLINTR and many more. Access their combined power through a common, Python-based language or directly via interfaces or wrappers.

SageMath文档在Sage Documentation。关于SymPy与SageMath的区别见:StackOverflow: What is the difference between sympy and sage?

References:
[1] Wikipedia: Prime number
[2] BetterExplained: Another Look at Prime Numbers
[3] Why is the number one not prime?
[4] Wikipedia: SymPy
[5] Blogger: 求质数算法的 N 种境界[1] – 试除法和初级筛法
[6] OEIS, On-Line Encyclopedia of Integer Sequences

发表评论

电子邮件地址不会被公开。 必填项已用*标注