# 1. PageRank

$$PR(p_i) = \frac{1-d}{N} + d \sum_{p_j \in B(p_i)} \frac{PR(p_j)}{L(p_j)}$$

Fig. 1: PageRanks for a simple network (image from Wikipedia).

Fig. 2: Draw Fig. 1 in NetworkX.

# 2. 代数方法

$$\begin{bmatrix} PR(p_1) \\ PR(p_2) \\ \vdots \\ PR(p_N) \end{bmatrix} = \begin{bmatrix} {(1-d)/ N} \\ {(1-d) / N} \\ \vdots \\ {(1-d) / N} \end{bmatrix} + d \begin{bmatrix} \ell(p_1,p_1) & \ell(p_1,p_2) & \cdots & \ell(p_1,p_N) \\ \ell(p_2,p_1) & \ddots & & \vdots \\ \vdots & & \ell(p_i,p_j) & \\ \ell(p_N,p_1) & \cdots & & \ell(p_N,p_N) \end{bmatrix} \cdot \begin{bmatrix} PR(p_1) \\ PR(p_2) \\ \vdots \\ PR(p_N) \end{bmatrix}$$

$$\mathbf{R} = \frac{1-d}{N} \mathbf{1} + d \mathcal{M}\mathbf{R}$$

N = len(G.nodes())      # N = 11
column_vector = np.ones((N, 1), dtype=np.int)

[[1]
[1]
[1]
[1]
[1]
[1]
[1]
[1]
[1]
[1]
[1]]


$$\mathcal{M}_{ij} = \ell(p_i,p_j) = \begin{cases} 1 /L(p_j) , & \mbox{if }j\mbox{ links to }i. L(p_j)\mbox{是指从}p_j\mbox{链出去的网页数目} \\ 0, & \mbox{otherwise} \end{cases}$$

$$\begin{split} M_{2j} & = \ell(p_2,p_1) \cdot PR(p_2) + \ell(p_2,p_2) \cdot PR(p_2) + \cdots + \ell(p_2,p_N) \cdot PR(p_2) \\ &= 0 \mbox{ (‘A’)} + 0 \mbox{ (‘B’)} + 1 \mbox{ (‘C’)} + \frac{1}{2} \mbox{ (‘D’)} + \frac{1}{3} \mbox{ (‘E’)} + \frac{1}{2} \mbox{ (‘F’)} \\ & + \frac{1}{2} \mbox{ (‘G’)} + \frac{1}{2} \mbox{ (H’)} + \frac{1}{2} \mbox{ (‘I’)} + 0 \mbox{ (‘J’)} + 0 \mbox{ (‘K’)} \end{split}$$

## 2.2 矩阵$\mathcal{M}$

$$\mathcal{M} = (K^{-1} A)^T$$

# Get adjacency matrix
nodelist = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K']  # sorted(G.nodes())
A = nx.to_numpy_matrix(G, nodelist)

'A' 'B' 'C' 'D' 'E' 'F' 'G' 'H' 'I' 'J' 'K'
[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
[ 1.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  1.  0.  1.  0.  1.  0.  0.  0.  0.  0.]
[ 0.  1.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
[ 0.  1.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
[ 0.  1.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
[ 0.  1.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
[ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
[ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]]


$\mathcal{A}$是对角矩阵，对角线上的元素是对应节点的出度。

nodelist = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K']  # sorted(G.nodes())
list_outdegree = map(operator.itemgetter(1), sorted(G.out_degree().items()))
K = np.diag(list_outdegree)

[[0 0 0 0 0 0 0 0 0 0 0]
[0 1 0 0 0 0 0 0 0 0 0]
[0 0 1 0 0 0 0 0 0 0 0]
[0 0 0 2 0 0 0 0 0 0 0]
[0 0 0 0 3 0 0 0 0 0 0]
[0 0 0 0 0 2 0 0 0 0 0]
[0 0 0 0 0 0 2 0 0 0 0]
[0 0 0 0 0 0 0 2 0 0 0]
[0 0 0 0 0 0 0 0 2 0 0]
[0 0 0 0 0 0 0 0 0 1 0]
[0 0 0 0 0 0 0 0 0 0 1]]


$\mathbf{K}$的逆矩阵$\mathbf{K^{-1}}$为，

K_inv = np.linalg.pinv(K)

[[ 0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.  ]
[ 0.    1.    0.    0.    0.    0.    0.    0.    0.    0.    0.  ]
[ 0.    0.    1.    0.    0.    0.    0.    0.    0.    0.    0.  ]
[ 0.    0.    0.    0.5   0.    0.    0.    0.    0.    0.    0.  ]
[ 0.    0.    0.    0.    0.33  0.    0.    0.    0.    0.    0.  ]
[ 0.    0.    0.    0.    0.    0.5   0.    0.    0.    0.    0.  ]
[ 0.    0.    0.    0.    0.    0.    0.5   0.    0.    0.    0.  ]
[ 0.    0.    0.    0.    0.    0.    0.    0.5   0.    0.    0.  ]
[ 0.    0.    0.    0.    0.    0.    0.    0.    0.5   0.    0.  ]
[ 0.    0.    0.    0.    0.    0.    0.    0.    0.    1.    0.  ]
[ 0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    1.  ]]


M = (K_inv * A).transpose()

[[ 0.    0.    0.    0.5   0.    0.    0.    0.    0.    0.    0.  ]
[ 0.    0.    1.    0.5   0.33  0.5   0.5   0.5   0.5   0.    0.  ]
[ 0.    1.    0.    0.    0.    0.    0.    0.    0.    0.    0.  ]
[ 0.    0.    0.    0.    0.33  0.    0.    0.    0.    0.    0.  ]
[ 0.    0.    0.    0.    0.    0.5   0.5   0.5   0.5   1.    1.  ]
[ 0.    0.    0.    0.    0.33  0.    0.    0.    0.    0.    0.  ]
[ 0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.  ]
[ 0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.  ]
[ 0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.  ]
[ 0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.  ]
[ 0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.  ]]


## 2.3 求解

$\mathbf{R}$是2.1等式的特征向量（eigenvector），求解等式得：

$$\mathbf{R} = (\mathbf{I}-d \mathcal{M})^{-1} \frac{1-d}{N} \mathbf{1}$$

d = 0.85
I = np.identity(N)
R = np.linalg.pinv(I - d*M) * (1-d)/N * column_vector

[[ 0.028]
[ 0.324]
[ 0.289]
[ 0.033]
[ 0.068]
[ 0.033]
[ 0.014]
[ 0.014]
[ 0.014]
[ 0.014]
[ 0.014]]


R = R/sum(R)    # normalized R, so that page ranks sum to 1.

[[ 0.033]
[ 0.384]
[ 0.343]
[ 0.039]
[ 0.081]
[ 0.039]
[ 0.016]
[ 0.016]
[ 0.016]
[ 0.016]
[ 0.016]]


Fig. 3: PageRanks for a simple network

# 3. Python源代码

NetworkX实现了PageRank的代数计算方法nx.pagerank_numpy，源代码在这里

def pagerank_numpy(G, alpha=0.85, personalization=None, weight='weight', dangling=None):
"""Return the PageRank of the nodes in the graph.
"""

if len(G) == 0:
return {}

weight=weight, dangling=dangling)

# use numpy LAPACK solver
eigenvalues, eigenvectors = np.linalg.eig(M.T)
ind = eigenvalues.argsort()

# eigenvector of largest eigenvalue at ind[-1], normalized
largest = np.array(eigenvectors[:, ind[-1]]).flatten().real
norm = float(largest.sum())

return dict(zip(G, map(float, largest / norm)))
`

References:
[1] StackOverflow: Incorrect PageRank calculation result

# 专题: 网页排序算法PageRank (3/3)

### One thought on “代数方法求PageRank”

• 2019年01月24日 星期四 at 01:03下午