Randomized Algorithms
Dimension reduction with random sketching
Given a high-dimensional dataset, we want to find an embedding into a lower-dimensional space that preserves some of the geometry, with high probability.
More formally, an embedding for a set \(S\subseteq \mathbb R^m\) with distortion \(\epsilon\) is an \(l\times m\) matrix \(\Omega\) s.t.
\[
(1-\epsilon)\,\|x\|_2 \;\le\; \|\Omega x\|_2 \;\le\; (1+\epsilon)\,\|x\|_2 \qquad \text{for all } x\in S.
\]
A subspace embedding is an embedding for a set \(S\), where \(S\) is an \(n\)-dimensional linear subspace.
Oblivious subspace embedding
Intuitively, an oblivious subspace embedding is a random matrix that, for any fixed subspace, is a subspace embedding most of the time, without being constructed for that subspace. Formally, a random matrix \(\Omega \in \mathbb R^{l\times m}\) is an oblivious subspace embedding with parameters \(\text{OSE}(n, \epsilon, \delta)\) if, for any fixed \(n\)-dimensional subspace \(S\subseteq\mathbb R^m\), with probability \(\geq 1 - \delta\), \(\Omega\) is a subspace embedding for \(S\) with distortion \(\epsilon\).
A dense Gaussian sketching matrix \(\Omega \in \mathbb R^{l\times m}\) has the form
\[
\Omega = l^{-1/2}\,G, \qquad g_{ij} \sim N(0, 1) \text{ i.i.d.}
\]
Such an \(\Omega\) is \(\text{OSE}(n, \epsilon, \delta)\) provided \(l = O\!\left(\epsilon^{-2}\left(n + \ln\tfrac{1}{\delta}\right)\right)\). This means that we can sample such a matrix \(\Omega\), compute \(l\) from the OSE parameters, and perform the embedding.
The embedding is just a matrix-matrix multiplication of \(\Omega\) with the data, so the total cost is that of a dense matrix-matrix product (about \(2lmn\) flops for an \(m\times n\) dataset), which parallelizes like standard dense linear algebra.
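As a minimal numpy sketch (the sizes m, n, l below are illustrative, not derived from the OSE bound), we can sample a Gaussian \(\Omega\) and check the distortion on a random \(n\)-dimensional subspace through the singular values of \(\Omega Q\):

import numpy as np

rng = np.random.default_rng(0)
m, n, l = 4096, 20, 400    # ambient dim, subspace dim, sketch dim (illustrative)

# orthonormal basis of a random n-dimensional subspace S of R^m
Q, _ = np.linalg.qr(rng.standard_normal((m, n)))

# dense Gaussian sketch: Omega = l^{-1/2} G with G_ij ~ N(0, 1)
Omega = rng.standard_normal((l, m)) / np.sqrt(l)

# Omega is a subspace embedding for S with distortion eps exactly when all
# singular values of Omega @ Q lie in [1 - eps, 1 + eps]
s = np.linalg.svd(Omega @ Q, compute_uv=False)
print("observed distortion:", max(s.max() - 1.0, 1.0 - s.min()))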
Fast Johnson-Lindenstrauss Transform
The goal is to find a sparse or structured \(\Omega\); one such choice is the subsampled randomized Hadamard transform (SRHT).
Given \(m=2^q\) and \(l < m\), the SRHT ensemble embedding \(\mathbb R^m\) into \(\mathbb R^l\) is defined as
\[
\Omega = \sqrt{\tfrac{m}{l}}\; P H D,
\]
where
- \(D\in \mathbb R^{m\times m}\) is a diagonal matrix whose diagonals are uniformly random signs \(\pm 1\).
- \(H\in\mathbb{R}^{m\times m}\) is the normalized Walsh-Hadamard transform.
- \(P\in \mathbb{R}^{l\times m}\) is a random sampling matrix that selects \(l\) rows of \(HD\) uniformly at random.
The normalized Walsh-Hadamard transform is defined as follows: for \(m=2^q\), define the (unnormalized) WH transform recursively as
\[
H_1 = \begin{pmatrix} 1 \end{pmatrix}, \qquad
H_m = \begin{pmatrix} H_{m/2} & H_{m/2} \\ H_{m/2} & -H_{m/2} \end{pmatrix},
\]
and the normalization is \(H = m^{-1/2}H_m\).
SRHT is \(\text{OSE}(n,\epsilon, \delta)\) with \(l = O\!\left(\epsilon^{-2}\left(n+\lg \tfrac{m}{\delta}\right) \ln \tfrac{n}{\delta}\right)\).
Note that because of the special structure of the matrix, the computation is much cheaper than a dense multiply: \(HD\) can be applied with a fast Walsh-Hadamard transform, so sketching an \(m\times n\) matrix costs \(O(mn\log m)\) flops instead of \(2lmn\), and this also parallelizes well.
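A minimal sketch of applying the SRHT (assuming the \(\Omega = \sqrt{m/l}\,PHD\) form above; fwht and srht are helper names introduced here, and a production implementation would use an optimized fast Walsh-Hadamard transform):

import numpy as np

def fwht(x):
    # unnormalized fast Walsh-Hadamard transform along axis 0; length must be a power of 2
    x = x.copy()
    h = 1
    while h < x.shape[0]:
        for i in range(0, x.shape[0], 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x

def srht(A, l, rng):
    # apply Omega = sqrt(m/l) * P @ H @ D to A, where H is the normalized WH transform
    m = A.shape[0]
    d = rng.choice([-1.0, 1.0], size=m)           # D: uniformly random signs
    HDA = fwht(d[:, None] * A) / np.sqrt(m)       # normalized H applied to D @ A
    rows = rng.choice(m, size=l, replace=False)   # P: sample l of the m rows
    return np.sqrt(m / l) * HDA[rows]

rng = np.random.default_rng(0)
OA = srht(rng.standard_normal((1024, 8)), l=128, rng=rng)   # sketch 1024 x 8 down to 128 x 8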
Problems using random sketching
Least squares problems
Given \(A\in\mathbb R^{m\times n}\) and \(b\in\mathbb R^m\) with \(n \ll m\), we aim to solve
\[
\min_{x\in\mathbb R^n} \|Ax - b\|_2.
\]
One (deterministic) approach to this problem is a QR factorization of \(A\); alternatively, we can solve by random sketching with \(\Omega \in \mathbb R^{l\times m}\), i.e. solve the smaller problem
\[
\min_{x\in\mathbb R^n} \|\Omega A x - \Omega b\|_2.
\]
It can be shown that, when \(\Omega\) is an OSE applied to the subspace spanned by the columns of \(A\) and \(b\), with probability at least \(1-\delta\) the sketched solution \(\tilde x\) satisfies
\[
\|A\tilde x - b\|_2 \;\le\; \frac{1+\epsilon}{1-\epsilon}\,\min_x \|Ax - b\|_2.
\]
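As a minimal numpy sketch of sketch-and-solve (a Gaussian \(\Omega\) and the sizes below are illustrative assumptions; an SRHT could be used instead):

import numpy as np

rng = np.random.default_rng(0)
m, n, l = 20000, 50, 1000                       # n << l << m (illustrative)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# deterministic reference solution via np.linalg.lstsq
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

# sketch-and-solve: solve the small l x n problem min_x ||Omega A x - Omega b||_2
Omega = rng.standard_normal((l, m)) / np.sqrt(l)
x_sketch, *_ = np.linalg.lstsq(Omega @ A, Omega @ b, rcond=None)

print(np.linalg.norm(A @ x_exact - b), np.linalg.norm(A @ x_sketch - b))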
Low rank matrix approximation
For a matrix \(A\in\mathbb R^{m\times n}\), decompose \(A\) into a low-rank approximation \(Z\cdot W^T\) where \(Z\in \mathbb R^{m\times k}, W^T \in \mathbb R^{k\times n}\). Low-rank approximations are used in many problems, since the number of flops for a matrix-vector product is significantly reduced (\(2mn\) vs. \(2(m+n)k\)), as illustrated below.
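For example (illustrative sizes), a matrix-vector product with the factored form only needs two skinny products:

import numpy as np

rng = np.random.default_rng(0)
m, n, k = 5000, 4000, 20
Z = rng.standard_normal((m, k))
W = rng.standard_normal((n, k))
x = rng.standard_normal(n)

A = Z @ W.T                  # dense m x n matrix
y_dense = A @ x              # ~2mn = 40,000,000 flops
y_factored = Z @ (W.T @ x)   # ~2(m+n)k = 360,000 flops
print(np.allclose(y_dense, y_factored))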
Singular value decomposition
One of the most common ways to approximate a matrix is the SVD \(A = U\Sigma V^T\); a rank-\(k\) approximation of \(A\) keeps the first \(k\) singular vectors and values, \(A_k = U_k \Sigma_k V_k^T\), where \(U_k\in\mathbb R^{m\times k}\) and \(V_k \in\mathbb R^{n\times k}\) are the first \(k\) columns of \(U\) and \(V\) respectively, and \(\Sigma_k\in\mathbb R^{k\times k}\) holds the first \(k\) singular values.
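A minimal numpy sketch (the matrix and k are illustrative) that also checks the Eckart-Young identity \(\|A - A_k\|_2 = \sigma_{k+1}\):

import numpy as np

rng = np.random.default_rng(0)
m, n, k = 200, 100, 10
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # A_k = U_k Sigma_k V_k^T

# spectral-norm error equals the (k+1)-th singular value
print(np.linalg.norm(A - A_k, 2), s[k])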
Rank-revealing QR factorization
Given \(A \in\mathbb R^{m\times n}\), consider the pivoted QR decomposition
\[
A\,\Pi_c = QR = Q\begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix},
\]
where we choose \(k\) and a column permutation \(\Pi_c\) so that
- \(R_{11} \in\mathbb R^{k\times k}\) is well-conditioned
- \(\|R_{22}\|\) is small
Then we obtain a rank-\(k\) approximation \(A_{qr}\) of \(A\) as
\[
A_{qr} = Q\begin{pmatrix} R_{11} & R_{12} \\ 0 & 0 \end{pmatrix}\Pi_c^T,
\]
and we have the error bound \(\|A - A_{qr}\|_2 = \|R_{22}\|_2\), which is small by construction.
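A sketch of this rank-\(k\) approximation via SciPy's pivoted QR (scipy.linalg.qr with pivoting=True; choosing k from the decay of the diagonal of R is an illustrative heuristic):

import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 15)) @ rng.standard_normal((15, 100))   # numerical rank ~15

Q, R, piv = qr(A, pivoting=True)             # A[:, piv] = Q @ R
k = int(np.sum(np.abs(np.diag(R)) > 1e-10))  # heuristic choice of k

# rank-k approximation: keep [R11 R12] (first k rows of R), drop R22
A_qr = np.empty_like(A)
A_qr[:, piv] = Q[:, :k] @ R[:k, :]

print(k, np.linalg.norm(A - A_qr, 2), np.linalg.norm(R[k:, k:], 2))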
QR with column pivoting (QRCP)
Based on the desired construction, the algorithm (written here as a runnable numpy sketch with an explicit householder helper) is
import numpy as np

def householder(x):
    # Householder reflector H = I - 2 v v^T / (v^T v), with H @ x = -/+ norm(x) * e1
    v = np.array(x, dtype=float)
    v[0] += np.copysign(np.linalg.norm(x), x[0])
    return np.eye(len(x)) - 2.0 * np.outer(v, v) / (v @ v)

def QRCP(A, eps=1e-12):
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.eye(m)
    piv = np.arange(n)
    colnorm = (A ** 2).sum(axis=0)            # squared column norms, size n
    for j in range(n):
        p = j + np.argmax(colnorm[j:])        # remaining column with largest norm
        if colnorm[p] > eps:
            # pivoting: swap columns of A, the column norms, and the permutation record
            A[:, [j, p]] = A[:, [p, j]]
            colnorm[[j, p]] = colnorm[[p, j]]
            piv[[j, p]] = piv[[p, j]]
            # reduction: Householder matrix Hj zeroing A[j+1:, j]
            Hj = householder(A[j:, j])        # Hj @ A[j:, j] = +/- norm(A[j:, j]) * e1
            # update trailing matrix, accumulated Q (Hj acts on rows/cols j:), and norms
            A[j:, j:] = Hj @ A[j:, j:]
            Q[:, j:] = Q[:, j:] @ Hj
            colnorm[j+1:] -= A[j, j+1:] ** 2
        else:
            break
    return Q, A, piv                          # A now holds R; original A[:, piv] = Q @ R
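A quick usage check of the sketch above (piv is the column permutation it records):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
Q, R, piv = QRCP(A)

print(np.allclose(A[:, piv], Q @ R))              # A Pi_c = Q R
print(np.allclose(Q.T @ Q, np.eye(Q.shape[0])))   # Q is orthogonal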