
Randomized Algorithms

Dimension reduction with random sketching

Given a high-dimensional dataset, we want to find an embedding into a lower-dimensional space that preserves the geometry (up to a small distortion) with high probability.

More formally, an embedding for a set \(S\subseteq \mathbb R^m\) with distortion \(\epsilon\) is an \(l\times m\) matrix \(\Omega\) such that

\[\forall x \in S. (1 - \epsilon) \|x\|^2 \leq \|\Omega x\|^2 \leq (1 + \epsilon)\|x\|^2\]

A subspace embedding is an embedding for a set \(S\), where \(S\) is an \(n\)-dimensional linear subspace of \(\mathbb R^m\).

Oblivious subspace embedding

Intuitively, an oblivious subspace embedding is a random matrix that is a subspace embedding most of the time, and whose distribution does not depend on (is oblivious to) the subspace. Formally, a random matrix \(\Omega \in \mathbb R^{l\times m}\) is an oblivious subspace embedding with parameters \(\text{OSE}(n, \epsilon, \delta)\) if, for any fixed \(n\)-dimensional subspace \(S \subseteq \mathbb R^m\), with probability \(\geq 1 - \delta\), \(\Omega\) is a subspace embedding for \(S\) with distortion \(\epsilon\).

A dense Gaussian matrix \(\Omega \in \mathbb R^{l\times m}\) is a matrix whose entries are i.i.d. standard Gaussian random variables \(\omega_{ij} \sim N(0, 1)\), scaled by \(l^{-1/2}\), where

\[l = O(\epsilon^{-2}(n+ \log(\tfrac{1}{\delta})))\]

Then such an \(\Omega\) is \(\text{OSE}(n, \epsilon, \delta)\), which means we can compute \(l\) from the OSE parameters, sample such a matrix \(\Omega\), and apply the embedding.
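
As a minimal sketch (assuming NumPy; the constant in the choice of \(l\) is illustrative, not a sharp theoretical value), sampling a dense Gaussian embedding and spot-checking its distortion might look like:

import numpy as np

def gaussian_sketch(m, n, eps, delta, c=1.0):
    # Sketch size from l = O(eps^-2 (n + log(1/delta))); c is an illustrative constant.
    l = int(np.ceil(c * (n + np.log(1.0 / delta)) / eps ** 2))
    # i.i.d. N(0, 1) entries, scaled by l^{-1/2}.
    return np.random.randn(l, m) / np.sqrt(l)

m, n, eps, delta = 4096, 20, 0.5, 1e-2
A = np.random.randn(m, n)              # its column space is an n-dimensional subspace of R^m
Omega = gaussian_sketch(m, n, eps, delta)

# Spot-check the distortion on a few vectors from the subspace.
for _ in range(3):
    x = A @ np.random.randn(n)
    ratio = np.linalg.norm(Omega @ x) ** 2 / np.linalg.norm(x) ** 2
    print(ratio)                       # should lie in [1 - eps, 1 + eps] with high probability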

The embedding is just a matrix-matrix multiplication, so the total parallelized cost is

\[O(\gamma \,(2mnl / p) + \alpha \lg p + \beta \, l n \lg p)\]

Fast Johnson-Lindenstrauss Transform

The goal is to find a sparse or structured \(\Omega\); one such construction is the subsampled randomized Hadamard transform (SRHT).

Given \(m=2^q\) and \(l < m\), the SRHT ensemble, which embeds \(\mathbb R^m\) into \(\mathbb R^l\), is defined as

\[\Omega = \sqrt{\frac{m}{l}} P H D\]
  • \(D\in \mathbb R^{m\times m}\) is a diagonal matrix whose diagonal entries are uniformly random signs \(\pm 1\).
  • \(H\in\mathbb{R}^{m\times m}\) is the normalized Walsh-Hadamard transform.
  • \(P\in \mathbb{R}^{l\times m}\) is a random sampling matrix that selects \(l\) rows of \(HD\) uniformly at random.

The normalized Walsh-Hadamard transform is defined as follows: for \(m=2^q\), define the (unnormalized) Walsh-Hadamard transform recursively as

\[H_2 = \begin{bmatrix}1&1\\1&-1\end{bmatrix}, H_m = \begin{bmatrix}H_{m/2}&H_{m/2}\\H_{m/2}&-H_{m/2}\end{bmatrix}\]

and the normalization is \(H = m^{-1/2}H_m\).
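
A minimal sketch of applying the SRHT, assuming NumPy and \(m = 2^q\); the fast Walsh-Hadamard transform below is a textbook iterative version, and \(P\) is taken here to sample rows without replacement (formulations vary on this point):

import numpy as np

def fwht(x):
    # Unnormalized fast Walsh-Hadamard transform H_m x for m = 2^q, O(m lg m) flops.
    x = x.copy()
    m = x.shape[0]
    h = 1
    while h < m:
        for i in range(0, m, 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x

def srht_apply(x, rows, signs, l):
    # Omega x = sqrt(m / l) * P H D x, with H the normalized WHT.
    m = x.shape[0]
    y = fwht(signs * x) / np.sqrt(m)   # H D x
    return np.sqrt(m / l) * y[rows]    # P selects l rows, then rescale

m, l = 1024, 64
signs = np.random.choice([-1.0, 1.0], size=m)        # diagonal of D
rows = np.random.choice(m, size=l, replace=False)    # rows kept by P
x = np.random.randn(m)
# The two squared norms should agree up to the distortion, with high probability.
print(np.linalg.norm(srht_apply(x, rows, signs, l)) ** 2, np.linalg.norm(x) ** 2)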

The SRHT is \(\text{OSE}(n,\epsilon, \delta)\) with \(l = O\big(\epsilon^{-2}(n+\lg \tfrac{m}{\delta}) \lg \tfrac{n}{\delta}\big)\).

Note that because of the special structure of the matrix, the actual computation cost is much reduced; the parallelized cost is

\[O(\gamma \,(2mn\lg m / p) + \alpha \lg p + \beta \, mn \lg p)\]

Problems using random sketching

Least squares problems

Given \(A\in\mathbb R^{m\times n}\) and \(b\in\mathbb R^m\) with \(n \ll m\), we aim to solve

\[y:= \arg\min_{x\in\mathbb R^n} \|Ax - b\|_2\]

One (deterministic) approach to this problem is a QR factorization of \(A\); alternatively, we can solve a sketched problem with \(\Omega \in \mathbb R^{l\times m}\):

\[y^* := \arg\min_{x\in\mathbb R^n}\|\Omega(Ax - b)\|_2\]

It can be shown that, with probability at least \(1-\delta\), we have

\[\|Ay^* - b\|_2^2 \leq (1+O(\epsilon))\|Ay-b\|_2^2\]
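
A minimal sketch-and-solve example, assuming NumPy and a dense Gaussian sketch (an SRHT sketch would be used the same way); the sketch size \(l\) is illustrative:

import numpy as np

m, n, eps = 20000, 50, 0.5
A = np.random.randn(m, n)
b = np.random.randn(m)

l = int(np.ceil((n + 10) / eps ** 2))             # illustrative sketch size
Omega = np.random.randn(l, m) / np.sqrt(l)

y_star, *_ = np.linalg.lstsq(Omega @ A, Omega @ b, rcond=None)   # sketched solution
y, *_ = np.linalg.lstsq(A, b, rcond=None)                        # exact solution

res_sketch = np.linalg.norm(A @ y_star - b) ** 2
res_exact = np.linalg.norm(A @ y - b) ** 2
print(res_sketch / res_exact)   # expected to be <= 1 + O(eps) with high probability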

Low rank matrix approximation

For a matrix \(A\in\mathbb R^{m\times n}\), we approximate \(A\) by a lower-rank product \(Z\cdot W^T\), where \(Z\in \mathbb R^{m\times k}\) and \(W^T \in \mathbb R^{k\times n}\). Low rank approximations are commonly used in many problems, since for example a matrix-vector product then takes \(2(m+n)k\) flops instead of \(2mn\).

Singular value decomposition

One of the most common ways to approximate a matrix: given the SVD \(A = U\Sigma V^T\), a rank-\(k\) approximation of \(A\) keeps the first \(k\) singular triplets, \(A_k = U_k \Sigma_k V_k^T\), where \(U_k\in\mathbb R^{m\times k}\) and \(V_k \in\mathbb R^{n\times k}\) are the first \(k\) columns of \(U\) and \(V\) respectively, and \(\Sigma_k\in\mathbb R^{k\times k}\) contains the first \(k\) singular values.
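
For concreteness, a rank-\(k\) truncation via NumPy's SVD (by the Eckart-Young theorem, \(\|A - A_k\|_2 = \sigma_{k+1}\)):

import numpy as np

def svd_rank_k(A, k):
    # Truncated SVD: keep the top-k singular triplets, A_k = U_k Sigma_k V_k^T.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.random.randn(200, 100) @ np.random.randn(100, 100)
A_k = svd_rank_k(A, k=10)
# The 2-norm error equals the (k+1)-th singular value.
print(np.linalg.norm(A - A_k, 2), np.linalg.svd(A, compute_uv=False)[10])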

Rank Revealing QR factorization

Given \(A \in\mathbb R^{m\times n}\), consider the QR decomposition

\[A\Pi_c = QR = \begin{bmatrix}Q_1&Q_2\end{bmatrix} \begin{bmatrix}R_{11}&R_{12}\\&R_{22}\end{bmatrix}\]

where we choose \(k\) and a column permutation \(\Pi_c\) so that

  • \(R_{11} \in\mathbb R^{k\times k}\) is well-conditioned
  • \(\|R_{22}\|\) is small

Then we can form a rank-\(k\) approximation \(A_{qr}\) of \(A\) as

\[A_{qr} = Q_1 \begin{bmatrix}R_{11}&R_{12}\end{bmatrix} \Pi_c^T = Q_1Q_1^T A =: \mathcal P^\circ A\]

and we have the error bound \(\|A - A_{qr}\|_2 = \|R_{22}\|\), which is small by construction.
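
For concreteness, a sketch using SciPy's pivoted QR as the rank-revealing factorization (the rank \(k\) and the test matrix are just for illustration):

import numpy as np
from scipy.linalg import qr

m, n, k = 300, 200, 20
A = np.random.randn(m, k) @ np.random.randn(k, n) + 1e-6 * np.random.randn(m, n)

Q, R, piv = qr(A, mode='economic', pivoting=True)    # A[:, piv] = Q @ R
Q1 = Q[:, :k]
A_qr = Q1 @ (Q1.T @ A)                               # rank-k approximation A_qr = Q1 Q1^T A

R22 = R[k:, k:]
# The two quantities below agree: ||A - A_qr||_2 = ||R_22||_2.
print(np.linalg.norm(A - A_qr, 2), np.linalg.norm(R22, 2))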

QR with column pivoting (QRCP)

Based on the desired construction, the algorithm is

QR with column pivoting
import numpy as np

def QRCP(A, eps=1e-12):
    m, n = A.shape
    A = np.array(A, dtype=float)         # work on a copy; R is built in place
    Q = np.eye(m)
    perm = np.arange(n)                  # tracks the column permutation Pi_c
    colnorm = np.sum(A ** 2, axis=0)     # squared column norms, size n
    for j in range(n):
        p = j + np.argmax(colnorm[j:])   # remaining column with largest norm
        if colnorm[p] > eps:
            # pivoting: swap columns j and p of A, colnorm, and the permutation
            A[:, [j, p]] = A[:, [p, j]]
            colnorm[[j, p]] = colnorm[[p, j]]
            perm[[j, p]] = perm[[p, j]]
            # reduction: Householder reflector Hj with Hj @ A[j:, j] = -/+ norm(A[j:, j]) * e1
            v = A[j:, j].copy()
            v[0] += np.copysign(np.linalg.norm(v), v[0])
            v /= np.linalg.norm(v)
            Hj = np.eye(m - j) - 2.0 * np.outer(v, v)
            # update the trailing matrix, the column norms, and Q
            A[j:, j:] = Hj @ A[j:, j:]
            colnorm[j + 1:] -= A[j, j + 1:] ** 2
            Q[:, j:] = Q[:, j:] @ Hj
        else:
            break
    return Q, A, perm                    # A @ Pi_c = Q @ R, with R = A (upper triangular)
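
A quick sanity check of the routine above (assuming NumPy), verifying the factorization and the non-increasing magnitudes of the diagonal of \(R\):

import numpy as np

A = np.random.randn(50, 30)
Q, R, perm = QRCP(A)
print(np.linalg.norm(Q @ R - A[:, perm]))    # reconstruction error, ~ machine precision
d = np.abs(np.diag(R))
print(np.all(d[:-1] >= d[1:] - 1e-12))       # |R[j, j]| is (numerically) non-increasing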