Linear Filters

Images

Denote a image (matrix of integer values) as \(I\)
For gray-scale images, \(I_{m\times n}\); where \(I(i,j)\) is called intensity
For color images, \(I_{m\times n\times \{1,2,3\}}\), 3 for RGB values

Alternatively, think of a grayscale image as a mapping \(I:\mathbb N^2 \rightarrow \{0,1,...,255\}\), i.e. position \((i,j)\rightarrow\) gray-scale, where 0 is black, 255 is white

Example For image \(I(i,j)\); \(I(i,j)+50\) lighten the image, \(I(i,-j)\) rotate the image horizontally

Image filters

Modify the pixels in an image based on some function of a local neighborhood of each pixel

Can be used to enhance (denoise), detect patterns (matching), extract information (texture, edges)

Boundary Effects

Consider the boundary of the image, there are three modes: full, same, valid

Correlation

Given a image \(I\), the filtered image is

\[G(i,j) = \sum_{u=-k}^k \sum_{v=-k}^k F(u,v)\cdot I(i+u, j + v)\]

The entries of the weight kernel of mask \(F(u,v)\) are often called the filter coefficients
Denote this correlation operator \(F\otimes I\)

OpenCV cv2.filter2d

We can also write correlation in a more compact form using vectors, i.e.

Let \(\mathbf f = F\) be the kernel matrix, \(\mathbf t_{ij} = T_{i,j}(I_{\{i-k, i+k\} \times \{j-k, j+k\}})\) be the matrix of the image centered at \((i,j)\) and of size \(2k + 1\), then \(G(i,j)=\mathbf f\cdot \mathbf t_{ij}\)

And the full image

\[\mathbf G_{m\times n} = \begin{bmatrix} G(0,0) &G(0, 1)&\cdots &G(0, n) \\ G(1,0) &G(1, 1)&\cdots &G(1, n) \\ \vdots &\vdots&\ddots &\vdots \\ G(m,0) &G(m, 1)&\cdots &G(m, n) \\ \end{bmatrix}\]

Example of Filter kernels

Unchanged

\(\begin{bmatrix} 0 &0 &0 \\ 0 &1 &0 \\ 0 &0 &0 \end{bmatrix}\)

shifting to the left by 1px

\(\begin{bmatrix} 0 &0 &0 \\ 0 &0 &1 \\ 0 &0 &0 \end{bmatrix}\)

# shift down by 25px
k_shifting = np.zeros((50, 50))
k_shifting[0, 24] = 1

Sharpening

\[M1 = \begin{bmatrix} 0 &0 &0 \\ 0 &2 &0 \\ 0 &0 &0 \end{bmatrix} - \frac{1}{9} \begin{bmatrix} 1 &1 &1 \\ 1 &1 &1 \\ 1 &1 &1 \end{bmatrix}\]

To enhance the effect, using a larger center value

\[M2 = \begin{bmatrix} 0 &0 &0 \\ 0 &5 &0 \\ 0 &0 &0 \end{bmatrix} - \frac{4}{9} \begin{bmatrix} 1 &1 &1 \\ 1 &1 &1 \\ 1 &1 &1 \end{bmatrix}\]

If we enlarge the matrix to 5

\[M3 = \vec 0_{5\times 5} + [5]_{center} - \frac{4}{25} J_{5\times 5}\]

k_sharp_3_2 = - np.ones((3,3), np.float32) / 9
k_sharp_3_2[1,1] = 2
k_sharp_3_2 = k_sharp_3_2 

k_sharp_3_5 = - np.ones((3,3), np.float32) *4 / 9
k_sharp_3_5[1,1] = 5

k_sharp_5_5 = - np.ones((5,5), np.float32) *4 / 25
k_sharp_5_5[2,2] = 5

Smoothing

Smoothing 1: Moving Averaging filter

Simplest thing is to replace each pixel by the average of its neighbors, i.e. \(F\) will be a \((2k+1)^{-2} J_{(2k+1)\times (2k+1)}\) matrix

Assumption neighboring pixels are similar, and noise is independent of pixels.

For example, the following uses \(\frac{1}{9}J_{3\times 3}\) and \(\frac{1}{25}J_{5\times 5}\), note that the matrix should add up to one, hence the image is normalized.

k_3 = np.ones((3,3),np.float32) / 9
k_5 = np.ones((5,5),np.float32) / 25
k_11 = np.ones((11,11),np.float32) / 121

Smoothing 2: Gaussian Filter

When we want nearest neighboring pixels to have the most influence on the output, which can removes high-frequency components from the image (low-pass filter).

For example,

\[F = \frac{1}{16} \begin{bmatrix} 1 &2 &1 \\ 2 &4 &2 \\ 1 &2 &1 \end{bmatrix}\]

The Gaussian kernel is an approximation of a 2d Gaussian function

\[h_{\sigma}(u,v)=\frac{1}{2\pi \sigma^2}\exp(-\frac{u^2 + v^2}{\sigma^2})\]

# simple version of a square Gaussian function
# this filter is equivalent to cv2.GaussianBlur(img, (size, size), sigma)
def Gaussian(size, sigma):
    coef = 1 / (2 * np.pi * sigma **2)
    gaus = np.fromfunction(lambda x, y: coef * np.e \
                           ** ((-1*((x-(size-1)/2)**2+(y-(size-1)/2)**2))/(2*sigma**2)), (size, size))
    return gaus / np.sum(gaus)

A more general form of the Gaussian kernel is obtained by the multi-variante Gaussian distribution \(\mathcal N(\mu, \Sigma)\), where

\[P(\vec x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp(-\frac{(\vec x - \mu)^T \Sigma^{-1}(\vec x - \mu)}{2})\]

Properties of Smoothing

All values are positive
The kernel sums up to 1 to prevent re-scaling of the image
Remove high-frequency components (edges); low-pass filter

Filtering image to find image crop (Normalized cross-correlation)

Let \(\mathbf f\) be a image crop, \(\mathbf t\) be the original image, then

\[G(i,j) = \frac{\mathbf f^T \mathbf t_{ij}}{\|\mathbf f^T \|\|\mathbf t_{ij}\|}\]

is the "normalized score" \((0\sim 1)\), where \(1\) indicates the matching

Consider the following image crop

def findMatch(crop, target):
    target = cv2.cvtColor(target, cv2.COLOR_BGR2GRAY).astype(float)
    crop = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY).astype(float)
    H_crop, W_crop = crop.shape[:2]
    ave_kernel = np.ones((H_crop, W_crop)) / (H_crop * W_crop)
    kNorm = (np.sum(crop * crop)) ** 0.5
    targetNorm = np.sqrt(cv2.filter2D(target * target, -1, ave_kernel) * (H_crop * W_crop))
    conv = cv2.filter2D(target, -1, crop / kNorm)
    return conv / targetNorm

Convolution

A convolution \(F*I\) is equivalent to flip \(F\) along the diagonal and apply correlation. i.e.

\[G(i,j) = \sum_{u=-k}^k\sum_{v={-k}}^k F(u,v)\cdot I(i-u, j-v)\]

Obviously, for a symmetric filter matrix, convolution and correlation will do the same

Convolution is the natural linear feature, and it is

commutative \(f*g = g*f\),
associative \(f*(g*h) = (f*g)*h\),
distributive \(f*(g+h)=f*g + f*h\)
associative with scalar \(\lambda (f*g)=(\lambda f)*g\)
The Fourier transform of two convolved images is the product of their individual Fourier transforms \(\mathcal F(f*g) =\mathcal F(f)\cdot \mathcal F(g)\)

Separable Filters

A convolution filter is separable if it can be written as the outer product of two 1D filters. i.e. \(F = vh^T\), then \(F*I = v*(h*I)\) by associative property.

Example

\[ k^{-2} J = k^{-1}[1 \:1 \:... \:1]^T k^{-1}[1 \:1 \:... \:1]\]

\[ \frac{1}{16} \begin{bmatrix} 1 &2 &1 \\ 2 &4 &2 \\ 1 &2 &1 \end{bmatrix} = \frac{1}{4} \begin{bmatrix} 1\\ 2\\ 1 \end{bmatrix} \frac{1}{4} \begin{bmatrix} 1&2&1 \end{bmatrix}\]

\[ \begin{bmatrix} -1 &0 &1 \\ -2 &0 &2 \\ -1 &0 &1 \end{bmatrix} = \begin{bmatrix} 1\\ 2\\ 1 \end{bmatrix} \begin{bmatrix} -1&0&1 \end{bmatrix}\]

kernel = np.array([[-1, 0, 1], 
                   [-2, 0, 2], 
                   [-1, 0, 1]])
kernel_h = np.array([[-1, 0, 1]])
kernel_v = np.array([[1, 2, 1]]).T
# note that images are in uint8
# and the 1D filters are not normalized
img_gray = img_gray.astype(float)
img_k = cv2.filter2D(img_gray, -1, kernel)
img_k1 = cv2.filter2D(img_gray, -1, kernel_h)
img_k2 = cv2.filter2D(img_k1, -1, kernel_v)

print(np.max(np.abs(img_k - img_k2)))
#>> 0.0

How to tell is separable

Quickcheck it has rank 1 (otherwise it cannot be written as 2 1D array)

Singular value decomposition (SVD) decompose by \(F = \mathbf U \Sigma \mathbf V^T = \sum_i \sigma_i u_i v_i^T\) with \(\Sigma = diag(\sigma_i)\) if only one singular value \(\sigma_i\) is non-zero, then it is separable and \(\sqrt{\sigma_1}\mathbf u, \sqrt{\sigma_1} \mathbf v_1^T\) are the 1D filters

Source code

import cv2
import numpy as np
from scipy.signal import convolve2d

# load as gray scale
img_gray = cv2.imread("../assets/yurina.jpg", cv2.IMREAD_GRAYSCALE)
# load as colored
img = cv2.imread("../assets/yurina.jpg")

# --8<-- [start:shift]
# shift down by 25px
k_shifting = np.zeros((50, 50))
k_shifting[0, 24] = 1
# --8<-- [end:shift]
img_shifting = cv2.filter2D(img, -1, k_shifting, borderType=cv2.BORDER_CONSTANT)
cv2.imwrite("../assets/filter_shift.jpg", np.hstack((img, img_shifting)))

# --8<-- [start:sharpen]
k_sharp_3_2 = - np.ones((3,3), np.float32) / 9
k_sharp_3_2[1,1] = 2
k_sharp_3_2 = k_sharp_3_2 

k_sharp_3_5 = - np.ones((3,3), np.float32) *4 / 9
k_sharp_3_5[1,1] = 5

k_sharp_5_5 = - np.ones((5,5), np.float32) *4 / 25
k_sharp_5_5[2,2] = 5
# --8<-- [end:sharpen]

img_sharp_3_2 = cv2.filter2D(img, -1, k_sharp_3_2)
img_sharp_3_5 = cv2.filter2D(img, -1, k_sharp_3_5)
img_sharp_5_5 = cv2.filter2D(img, -1, k_sharp_5_5)

cv2.imwrite("../assets/filter_sharpen.jpg", np.hstack((img_sharp_3_2, img_sharp_3_5, img_sharp_5_5)))

# --8<-- [start:average]
k_3 = np.ones((3,3),np.float32) / 9
k_5 = np.ones((5,5),np.float32) / 25
k_11 = np.ones((11,11),np.float32) / 121
# --8<-- [end:average]

img_3 = cv2.filter2D(img,-1,k_3)
img_5 = cv2.filter2D(img,-1,k_5)
img_11 = cv2.filter2D(img,-1,k_11)

cv2.imwrite("../assets/filter_ave.jpg", np.hstack((img_3, img_5, img_11)))

# --8<-- [start:gauss]
# simple version of a square Gaussian function
# this filter is equivalent to cv2.GaussianBlur(img, (size, size), sigma)
def Gaussian(size, sigma):
    coef = 1 / (2 * np.pi * sigma **2)
    gaus = np.fromfunction(lambda x, y: coef * np.e \
                           ** ((-1*((x-(size-1)/2)**2+(y-(size-1)/2)**2))/(2*sigma**2)), (size, size))
    return gaus / np.sum(gaus)
# --8<-- [end:gauss]

smoothed = []
for size in [5, 10, 15]:
    row = []
    for sigma in [1, 3, 5]:
        title = f"fsize={size}, sigma={sigma}"
        temp = cv2.filter2D(img, -1, Gaussian(size, sigma))
        temp = cv2.putText(temp, title, (10, 10),cv2.FONT_HERSHEY_SIMPLEX, .5, (255,255,255), 1)
        row.append(temp)
    smoothed.append(np.hstack(row))
cv2.imwrite("../assets/filter_gaussian.jpg", np.vstack(smoothed))
# --8<-- [start:match]
def findMatch(crop, target):
    target = cv2.cvtColor(target, cv2.COLOR_BGR2GRAY).astype(float)
    crop = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY).astype(float)
    H_crop, W_crop = crop.shape[:2]
    ave_kernel = np.ones((H_crop, W_crop)) / (H_crop * W_crop)
    kNorm = (np.sum(crop * crop)) ** 0.5
    targetNorm = np.sqrt(cv2.filter2D(target * target, -1, ave_kernel) * (H_crop * W_crop))
    conv = cv2.filter2D(target, -1, crop / kNorm)
    return conv / targetNorm
# --8<-- [end:match]    
img_crop = img.copy()
cv2.rectangle(img_crop, (15, 35), (65, 85), 255)

crop = img[35: 85, 15: 65]
matched = findMatch(crop, img) * 255
cv2.rectangle(matched, (15, 35), (65, 85), 255)
cv2.imwrite(
    "../assets/filter_match.jpg", 
    np.hstack((img_crop, np.tile(matched[:, :, None], 3)))
)

# --8<-- [start:sep]   
kernel = np.array([[-1, 0, 1], 
                   [-2, 0, 2], 
                   [-1, 0, 1]])
kernel_h = np.array([[-1, 0, 1]])
kernel_v = np.array([[1, 2, 1]]).T
# note that images are in uint8
# and the 1D filters are not normalized
img_gray = img_gray.astype(float)
img_k = cv2.filter2D(img_gray, -1, kernel)
img_k1 = cv2.filter2D(img_gray, -1, kernel_h)
img_k2 = cv2.filter2D(img_k1, -1, kernel_v)

print(np.max(np.abs(img_k - img_k2)))
#>> 0.0
# --8<-- [end:sep]   
cv2.putText(img_k, f"Sobel_x 3*3", 
                    (20, 200),cv2.FONT_HERSHEY_SIMPLEX, .5, (255,255,255), 1)
cv2.putText(img_k1, f"h", 
                    (20, 200),cv2.FONT_HERSHEY_SIMPLEX, .5, (255,255,255), 1)
cv2.putText(img_k2, f"v * h", 
                    (20, 200),cv2.FONT_HERSHEY_SIMPLEX, .5, (255,255,255), 1)


cv2.imwrite("../assets/filter_sep.jpg", np.hstack((img_k, img_k1, img_k2)))