Scale Invariant Feature Transform
import cv2
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
def imshow(img, ax=None, title="", bgr=True):
# since plt and cv2 have different RGB sorting
if bgr:
img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
if ax == None:
plt.imshow(img.astype(np.uint8))
plt.axis("off")
plt.title(title)
else:
ax.imshow(img.astype(np.uint8), cmap="gray")
ax.set_axis_off()
ax.set_title(title)
plt.rcParams["figure.figsize"] = (12,6)
Distinctive Image Features from Scale-Invariant Keypoints [David G. Lowe, 2004]
LoG and DoG
Laplacian of Gaussian
Let \(G\) be the Gaussian function, \(G = (2\pi\sigma^2)^{-1} \exp(-\frac{x^2+y^2}{2\sigma^2})\)
Difference of Gaussian
By the limit definition of derivative,
So that
Scale-space extrema (Octave)
To identify locations and scales that can be repeatably assigned under differing views of the same object.
Constructing octave
Generate \(m\) octaves (or image pyramids), each with \(n\) different blurring scale (\(k^1\sigma, ..., k^n\sigma\)). Generally, we choose \(k = 2^{1/s}, s\in\mathbb Z\), and each octave contains \(s+3\) scales, i.e. \(2^{1/s}, 2^{2/s},...2^{1+3/s}\). Then, for the next level of the octave, down-sample from the one with \(2\sigma\), then keep do the next.
Example If \(s = 2\), i.e. \(k = \sqrt{2}\)
scale1 | scale2 | scale3 | scale4 | scale5 | |
---|---|---|---|---|---|
octave1 | \(\sqrt{2}\) | \(2\) | \(2\sqrt{2}\) | \(4\) | \(4\sqrt{2}\) |
octave2 | \(2\sqrt{2}\) | \(4\) | \(4\sqrt{2}\) | \(8\) | \(8\sqrt{2}\) |
octave3 | \(4\sqrt{2}\) | \(8\) | \(8\sqrt{2}\) | \(16\) | \(16\sqrt{2}\) |
octave4 | \(8\sqrt{2}\) | \(16\) | \(16\sqrt{2}\) | \(32\) | \(32\sqrt{2}\) |
Note that \(G(0, \sigma_1)* G(0, \sigma_2) = G(0, \sqrt{\sigma_{1}^2 + \sigma_{2}^2})\) so that within one octave, we continously convolve the image with scale \(k\). Also, downsampling by \(s\) will divide the current scale by \(s\).
def construct_octaves(image, n_octaves: int, s: int, sigma: float):
"""
image: the original image
n_octaves: number of octaves
s: down sampling rate
sigma: scale
"""
octaves = []
for _ in range(n_octaves):
octave = [image.copy()]
for _ in range(s - 1):
image = cv2.GaussianBlur(image, (5, 5), sigma)
octave.append(image.copy())
image = image[::2,::2]
octaves.append(octave)
return octaves
Taking DoG
Then, for each octave, take DoG by taking the difference between adjacent scales. Note that \(k\) is unchanged for each DoG. Then, decide the matching frequency and record the matching.
Find Local Extrema
Find the local extrema by searching each scale in each octave, except the first and last scale, i.e. \(1,...,s+1\). "Localization" means the 8 pixels around it, 9 pixels above it, and 9 pixels below it.
def find_local_extrema(above, self, below):
""" Find the local extrema contained in self"""
key_points = []
rows, cols = self.shape
for r in range(1, rows-1):
for c in range(1, cols-1):
window = np.empty((3, 3, 3))
window[:, :, 0] = above[r-1: r+2, c-1: c+2]
window[:, :, 1] = self[r-1: r+2, c-1: c+2]
window[:, :, 2] = below[r-1: r+2, c-1: c+2]
if self[r, c] == np.max(window) or self[r, c] == np.min(window):
key_points.append((r, c))
return key_points
Key Points Localization (orientation invariant)
For each chosen local extrema point, apply Harris corner detector with appropriate window size to remove all the edges and flats.