Pose Parameterization and Optimization
Note based on A tutorial on SE(3) transformation parameterizations and on-manifold optimization [Blanco-Claraco 2022]1 and SE(3) Geometry and Kinematics
In this note, unless specified, we are only interested in transformations in 3D.
Parametrize Rotations
The parameterization conventions are based on three.js
implementations.
A pose is just a rotation \(\mathbf R\) and a translation \(\mathbf t = (x, y, z)\), but the rotation can be parameterized in multiple ways with different pros and cons.
Euler Angles
The rotation is parameterized by 3 Euler angles \(\mathbf r = (r_x, r_y, r_z)\) and the rotation axis order (e.x. xyz
, roll-pitch-yaw
). Which means rotate around the first axis, and then the modified second axis, and finally the modified third axis.
However, euler angle degenerates, a.k.a. one possible rotation can be represented by two different set of Euler angles. In practice, optimizations directly on Euler angle representation is not a good idea due to the rotation order and the degeneration problem. However, euler angle representation is easy to interpret, hence used as user input.
Quaternions
The rotation is parameterized by normalized 4D vector \((q_x, q_y, q_z, q_r)\). A geometric interpretation is that of a rotation of \(\theta\) radians around the axis \(\mathbf v \propto (q_x, q_y, q_z)\), where the angle \(\theta\) is \(q_r = \cos(\frac{\theta}{2})\).
Note that quaternions expects to be normalized, hence it also has only 3 degree of freedom despite of being 4D.
Rotation Matrices
The speicial orthogonal group \(SO(3)\) is the set of all matrices \(\mathbf{R}\) that is orthogonal (\(\mathbf{R}^T\mathbf{R} = \mathbf{RR}^T = I\) or \(\mathbf{R}^{-1}=\mathbf{R}\), orthogonal also implies invertible) and \(\det(R)= 1\). Matrices in \(SO(3)\) are rotation matrices, which are isometric (the transformation preserves distance).
The special Euclidean group \(SE(3)\) is the set of all rigid transformation matrix in the form
which is the homogeneous transformation of \(\mathbf{R}\mathbf x + \mathbf t\).
Conversions
Euler Angles to Quaternions
Given Euler angle \((r_x, r_y, r_z)\) and the order XYZ
, the quaternion is
Quaternion.setFromEuler
setFromEuler(euler) {
const cos = Math.cos;
const sin = Math.sin;
const c1 = cos( x / 2 );
const c2 = cos( y / 2 );
const c3 = cos( z / 2 );
const s1 = sin( x / 2 );
const s2 = sin( y / 2 );
const s3 = sin( z / 2 );
switch ( order ) {
case 'XYZ':
this._x = s1 * c2 * c3 + c1 * s2 * s3;
this._y = c1 * s2 * c3 - s1 * c2 * s3;
this._z = c1 * c2 * s3 + s1 * s2 * c3;
this._w = c1 * c2 * c3 - s1 * s2 * s3;
break;
// Other order see original code
...
}
return this;
}
Euler Angles to Matrix
Since applying a rotation transformation is equivalent to right-side multiplication of a rotation matrix, as of the Euler angle rotation definition, it can be composed of 3 matrices.
Matrix4.makeRotationFromEuler
makeRotationFromEuler(euler) {
const te = this.elements;
const x = euler.x, y = euler.y, z = euler.z;
const a = Math.cos(x), b = Math.sin(x);
const c = Math.cos(y), d = Math.sin(y);
const e = Math.cos(z), f = Math.sin(z);
switch ( order ) {
case 'XYZ':
const ae = a * e, af = a * f, be = b * e, bf = b * f;
te[0] = c * e;
te[4] = - c * f;
te[8] = d;
te[1] = af + be * d;
te[5] = ae - bf * d;
te[9] = - b * c;
te[2] = bf - ae * d;
te[6] = be + af * d;
te[10] = a * c;
break;
// Other order see original code
...
}
return this;
}
Quaternion to Matrix
Matrix4.makeRotationFromQuaternion
makeRotationFromQuaternion(q) {
const te = this.elements;
const x = q.x, y = q.y, z = q.z, w = q.w;
const x2 = x + x, y2 = y + y, z2 = z + z;
// careful that xx = 2 * x * x instead of x * x
const xx = x * x2, xy = x * y2, xz = x * z2;
const yy = y * y2, yz = y * z2, zz = z * z2;
const wx = w * x2, wy = w * y2, wz = w * z2;
// q is normalized x*x+y*y+z*z+r*r = 1
te[ 0 ] = ( 1 - ( yy + zz ) );
te[ 1 ] = ( xy + wz );
te[ 2 ] = ( xz - wy );
te[ 4 ] = ( xy - wz );
te[ 5 ] = ( 1 - ( xx + zz ) );
te[ 6 ] = ( yz + wx );
te[ 8 ] = ( xz + wy );
te[ 9 ] = ( yz - wx );
te[ 10 ] = ( 1 - ( xx + yy ) );
}
Matrix to Quaternion
In general, the matrix can be converted into quaternion as
Notice that if \(tr(R) = R_{11} +R_{22} + R_{33} < -1\) then the formulas is invalid. Therefore, we need to identify which major diagonal element has the greatest value.
Therefore, it's divided into 4 cases.
- The trace is good. Then we use the formula above
- \(R_{11}\) is the largest, then
- \(R_{22}\) is the largest, then
- \(R_{33}\) is the largest, then
Quaternion.setFromRotationMatrix
setFromRotationMatrix( m: Matrix4 ) {
// Assume m is 4x4 transformation matrix and
// the upper 3x3 of m is a pure rotation matrix (i.e, unscaled)
const te = m.elements,
m11 = te[ 0 ], m12 = te[ 4 ], m13 = te[ 8 ],
m21 = te[ 1 ], m22 = te[ 5 ], m23 = te[ 9 ],
m31 = te[ 2 ], m32 = te[ 6 ], m33 = te[ 10 ],
trace = m11 + m22 + m33;
if ( trace > 0 ) {
const s = 0.5 / Math.sqrt( trace + 1.0 );
this._w = 0.25 / s;
this._x = ( m32 - m23 ) * s;
this._y = ( m13 - m31 ) * s;
this._z = ( m21 - m12 ) * s;
} else if ( m11 > m22 && m11 > m33 ) {
const s = 2.0 * Math.sqrt( 1.0 + m11 - m22 - m33 );
this._w = ( m32 - m23 ) / s;
this._x = 0.25 * s;
this._y = ( m12 + m21 ) / s;
this._z = ( m13 + m31 ) / s;
} else if ( m22 > m33 ) {
const s = 2.0 * Math.sqrt( 1.0 + m22 - m11 - m33 );
this._w = ( m13 - m31 ) / s;
this._x = ( m12 + m21 ) / s;
this._y = 0.25 * s;
this._z = ( m23 + m32 ) / s;
} else {
const s = 2.0 * Math.sqrt( 1.0 + m33 - m11 - m22 );
this._w = ( m21 - m12 ) / s;
this._x = ( m13 + m31 ) / s;
this._y = ( m23 + m32 ) / s;
this._z = 0.25 * s;
}
return this;
}
Operations on Rotations
For most operations, again we only consider quaternions and matrices.
Pose Composition
For compositing two transformation matrices, it's simply right-side multiplication \(T_2 T_1\).
For quaternions, applying a rotation on a 3D point can be effectively applied by first converting to a matrix, and then matrix multiplication. Therefore, the only useful operation is to composing two quaternions rotations.
For quaternions \(\mathbf{q}_1, \mathbf{q}_2\), its multiplication is defined as
Quaternion.multiplyQuaternions
multiplyQuaternions( a, b ) {
// from http://www.euclideanspace.com/maths/algebra/realNormedAlgebra/quaternions/code/index.htm
const qax = a._x, qay = a._y, qaz = a._z, qaw = a._w;
const qbx = b._x, qby = b._y, qbz = b._z, qbw = b._w;
this._x = qax * qbw + qaw * qbx + qay * qbz - qaz * qby;
this._y = qay * qbw + qaw * qby + qaz * qbx - qax * qbz;
this._z = qaz * qbw + qaw * qbz + qax * qby - qay * qbx;
this._w = qaw * qbw - qax * qbx - qay * qby - qaz * qbz;
return this;
}
Inverse
For transformation matrix, the inverse is simply the inverse of the matrix, and we know that rotation matrices are orthogonal, hence the inverse is the transpose.
For quaternions, thinks about the geometric interpretation. We can rotate the object along the opposite axis the same radians. Therefore, define the conjugate of the quaternion as \(\mathbf q^* = (-q_x, -q_y, -q_z, q_r)\), and the conjugate is the inverse rotation.
Pose Optimizations
The key problem for pose optimization is essentially find a smooth path s.t. the source camera pose \(P_{s}\) can be moved to the goal pose \(P_d\).
Geometrically speaking, the path is a sequence of infinitely small delta transformation \(T_0, T_1, T_2, ... \in SE(3)\), where each transformation \(P_i = T_i T_{i-1}...T_0P_s\) moves closer to and eventually become \(P_d\).
Lie Groups
Let \(G\) be a set of elements and consider a binary operation (often understood as multiplication). For any two elements \(A,B\in G\), denote the result of the multiplication as \(AB\). Then, \(G\) plus the operation is a group if the following properties are satisfied
- Closure \(\forall A, B\in G. AB\in G\)
- Associativity \(\forall A,B,C\in G. (AB)C=A(BC)\)
- Identity \(\exists I\in G. \forall A\in G. IA=AI=A\)
- Inverse \(\forall A\in G. \exists A^{-1}\in G. AA^{-1} = A^{-1} A = I\)
A manifold \(M\) is a topological space s.t. \(\forall p\in M\), its neighborhood is a homeomorphism. Intuitively, it means that for any point on the manifold, there's an infinitely small neighborhood that is "flat".
Consider a paper, the paper is a 2D manifold embedded in 3D space. However, if we fold the paper to form a "sharp" edge, it's no longer a manifold.
Denote the tangent space of manifold \(M\) at point \(p\) as \(T_pM\) can be intuitively defined as the vector space of the derivatives at \(p\) of all possible smooth curves on \(M\) that pass through \(p\).
Finally, a Lie group \(G\subset \mathbb R^n\) is - a group - a manifold - the multiplication operation and inverse operation are both smooth.
SE(3) and SO(3) as Lie Groups
For our purpose, we notice that SE(3) is a 6D manifold (3 degree of freedom for rotation and 3 for translation) since
- The product of 2 SE(3) matrices is still in SE(3).
- The products of SE(3), a.k.a. matrix multiplications, are associative
- Exists the identity matrix \(I:=[\mathbf I_3 \mid \mathbf 0_3]\)
- Exists inverse transformation \([\mathbf R\mid \mathbf t]^{-1} = [\mathbf R^T\mid -\mathbf R^T \mathbf t]\)
- The product and the inverse are continuous w.r.t. each of the element.
Similarly, \(SO(3)\) is also a 3D manifold and another Lie group.
Intuitively, we confirm that the pose optimization problem is an optimization over a smooth manifold.
Lie Algebra
A Lie algebra is a vector space \(V\) over some field \(F\) with a binary operation \([\cdot, \cdot]\) (called a Lie bracket) associated with every Lie group. The Lie bracket satisfies that \(\forall X, Y, Z \in V, \forall a, b\in F\)
- Closure \([X, Y] \in V\)
- Bi-linearity \([aX+bY, z] = a[X,Z] + b[Y,Z], [Z,aX+bY] = a[Z,X]+b[Z,Y]\)
- Alternating \([X, X] = 0\)
- Jacobi identity \([X,[Y,Z]] + [Y,[Z,X]] + [Z,[X,Y]] = 0\)
An exponential map maps elements from the Lie algebra to the Lie group (manifold) and determines the local structure of the manifold, and a logarithm map maps elements from the manifold to the algebra.
Lie Algebra of SO(3)
Note that for rotations, local changes can be well-represented by the Euler angles. Therefore, let the Lie algebra of SO(3) be the space of
the exponential map is
Rodrigues Formula gives the closed-form expression
Therefore, any local linear change to \(\mathbf r\) can be mapped to rotation matrices \(\mathbf R\). Practically, we can parameterize the delta transformation as 6D vector \((\mathbf r, \mathbf t)\) and optimize them separately.
exp_map_SO3
def skew_sym_mat(r):
""" take a 3D vector, return its skew symmetric matrix
"""
r_x, r_y, r_z = r
return torch.tensor([
[ 0, -r_z, r_y],
[ r_z, 0, -r_x],
[-r_y, r_x, 0]
])
def exp_map_SO3(r):
""" take a so(3) tangent vector r, return SO(3) rotation matrix R
"""
r_skew = skew_sym_mat(r)
r_norm = torch.norm(r)
r_norm_inv = 1.0 / r_norm
R = torch.eye(3)
R += r_norm_inv * r_norm.sin() * r_skew
R += r_norm_inv * r_norm_inv * (1.0 - r_norm.cos()) * (r_skew @ r_skew)
return R
Lie Algebra of SE(3)
Alternatively, we jointly optimize 6 parameters over the 6D manifold.
The exponential map and Rodrigues formula is given as
Note that the exponential map is consisted of the rotational exponential map and a translation
exp_map_SE3
def left_jacobian(r):
""" take a so(3) tangent vector r, return
the left jacobian matrix
"""
J = torch.eye(3)
r_norm = torch.norm(r)
r_norm_inv = 1.0 / r_norm
r_skew = skew_sym_mat(r)
J += r_norm_inv ** 2 * (1.0 - r_norm.cos()) * r_skew
J += r_norm_inv ** 3 * (r_norm - r_norm.sin()) * (r_skew @ r_skew)
return J
def exp_map_SE3(xi):
""" take a se(3) tangent vector xi = (r, t), return SE(3) transformation matrix T
"""
r = xi[:3]
t = xi[3:]
T = torch.eye(4)
T[:3, :3] = exp_map_SO3(r)
J = left_jacobian(r)
T[:3, 3] = J @ t
return T
-
Blanco-Claraco, J.L. 2022. A tutorial on SE(3) transformation parameterizations and on-manifold optimization. https://arxiv.org/abs/2103.15980. ↩