The Trilinearity Equation For Trifocal Tensor Estimation (Direct Linear Estimation)

The trilinearity equation describes the geometric relationship between corresponding points in three views using the trifocal tensor. It encapsulates how points in the first two views can be used to predict the corresponding point in the third view.

The Trilinearity Equation

For three corresponding points $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{x}_3$ in three images, written in homogeneous coordinates:

$$[\mathbf{x}_2]_\times \left( \sum_{i=1}^3 x_{1i} T_i \right) [\mathbf{x}_3]_\times = \mathbf{0}_{3 \times 3}$$

Where:

$[\mathbf{x}_2]_\times$: skew-symmetric (cross-product) matrix of the point $\mathbf{x}_2$ in view 2.
$[\mathbf{x}_3]_\times$: skew-symmetric (cross-product) matrix of the point $\mathbf{x}_3$ in view 3.
$\sum_{i=1}^3 x_{1i} T_i$: a linear combination of the slices $T_1, T_2, T_3$ of the trifocal tensor $T$, weighted by the homogeneous coordinates $x_{11}, x_{12}, x_{13}$ of the point $\mathbf{x}_1$ in the first view.

Key Components

Skew-Symmetric Matrix: The skew-symmetric matrix $[\mathbf{x}]_\times$ for a point $\mathbf{x} = [x, y, w]^\top$ is:
$[\mathbf{x}]_\times = \begin{bmatrix} 0 & -w & y \\ w & 0 & -x \\ -y & x & 0 \end{bmatrix}$
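This matrix represents the cross product: $[\mathbf{x}]_\times \mathbf{v} = \mathbf{x} \times \mathbf{v}$ for any 3-vector $\mathbf{v}$. Because $\mathbf{x} \times \mathbf{v}$ is the line joining the points $\mathbf{x}$ and $\mathbf{v}$, every nonzero row and column of $[\mathbf{x}]_\times$ is a line through $\mathbf{x}$. In particular, $[\mathbf{x}]_\times \mathbf{x} = \mathbf{0}$.
Slices of Trifocal Tensor: $T_i$ ($i = 1, 2, 3$) are the $3 \times 3$ slices of the $3 \times 3 \times 3$ trifocal tensor $T$. The tensor is defined up to scale and depends only on the three cameras, not on the scene.
Equation Interpretation:
- $\sum_{i=1}^3 x_{1i} T_i$: a $3 \times 3$ matrix, determined by $\mathbf{x}_1$, that couples lines in view 2 with lines in view 3. For a line $\mathbf{l}_2$ in view 2 and a line $\mathbf{l}_3$ in view 3, the scalar $\mathbf{l}_2^\top \left( \sum_i x_{1i} T_i \right) \mathbf{l}_3$ vanishes when the ray back-projected from $\mathbf{x}_1$ meets the 3D line in which the planes back-projected from $\mathbf{l}_2$ and $\mathbf{l}_3$ intersect. This is the point-line-line relation.
- $[\mathbf{x}_2]_\times$: its rows are lines through $\mathbf{x}_2$. Multiplying on the left therefore applies this relation to lines $\mathbf{l}_2$ passing through $\mathbf{x}_2$.
- $[\mathbf{x}_3]_\times$: its columns are lines through $\mathbf{x}_3$. Multiplying on the right therefore applies the relation to lines $\mathbf{l}_3$ passing through $\mathbf{x}_3$.
Together, the nine entries of the $3 \times 3$ equation state that the point-line-line relation holds for every line through $\mathbf{x}_2$ paired with every line through $\mathbf{x}_3$.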
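The cross-product matrix properties can be checked numerically. The sketch below uses a hypothetical helper `skew` and arbitrary example vectors. It confirms three things:

- $[\mathbf{x}]_\times \mathbf{v} = \mathbf{x} \times \mathbf{v}$.
- $[\mathbf{x}]_\times$ is antisymmetric.
- Its rows and columns have zero dot product with $\mathbf{x}$, so they are lines through $\mathbf{x}$.

```python
import numpy as np


def skew(v):
    """Return the 3x3 skew-symmetric (cross-product) matrix [v]_x (hypothetical helper)."""
    x, y, w = v
    return np.array([[0.0, -w, y],
                     [w, 0.0, -x],
                     [-y, x, 0.0]])


x = np.array([2.0, -1.0, 1.0])   # an example homogeneous image point
v = np.array([0.5, 3.0, -2.0])   # an arbitrary 3-vector
S = skew(x)

# [x]_x v reproduces the cross product x × v
print(np.allclose(S @ v, np.cross(x, v)))   # True
# Antisymmetry: [x]_x^T = -[x]_x
print(np.allclose(S.T, -S))                 # True
# Rows (S @ x) and columns (x @ S) are lines through x: incidence l . x = 0
print(np.allclose(S @ x, 0.0), np.allclose(x @ S, 0.0))   # True True
```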

Geometric Meaning:

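The equation holds whenever $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{x}_3$ are projections of the same 3D point, so it is a necessary condition for a valid three-view correspondence. It ties all three views together through a single tensor, which is stronger than checking the pairwise epipolar constraints. Those constraints only require each pair of back-projected rays to intersect. That does not force all three rays through a common point when the rays lie in one plane, which happens for scene points on the trifocal plane (the plane through the three camera centers).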

Derivation of the Trilinearity Equation

3D Geometry and Projection: Let a 3D point $\mathbf{X}$ (a homogeneous 4-vector) project to $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{x}_3$ through the $3 \times 4$ camera matrices $P_1, P_2, P_3$. Because image points are homogeneous, each projection holds only up to a nonzero scale factor:
$\mathbf{x}_1 \simeq P_1 \mathbf{X}, \quad \mathbf{x}_2 \simeq P_2 \mathbf{X}, \quad \mathbf{x}_3 \simeq P_3 \mathbf{X}.$
Epipolar Constraints:
- In two-view geometry, the epipolar constraint relates $\mathbf{x}_1$ and $\mathbf{x}_2$ using the fundamental matrix $F$ : $\mathbf{x}_2^\top F \mathbf{x}_1 = 0$ .
- For three views, the trifocal tensor generalizes this constraint to account for correspondences across three images.
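Trifocal Tensor: Choose the projective frame so that $P_1 = [I \mid \mathbf{0}]$, and write $P_2 = [A \mid \mathbf{a}_4]$ and $P_3 = [B \mid \mathbf{b}_4]$, where $\mathbf{a}_i$ and $\mathbf{b}_i$ are the $i$-th columns of $P_2$ and $P_3$. The slices of the trifocal tensor are
$T_i = \mathbf{a}_i \mathbf{b}_4^\top - \mathbf{a}_4 \mathbf{b}_i^\top, \quad i = 1, 2, 3.$
Writing $\mathbf{X} = [\mathbf{x}_1^\top, \rho]^\top$ gives $\mathbf{x}_2 \simeq A\mathbf{x}_1 + \rho\,\mathbf{a}_4$ and $\mathbf{x}_3 \simeq B\mathbf{x}_1 + \rho\,\mathbf{b}_4$. Substituting these shows that the point-line-line relation holds for any line $\mathbf{l}_2$ through $\mathbf{x}_2$ and any line $\mathbf{l}_3$ through $\mathbf{x}_3$:
$\mathbf{l}_2^\top \left( \sum_{i=1}^3 x_{1i} T_i \right) \mathbf{l}_3 = 0.$
This relation involves lines in views 2 and 3. Substituting the points $\mathbf{x}_2$ and $\mathbf{x}_3$ for $\mathbf{l}_2$ and $\mathbf{l}_3$ does not, in general, give zero.

The lines through $\mathbf{x}_2$ are exactly $\mathbf{l}_2 = [\mathbf{x}_2]_\times \mathbf{r}$ for arbitrary $\mathbf{r}$, and likewise $\mathbf{l}_3 = [\mathbf{x}_3]_\times \mathbf{s}$. Substituting and using $[\mathbf{x}_2]_\times^\top = -[\mathbf{x}_2]_\times$ gives $\mathbf{r}^\top [\mathbf{x}_2]_\times \left( \sum_i x_{1i} T_i \right) [\mathbf{x}_3]_\times \mathbf{s} = 0$ for all $\mathbf{r}, \mathbf{s}$. This is exactly the trilinearity equation.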
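The construction can be checked numerically on synthetic data. This sketch assumes random cameras in the canonical frame $P_1 = [I \mid \mathbf{0}]$. From these it builds the tensor slices $T_i = \mathbf{a}_i \mathbf{b}_4^\top - \mathbf{a}_4 \mathbf{b}_i^\top$ and the fundamental matrix $F = [\mathbf{a}_4]_\times A$. It then runs four checks:

- A true correspondence satisfies the epipolar constraint.
- The same correspondence satisfies the trilinearity equation.
- A perturbed third point violates the trilinearity equation.
- The third point can be predicted ("transferred") as $\mathbf{x}_3 \simeq (\sum_i x_{1i} T_i)^\top \mathbf{l}_2$ for a line $\mathbf{l}_2$ through $\mathbf{x}_2$.

The helper names and the random data are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch; helper names are hypothetical.


def skew(v):
    """3x3 cross-product matrix [v]_x, so skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])


def tensor_from_cameras(P2, P3):
    """Trifocal tensor for P1 = [I | 0]: T[:, :, i] = a_i b_4^T - a_4 b_i^T."""
    a, b = P2.T, P3.T                  # a[i], b[i]: i-th columns of P2, P3 (0-based)
    return np.dstack([np.outer(a[i], b[3]) - np.outer(a[3], b[i]) for i in range(3)])


def trilinear_residual(T, x1, x2, x3):
    """[x2]_x (sum_i x1_i T_i) [x3]_x; the zero matrix for a true correspondence."""
    M = T @ x1                         # sum_i x1[i] * T[:, :, i]
    return skew(x2) @ M @ skew(x3)


def transfer_to_view3(T, x1, x2):
    """Predict x3 from x1 and x2 via x3 ~ M^T l2, with l2 a line through x2."""
    R = skew(x2) @ (T @ x1)            # row r is (M^T l_r)^T, l_r = r-th row of [x2]_x
    x3 = R[np.argmax(np.linalg.norm(R, axis=1))]   # avoid a (near-)zero row
    return x3 / x3[2]


rng = np.random.default_rng(0)
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])      # canonical first camera
P2, P3 = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
X = np.append(rng.normal(size=3), 1.0)             # homogeneous 3D point
# Project, then scale each image point to unit norm (homogeneous, so scale is free)
x1, x2, x3 = (P @ X / np.linalg.norm(P @ X) for P in (P1, P2, P3))

T = tensor_from_cameras(P2, P3)
F = skew(P2[:, 3]) @ P2[:, :3]                     # F = [a_4]_x A, so x2^T F x1 = 0

print(abs(x2 @ F @ x1))                                          # epipolar residual
print(np.abs(trilinear_residual(T, x1, x2, x3)).max())           # trilinear residual
print(np.abs(trilinear_residual(T, x1, x2, x3 + [0.1, 0.0, 0.0])).max())  # perturbed x3
print(transfer_to_view3(T, x1, x2), x3 / x3[2])                  # transferred vs. true x3
```

The first two printed values should be at floating-point round-off level and the third clearly nonzero. The two printed points should agree.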

Simplified Form for Linear Estimation

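The trilinearity equation is linear in the 27 entries of $T$. Each correspondence $\mathbf{x}_1 \leftrightarrow \mathbf{x}_2 \leftrightarrow \mathbf{x}_3$ gives nine scalar equations (one per entry of the $3 \times 3$ matrix), of which only four are linearly independent. Because $T$ is defined only up to scale, it has 26 unknowns, so at least 7 correspondences are needed ($7 \times 4 = 28 \ge 26$). The stacked homogeneous system is then solved with the Singular Value Decomposition (SVD) to obtain a direct estimate of the trifocal tensor.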

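Building the Linear System

Stack the 27 entries of $T$ slice by slice into a vector $\mathbf{t}$, with each slice in row-major order:

$$\mathbf{t} = [(T_1)_{11}, (T_1)_{12}, \dots, (T_1)_{33}, (T_2)_{11}, \dots, (T_3)_{33}]^\top.$$

Entry $(s, u)$ of the trilinearity equation reads

$$\sum_{i=1}^{3}\sum_{j=1}^{3}\sum_{k=1}^{3} x_{1i}\, ([\mathbf{x}_2]_\times)_{sj}\, ([\mathbf{x}_3]_\times)_{ku}\, (T_i)_{jk} = 0.$$

This is a linear equation $\mathbf{a}_{su}^\top \mathbf{t} = 0$ with coefficient vector

$$\mathbf{a}_{su} = \mathbf{x}_1 \otimes ([\mathbf{x}_2]_\times)_{s,:}^\top \otimes ([\mathbf{x}_3]_\times)_{:,u},$$

where $\otimes$ is the Kronecker product. Each pixel point $(x, y)$ of a correspondence $p_1, p_2, p_3$ is lifted to the homogeneous vector $[x, y, 1]^\top$, and each correspondence then contributes nine rows $\mathbf{a}_{su}^\top$. Stacking the rows of all $N$ correspondences gives the $9N \times 27$ system

$$A\mathbf{t} = \mathbf{0}.$$

The unit-norm solution that minimizes $\|A\mathbf{t}\|$ is the right singular vector of $A$ associated with its smallest singular value. This determines $T$ up to scale.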
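The claim that each correspondence yields nine equations but only four independent ones can be checked numerically. This sketch assumes hypothetical image points and orders the tensor entries slice by slice, $t_{9i+3j+k} = (T_{i+1})_{jk}$ with 0-based $i, j, k$. The helper `trilinear_rows` is an illustrative name, not part of any library.

```python
import numpy as np


def skew(v):
    """3x3 cross-product matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])


def trilinear_rows(x1, x2, x3):
    """Hypothetical helper: nine rows a_su (9x27) with a_su . t equal to entry (s, u)
    of [x2]_x (sum_i x1_i T_i) [x3]_x, where t[9*i + 3*j + k] = (T_{i+1})_{jk}."""
    S2, S3 = skew(x2), skew(x3)
    return np.array([np.kron(x1, np.kron(S2[s, :], S3[:, u]))
                     for s in range(3) for u in range(3)])


# Hypothetical image points in homogeneous coordinates [x, y, 1]
x1 = np.array([120.0, 45.0, 1.0])
x2 = np.array([98.0, 60.0, 1.0])
x3 = np.array([131.0, 52.0, 1.0])

rows = trilinear_rows(x1, x2, x3)
print(rows.shape)                    # (9, 27)
print(np.linalg.matrix_rank(rows))   # 4: only four independent equations

# Sanity check of the row construction against the matrix form, for a random tensor
T = np.random.default_rng(1).normal(size=(3, 3, 3))   # T[:, :, i] = T_{i+1}
t = np.concatenate([T[:, :, i].ravel() for i in range(3)])
M = sum(x1[i] * T[:, :, i] for i in range(3))
print(np.allclose(rows @ t, (skew(x2) @ M @ skew(x3)).ravel()))   # True
```

The nine rows form the matrix $\mathbf{x}_1^\top \otimes ([\mathbf{x}_2]_\times \otimes [\mathbf{x}_3]_\times^\top)$. Each cross-product matrix has rank 2, so this block has rank $1 \cdot 2 \cdot 2 = 4$ for any triplet of nonzero points.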

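The full Python implementation follows. It loads `p1.npy`, `p2.npy`, and `p3.npy`, each holding an $N \times 2$ array of pixel coordinates; row $n$ of the three arrays forms one correspondence.

```python
import numpy as np

# Corresponding pixel coordinates, one (N, 2) array per view
p11 = np.load('p1.npy')
p22 = np.load('p2.npy')
p33 = np.load('p3.npy')


def skew(v):
    """Skew-symmetric matrix [v]_x, so that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def linearTrifocalTensor(p1, p2, p3):
    """Direct linear estimate of the trifocal tensor from N >= 7 correspondences.

    Returns T with shape (3, 3, 3), where T[:, :, i] is the slice T_{i+1}.
    """
    A = []
    for q1, q2, q3 in zip(p1, p2, p3):
        x1 = np.array([q1[0], q1[1], 1.0])   # homogeneous point, view 1
        S2 = skew([q2[0], q2[1], 1.0])       # [x2]_x, view 2
        S3 = skew([q3[0], q3[1], 1.0])       # [x3]_x, view 3
        # Entry (r, c) of [x2]_x (sum_i x1_i T_i) [x3]_x = 0 is linear in t,
        # where t[9*i + 3*j + k] = (T_i)_{jk}.
        for r in range(3):
            for c in range(3):
                A.append(np.kron(x1, np.kron(S2[r, :], S3[:, c])))

    # Unit-norm minimizer of ||A t||: right singular vector of the smallest singular value
    _, _, Vt = np.linalg.svd(np.array(A))
    t = Vt[-1, :]

    T1, T2, T3 = t[:9].reshape(3, 3), t[9:18].reshape(3, 3), t[18:].reshape(3, 3)
    return np.dstack([T1, T2, T3])


T = linearTrifocalTensor(p11, p22, p33)


def trilinearity_residual(T, p1, p2, p3):
    """Algebraic residual of the trilinearity equation (not a reprojection error).

    For each correspondence, returns the 9 entries of
    [x2]_x (sum_i x1_i T_i) [x3]_x, which are zero for a perfect fit.
    """
    residual = []
    for q1, q2, q3 in zip(p1, p2, p3):
        x1 = np.array([q1[0], q1[1], 1.0])
        M = sum(x1[i] * T[:, :, i] for i in range(3))   # sum_i x1_i T_i
        residual.append(skew([q2[0], q2[1], 1.0]) @ M @ skew([q3[0], q3[1], 1.0]))
    return np.array(residual).flatten()


print("validation error rms: ", np.sqrt(np.mean(trilinearity_residual(T, p11, p22, p33)**2)))
```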
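The design matrix mixes pixel coordinates of order $10^2$ to $10^3$ with the constant 1, so the linear estimate on raw pixels is poorly conditioned. Standard practice for DLT-style estimators (Hartley's normalization) is to normalize each image's points first:

1. Translate the points so their centroid is at the origin.
2. Scale them so their mean distance from it is $\sqrt{2}$.
3. Estimate $\hat{T}$ on the normalized points.
4. Undo the transforms $\mathbf{x}_m \mapsto H_m \mathbf{x}_m$ with $T_i^{jk} = (H_1)_{ri} (H_2^{-1})_{js} (H_3^{-1})_{kt} \hat{T}_r^{st}$, summing over the repeated indices $r, s, t$.

The sketch below is an assumption-laden illustration, not the article's method. All helper names are hypothetical, and the synthetic cameras stand in for real correspondences.

```python
import numpy as np

# Sketch of normalized linear estimation; helper names are hypothetical and
# the synthetic, noise-free data stand in for real correspondences.


def skew(v):
    """3x3 cross-product matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])


def homogeneous(p):
    """(N, 2) pixel points -> (N, 3) rows [x, y, 1]."""
    return np.column_stack([p, np.ones(len(p))])


def normalize_points(p):
    """Similarity H moving the centroid to 0 and the mean distance to sqrt(2)."""
    centroid = p.mean(axis=0)
    scale = np.sqrt(2) / np.mean(np.linalg.norm(p - centroid, axis=1))
    H = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    return homogeneous(p) @ H.T, H          # rows are H @ [x, y, 1]


def dlt_tensor(x1s, x2s, x3s):
    """Linear estimate from (N, 3) homogeneous points; T[:, :, i] = T_{i+1}."""
    A = [np.kron(x1, np.kron(skew(x2)[s, :], skew(x3)[:, u]))
         for x1, x2, x3 in zip(x1s, x2s, x3s) for s in range(3) for u in range(3)]
    t = np.linalg.svd(np.array(A))[2][-1]   # right singular vector, smallest sigma
    return np.stack([t[:9].reshape(3, 3), t[9:18].reshape(3, 3), t[18:].reshape(3, 3)], axis=2)


def denormalize_tensor(Tn, H1, H2, H3):
    """T[j, k, i] = sum_rst H1[r, i] inv(H2)[j, s] inv(H3)[k, t] Tn[s, t, r]."""
    return np.einsum('ri,js,kt,str->jki', H1, np.linalg.inv(H2), np.linalg.inv(H3), Tn)


def max_residual(T, x1s, x2s, x3s):
    """Largest |entry| of [x2]_x (sum_i x1_i T_i) [x3]_x with unit-norm T and points."""
    T = T / np.linalg.norm(T)
    worst = 0.0
    for x1, x2, x3 in zip(x1s, x2s, x3s):
        x1, x2, x3 = (x / np.linalg.norm(x) for x in (x1, x2, x3))
        worst = max(worst, np.abs(skew(x2) @ (T @ x1) @ skew(x3)).max())
    return worst


def project(P, X):
    """Project (N, 4) homogeneous points with a 3x4 camera; return (N, 2) pixels."""
    x = X @ P.T
    return x[:, :2] / x[:, 2:]


# Synthetic, noise-free pixel correspondences from three generic cameras
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
Rt = [np.hstack([np.eye(3), np.zeros((3, 1))]),
      np.hstack([np.eye(3), [[-1.0], [0.0], [0.0]]]) + 0.05 * rng.normal(size=(3, 4)),
      np.hstack([np.eye(3), [[0.0], [-1.0], [0.2]]]) + 0.05 * rng.normal(size=(3, 4))]
X = np.column_stack([rng.uniform(-1, 1, (20, 2)), rng.uniform(4, 6, 20), np.ones(20)])
p1, p2, p3 = (project(K @ M, X) for M in Rt)

(q1, H1), (q2, H2), (q3, H3) = (normalize_points(p) for p in (p1, p2, p3))
T = denormalize_tensor(dlt_tensor(q1, q2, q3), H1, H2, H3)
print(max_residual(T, homogeneous(p1), homogeneous(p2), homogeneous(p3)))
```

With noise-free data the printed residual should be at round-off level. With real, noisy pixels the same normalize, estimate, denormalize pattern could wrap `linearTrifocalTensor`.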

*Very Important Note*

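The trifocal tensor estimated above by direct linear estimation is, in general, not a valid trifocal tensor; it is only an initial estimate. The linear solution treats all 27 entries as independent (26 degrees of freedom up to scale). A tensor that actually arises from three cameras has only 18 degrees of freedom and must satisfy additional internal constraints. With noisy data, the linear estimate is therefore generally not produced exactly by any set of cameras. To read more about valid trifocal tensors and their estimation, see the article "Valid Trifocal Tensor and its Estimation".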
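One necessary (but not sufficient) validity condition is easy to test. Each slice of a tensor built from cameras, $T_i = \mathbf{a}_i \mathbf{b}_4^\top - \mathbf{a}_4 \mathbf{b}_i^\top$, is a difference of two rank-one matrices, so its rank is at most 2. The sketch below uses a hypothetical helper that reports each slice's smallest-to-largest singular value ratio. It adds random noise to a valid tensor as a stand-in for a noisy linear estimate; that substitution is an assumption of the example.

```python
import numpy as np


def slice_rank_defects(T):
    """Hypothetical helper: sigma_min / sigma_max of each slice T[:, :, i] (0 if rank <= 2)."""
    ratios = []
    for i in range(3):
        s = np.linalg.svd(T[:, :, i], compute_uv=False)   # singular values, descending
        ratios.append(s[-1] / s[0])
    return np.array(ratios)


rng = np.random.default_rng(0)
P2, P3 = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))   # cameras with P1 = [I | 0]
a, b = P2.T, P3.T                                            # a[i], b[i]: columns
T_valid = np.dstack([np.outer(a[i], b[3]) - np.outer(a[3], b[i]) for i in range(3)])
T_noisy = T_valid + 1e-3 * rng.normal(size=(3, 3, 3))       # stand-in for a noisy estimate

print(slice_rank_defects(T_valid))   # numerically zero: each slice has rank 2
print(slice_rank_defects(T_noisy))   # small but nonzero: rank-2 condition violated
```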

Attachments for the experiment: images I1, I2, I3.