In computer vision, triangulation is the process of recovering a 3D point from its projections in two or more images.

If the same scene point is observed by multiple cameras, each image observation defines a ray in 3D space. Triangulation estimates the 3D point where those rays meet.

Intuition

Suppose a 3D point X appears as:

  • x in the first image
  • x' in the second image

Using the camera models for the two views, each image point can be back-projected into a ray starting from the camera center.

The 3D point is obtained by finding the point that best agrees with those rays.
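The back-projection step can be sketched as follows. This is a minimal illustration, assuming a pinhole camera with intrinsics K, world-to-camera rotation R, and camera center C; the function name and parameters are illustrative, not from any particular library:

```python
import numpy as np

def backproject_ray(K, R, C, pixel):
    """Back-project a pixel into a world-space ray (origin, direction).

    Assumes a pinhole camera with intrinsics K, world-to-camera
    rotation R, and camera center C in world coordinates.
    """
    u, v = pixel
    x_h = np.array([u, v, 1.0])          # homogeneous pixel coordinates
    d_cam = np.linalg.inv(K) @ x_h       # viewing direction in camera frame
    d_world = R.T @ d_cam                # rotate direction into world frame
    d_world /= np.linalg.norm(d_world)   # normalize to a unit direction
    return C, d_world
```

The ray's origin is the camera center; every scene point along the ray projects to the same pixel, which is exactly why a single view cannot recover depth.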

Why It Matters

Triangulation is one of the core steps in 3D reconstruction:

  • feature matching tells us which image points correspond
  • the fundamental matrix or essential matrix constrains valid matches
  • triangulation turns those matches into 3D points

Without triangulation, we only know relationships between images. With triangulation, we begin to recover the shape of the scene.

Ideal Case

In the noise-free case, the back-projected rays from the two cameras intersect exactly at the true 3D point.

In practice, image measurements are noisy, so the rays usually do not intersect exactly. Triangulation then finds the 3D point that best explains both observations, for example by minimizing the distance to both rays or the reprojection error in the images.
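One simple way to handle non-intersecting rays is the midpoint method: find the closest point on each ray and take the point halfway between them. A sketch, assuming each ray is given as an origin and a unit direction (names are illustrative):

```python
import numpy as np

def midpoint_triangulate(o1, d1, o2, d2):
    """Return the 3D point midway between the closest points of two rays.

    Each ray is an origin o plus a unit direction d. Solves the normal
    equations for the parameters s, t minimizing ||(o1 + s d1) - (o2 + t d2)||.
    """
    A = np.array([[d1 @ d1, -d1 @ d2],
                  [d1 @ d2, -d2 @ d2]])
    b = np.array([(o2 - o1) @ d1, (o2 - o1) @ d2])
    s, t = np.linalg.solve(A, b)
    p1 = o1 + s * d1                     # closest point on ray 1
    p2 = o2 + t * d2                     # closest point on ray 2
    return 0.5 * (p1 + p2)               # midpoint between the two rays
```

When the rays happen to intersect exactly, the midpoint is the intersection itself; when they are skew, it is a reasonable compromise, though minimizing reprojection error in the images is generally preferred.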

Mathematical Setup

Let the two camera projection matrices be P and P', and let the homogeneous image points be x and x'. We want to find the homogeneous 3D point X such that

  x ≃ P X

and

  x' ≃ P' X,

where ≃ means equality up to scale.
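Because the relation only holds up to scale, each view contributes two independent linear constraints on X, and the standard linear solution is the direct linear transform (DLT): stack the constraints into a matrix A and take the null vector of A via SVD. A minimal sketch (function name is illustrative):

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation from two views.

    P1, P2 are 3x4 projection matrices; x1, x2 are (u, v) image points.
    From x ~ P X, each view gives two rows of the homogeneous system
    A X = 0; the solution is the right singular vector of A with the
    smallest singular value.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]                  # dehomogenize to a 3D point
```

The DLT minimizes an algebraic rather than geometric error, so in practice it is often followed by a nonlinear refinement of the reprojection error.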

Geometric Meaning

Triangulation needs:

  • camera geometry
  • corresponding image points

The better the camera poses and correspondences, the better the reconstructed 3D points.

Triangulation works best when there is enough baseline, meaning the cameras are separated widely enough that the two viewing rays meet at a well-defined angle.

Limitation

Triangulation becomes unstable when:

  • the two viewing rays are nearly parallel
  • the baseline between cameras is very small
  • point matches are noisy or incorrect

This is why accurate feature matching and camera estimation are important before triangulation.
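A common practical safeguard is to measure the angle between the two viewing rays and discard points where it is too small. A sketch of that check (the function and the threshold choice are illustrative, not a standard API):

```python
import numpy as np

def triangulation_angle_deg(C1, C2, X):
    """Angle in degrees between the rays from camera centers C1, C2 to X.

    A small angle means the rays are nearly parallel, so the depth of X
    is poorly constrained and the triangulation should be treated as
    unreliable.
    """
    r1 = X - C1
    r2 = X - C2
    cos_a = r1 @ r2 / (np.linalg.norm(r1) * np.linalg.norm(r2))
    return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
```

Reconstruction systems often reject triangulated points whose ray angle falls below a small threshold on the order of a degree or two; the exact value is a tuning choice.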

In Reconstruction Pipelines

A common pipeline is:

  1. detect and match features
  2. estimate F or E
  3. recover relative camera pose
  4. triangulate matched points
  5. refine the reconstruction with optimization

Triangulation is therefore the step that converts image correspondences into an actual sparse 3D point cloud.
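The triangulation step of the pipeline can be illustrated end to end with synthetic data. In this sketch the relative pose is simply assumed known (in a real pipeline it would come from steps 1–3), noiseless projections stand in for matched features, and a linear DLT solver recovers the original points; all names and the chosen geometry are illustrative:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """DLT triangulation of one correspondence from two 3x4 matrices."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]

# Two cameras: identity pose, and a 1-unit baseline along x (pose assumed,
# not estimated -- a real pipeline would recover it from E).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# Synthetic scene points and their noiseless projections into both views.
points = np.array([[0.0, 0.0, 4.0], [0.5, -0.3, 6.0]])
cloud = []
for X in points:
    x1 = P1 @ np.append(X, 1.0)
    x1 = x1[:2] / x1[2]                  # project and dehomogenize, view 1
    x2 = P2 @ np.append(X, 1.0)
    x2 = x2[:2] / x2[2]                  # project and dehomogenize, view 2
    cloud.append(triangulate(P1, P2, x1, x2))
cloud = np.array(cloud)                  # the recovered sparse point cloud
```

With exact correspondences the recovered cloud matches the original points; adding pixel noise to x1 and x2 shows how the recovered depths degrade, especially as the baseline shrinks.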