Geometric Models

Coordinate Systems

Normalized Image Coordinates

The 2D position of a point in an image is stored in what we will call normalized image coordinates. The origin is in the middle of the image. The x coordinate grows to the right and the y coordinate grows downwards. The coordinates are scaled so that the larger dimension of the image has length 1.

This means, for example, that all the pixels in an image with aspect ratio 4:3 will be contained in the intervals [-0.5, 0.5] and [3/4 * (-0.5), 3/4 * 0.5] for the X and Y axes respectively.

+-----------------------------+
|                             |
|                             |
|                             |
|              + ------------->
|              | (0, 0)       | (0.5, 0)
|              |              |
|              |              |
+-----------------------------+
               |
               v
                (0, 0.5)

Normalized coordinates are independent of the resolution of the image and give better numerical stability for some multi-view geometry algorithms than pixel coordinates.
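
As a small illustration of the scaling rule described above, the following sketch computes the half-extent of an image along each axis in normalized image coordinates. The function name normalized_extent is hypothetical and chosen only for this example; it is not part of any library API.

```python
def normalized_extent(width, height):
    """Return (half_x, half_y), the half-extent of the image along X and Y
    in normalized image coordinates (larger dimension has length 1)."""
    size = max(width, height)
    return 0.5 * width / size, 0.5 * height / size


# A 4:3 image spans [-0.5, 0.5] in X and [-0.375, 0.375] in Y.
half_x, half_y = normalized_extent(640, 480)
print(half_x, half_y)  # 0.5 0.375
```

The result depends only on the aspect ratio, so a 1280x960 image yields the same extents as a 640x480 one.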

Pixel Coordinates

Many OpenCV functions that work with images use pixel coordinates. In that reference frame, the origin is at the center of the top-left pixel, x grows by one for every pixel to the right and y grows by one for every pixel downwards. The bottom-right pixel is therefore at (width - 1, height - 1).

The transformation from normalized image coordinates to pixel coordinates is

\[\begin{split}H = \begin{pmatrix} \max(w, h) & 0 & \frac{w-1}{2} \\ 0 & \max(w, h) & \frac{h-1}{2} \\ 0 & 0 & 1 \end{pmatrix},\end{split}\]

and its inverse, up to a scale factor that is irrelevant in homogeneous coordinates, is

\[\begin{split}H^{-1} = \begin{pmatrix} 1 & 0 & -\frac{w-1}{2} \\ 0 & 1 & -\frac{h-1}{2} \\ 0 & 0 & \max(w, h) \end{pmatrix},\end{split}\]

where \(w\) and \(h\) are the width and height of the image.
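
The two matrices above can be applied directly to a point without building them explicitly. The sketch below is a minimal illustration under that reading; the function names normalized_to_pixel and pixel_to_normalized are assumptions made for this example, not confirmed library functions.

```python
def normalized_to_pixel(x, y, width, height):
    """Apply H: scale by max(w, h), then shift to the top-left pixel origin."""
    size = max(width, height)
    return x * size + (width - 1) / 2.0, y * size + (height - 1) / 2.0


def pixel_to_normalized(px, py, width, height):
    """Apply the inverse of H: shift to the image center, then divide by max(w, h)."""
    size = max(width, height)
    return (px - (width - 1) / 2.0) / size, (py - (height - 1) / 2.0) / size


# The normalized origin maps to the center of a 640x480 image.
print(normalized_to_pixel(0.0, 0.0, 640, 480))  # (319.5, 239.5)
```

Note that the division by max(w, h) in pixel_to_normalized corresponds to dividing by the third homogeneous coordinate after multiplying by the inverse matrix as written above.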

World Coordinates

The position of the reconstructed 3D points is stored in world coordinates. In general, this is an arbitrary Euclidean reference frame.

When GPS data is available, a topocentric reference frame is used for the world coordinates. This is a reference frame with the origin somewhere near the ground, the X axis pointing to the east, the Y axis pointing to the north and the Z axis pointing to the zenith. The latitude, longitude, and altitude of the origin are stored in the reference_lla.json file.

When GPS data is not available, the reconstruction process does its best to rotate the world reference frame so that the vertical direction is Z and the ground is near the z = 0 plane. It does so by assuming that the images are taken from similar altitudes and that the up vector of the images corresponds to the up vector of the world.

Camera Coordinates

The camera coordinate reference frame has its origin at the camera's optical center. The X axis points to the right of the camera, the Y axis points down and the Z axis points to the front. A point in front of the camera has a positive Z camera coordinate.

The pose of a camera is determined by the rotation and translation that converts world coordinates to camera coordinates.
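
To make the pose convention concrete, the sketch below assumes the rotation is given as a 3x3 matrix R (a list of rows) and the translation as a 3-vector t, so that a world point X maps to R X + t in camera coordinates. Under the same assumption, the optical center in world coordinates is -R^T t, since that is the point mapped to the camera origin. The function names are hypothetical.

```python
def world_to_camera(R, t, X):
    """Convert a world point X to camera coordinates: R X + t."""
    return [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]


def camera_center(R, t):
    """Return the optical center in world coordinates: -R^T t."""
    return [-sum(R[j][i] * t[j] for j in range(3)) for i in range(3)]


# Example pose: a 90 degree rotation about Z plus a translation.
R = [[0, -1, 0],
     [1, 0, 0],
     [0, 0, 1]]
t = [1, 2, 3]

center = camera_center(R, t)
print(center)                         # [-2, 1, -3]
print(world_to_camera(R, t, center))  # [0, 0, 0]
```

The second print confirms that the optical center lands on the origin of the camera frame.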

Camera Models

The camera models deal with the projection of 3D points expressed in camera coordinates x, y, z into points u, v in normalized image coordinates.

Perspective Camera

\[\begin{split}\begin{array}{l} x_n = \frac{x}{z} \\ y_n = \frac{y}{z} \\ r^2 = x_n^2 + y_n^2 \\ d = 1 + k_1 r^2 + k_2 r^4 \\ u = f\ d\ x_n \\ v = f\ d\ y_n \end{array}\end{split}\]
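
The perspective equations above translate almost line by line into code. The following is a minimal sketch, assuming the point is given in camera coordinates with z > 0; the function name project_perspective and its parameter names are chosen for illustration only.

```python
def project_perspective(point, focal, k1=0.0, k2=0.0):
    """Project a 3D point in camera coordinates to normalized image
    coordinates (u, v) using the perspective model with radial distortion.
    Assumes the point is in front of the camera (z > 0)."""
    x, y, z = point
    xn, yn = x / z, y / z                   # pinhole projection
    r2 = xn * xn + yn * yn                  # squared radius
    d = 1.0 + k1 * r2 + k2 * r2 * r2        # radial distortion factor
    return focal * d * xn, focal * d * yn


u, v = project_perspective((1.0, 2.0, 4.0), focal=0.8)
```

With both distortion coefficients at zero, d equals 1 and the model reduces to an ideal pinhole camera whose focal length is expressed in normalized image units.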

Fisheye Camera

\[\begin{split}\begin{array}{l} r^2 = x^2 + y^2 \\ \theta = \arctan(r / z) \\ d = 1 + k_1 \theta^2+ k_2 \theta^4 \\ u = f\ d\ \theta\ \frac{x}{r} \\ v = f\ d\ \theta\ \frac{y}{r} \end{array}\end{split}\]
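
A sketch of the fisheye equations above follows. Two details are assumptions of this example rather than part of the formulas: atan2(r, z) is used in place of arctan(r / z), which gives the same value for z > 0 and stays defined for z <= 0, and a point exactly on the optical axis (r = 0) is mapped to (0, 0) to avoid dividing by zero. The function name project_fisheye is hypothetical.

```python
import math


def project_fisheye(point, focal, k1=0.0, k2=0.0):
    """Project a 3D point in camera coordinates to normalized image
    coordinates (u, v) using the fisheye model."""
    x, y, z = point
    r = math.hypot(x, y)
    if r == 0.0:
        # Assumption: a point on the optical axis projects to the image center.
        return 0.0, 0.0
    theta = math.atan2(r, z)                       # angle from the optical axis
    d = 1.0 + k1 * theta ** 2 + k2 * theta ** 4    # distortion factor
    return focal * d * theta * x / r, focal * d * theta * y / r


# A point 45 degrees off-axis lands at distance focal * pi / 4 from the center.
u, v = project_fisheye((1.0, 0.0, 1.0), focal=1.0)
```

Unlike the perspective model, the image radius here grows with the angle theta rather than with its tangent, so points at 90 degrees from the optical axis still project to finite coordinates.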

Spherical Camera

\[\begin{split}\begin{array}{l} \mathrm{lon} = \arctan\left(\frac{x}{z}\right) \\ \mathrm{lat} = \arctan\left(\frac{-y}{\sqrt{x^2 + z^2}}\right) \\ u = \frac{\mathrm{lon}}{2 \pi} \\ v = -\frac{\mathrm{lat}}{2 \pi} \end{array}\end{split}\]
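
The spherical equations above can be sketched as follows. As an assumption of this example, atan2 replaces arctan so that the longitude covers the full range (-pi, pi] and points behind the camera (z < 0) are handled; for z > 0 the two agree. The function name project_spherical is hypothetical.

```python
import math


def project_spherical(point):
    """Project a 3D point in camera coordinates to normalized image
    coordinates (u, v) of an equirectangular (spherical) image."""
    x, y, z = point
    lon = math.atan2(x, z)                    # angle around the vertical axis
    lat = math.atan2(-y, math.hypot(x, z))    # elevation, positive upwards
    return lon / (2 * math.pi), -lat / (2 * math.pi)


# A point straight ahead maps to the image center; one to the right maps to u = 0.25.
print(project_spherical((0.0, 0.0, 1.0)))
print(project_spherical((1.0, 0.0, 0.0)))
```

With this convention u covers [-0.5, 0.5] and v covers [-0.25, 0.25], which matches a 2:1 panorama in normalized image coordinates.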