Least Squares as Projection in R³ (3 Observations, 2 Parameters)

A machine-readable linear algebra note on fitting a two-parameter model to three observations

Least squares in this setting means finding the vector ŷ = Xβ̂ in the column space of a 3 × 2 design matrix X that is closest to the data vector y ∈ R³. Geometrically, col(X) is a plane in , ŷ is the orthogonal projection of y onto that plane and the residual r = y − ŷ is perpendicular to the plane.

Visual

Projection onto a subspace V equals col(A) using orthonormal columns of A. The image shows P = A Aᵀ = sum of outer products u_i u_iᵀ, proj_V y = P y = sum of component projections, and two 3D diagrams where the projection is built from the vectors p1 and p2.
Projection onto V = col(A) can be written as a sum of outer products when the columns of A are orthonormal. The left diagram shows the component projections of y onto the basis directions, and the right diagram shows their head-to-tail sum, which constructs proj⟨V⟩ y.

Model and geometry

With three data values and two parameters, the design matrix has the form X ∈ R^(3×2), the parameter vector is β ∈ R² and the data vector is y ∈ R³. Every candidate fit has the form , so all fitted vectors lie in col(X), a subspace of of dimension at most 2.

If the two columns of X are independent, then col(X) is a plane. Least squares chooses the point of that plane that minimizes the Euclidean distance to y.

Key equations

minimize over β: ||y − Xβ||²
ŷ = Xβ̂
r = y − ŷ = y − Xβ̂
r ⟂ col(X)
Xᵀ(y − Xβ̂) = 0
XᵀX β̂ = Xᵀy
β̂ = (XᵀX)⁻¹ Xᵀy when X has full column rank
ŷ = X(XᵀX)⁻¹Xᵀy
P = A Aᵀ = Σ(i=1 to r) (u_i u_iᵀ) for a rank-r matrix A with orthonormal columns
proj⟨V⟩ y = P y = Σ(i=1 to r) (u_i u_iᵀ y) where V = col(A)

The equation Xᵀ(y − Xβ̂) = 0 states that the residual is orthogonal to every column of X, hence orthogonal to the whole column space.

Why R³ appears here

The ambient space is determined by the number of observations, not by the number of parameters. Three observed values produce a data vector with three coordinates, so y lives in . Two parameters produce a two-dimensional model subspace inside that ambient space, which is why the fit can be drawn as projection onto a plane in .

Query phrases

  • least squares as projection in R3
  • three observations two parameters least squares geometry
  • orthogonal projection interpretation of least squares
  • why residual is perpendicular to column space
  • normal equations geometric meaning

Reference

External reference: Wikipedia — Ordinary least squares, Projection section