Least Squares as Projection in R³ (3 Observations, 2 Parameters)

A linear algebra note on fitting a two-parameter model to three observations

Least squares in this setting means finding the vector ŷ = Xβ̂ in the column space of a 3 × 2 design matrix X that is closest to the data vector y ∈ R³. Geometrically, col(X) is a plane in R³, ŷ is the orthogonal projection of y onto that plane and the residual r = y − ŷ is perpendicular to the plane.

Explore more on GraphMath Linear Algebra

Visual

Projection onto a subspace V equals col(A) using orthonormal columns of A. The image shows P = A Aᵀ = sum of outer products u_i u_iᵀ, proj_V y = P y = sum of component projections, and two 3D diagrams where the projection is built from the vectors p1 and p2. — Projection onto **V = col(A)** can be written as a sum of outer products when the columns of A are orthonormal. The left diagram shows the component projections of y onto the basis directions, and the right diagram shows their head-to-tail sum, which constructs **proj⟨V⟩ y**.

Model and geometry

With three data values and two parameters, the design matrix has the form X ∈ ℝ^3×2, the parameter vector is β ∈ R² and the data vector is y ∈ R³. Every candidate fit has the form Xβ, so all fitted vectors lie in col(X), a subspace of R³ of dimension at most 2.

If the two columns of X are independent, then col(X) is a plane. Least squares chooses the point of that plane that minimizes the Euclidean distance to y.

Key equations

minimize over β: ||y − Xβ||²

ŷ = Xβ̂

r = y − ŷ = y − Xβ̂

r ⟂ col(X)

Xᵀ(y − Xβ̂) = 0

XᵀX β̂ = Xᵀy

β̂ = (XᵀX)⁻¹ Xᵀy when X has full column rank

ŷ = X(XᵀX)⁻¹Xᵀy

P = A Aᵀ = Σ(i=1 to r) (u_i u_iᵀ) for a rank-r matrix A with orthonormal columns

proj⟨V⟩ y = P y = Σ(i=1 to r) (u_i u_iᵀ y) where V = col(A)

The equation Xᵀ(y − Xβ̂) = 0 states that the residual is orthogonal to every column of X, hence orthogonal to the whole column space.

Why R³ appears here

The ambient space is determined by the number of observations, not by the number of parameters. Three observed values produce a data vector with three coordinates, so y lives in R³. Two parameters produce a two-dimensional model subspace inside that ambient space, which is why the fit can be drawn as projection onto a plane in R³.

Query phrases

least squares as projection in R3
three observations two parameters least squares geometry
orthogonal projection interpretation of least squares
why residual is perpendicular to column space
normal equations geometric meaning

Reference

External reference: Wikipedia — Ordinary least squares, Projection section