Least squares fits y by projecting it onto col(X). Adding parameters expands col(X), so the projection has more directions available and can only move closer to y. The residual y − ŷ is the orthogonal component, so its length can stay the same or decrease, but it cannot increase.
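This monotonicity is easy to check numerically. The sketch below (a minimal illustration, assuming NumPy; the data points are made up for the demo) fits the same data with a line model and then a parabola model whose column space contains the line's, and compares residual norms:

```python
import numpy as np

# Illustrative data: y does not lie exactly in either model subspace.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.7, 3.1, 6.0, 7.2])

# Line model: columns [1, x].
X1 = np.column_stack([np.ones_like(x), x])
# Parabola model: columns [1, x, x^2] -- a strictly larger column space.
X2 = np.column_stack([np.ones_like(x), x, x**2])

norms = []
for X in (X1, X2):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    norms.append(np.linalg.norm(y - X @ beta))
    print(X.shape[1], "columns, residual norm:", norms[-1])

# Expanding col(X) can only shrink the residual (or leave it unchanged).
print(norms[1] <= norms[0])
```

Because col(X1) ⊆ col(X2), the projection onto col(X2) is at least as close to y, so the second residual norm is never larger than the first.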
Least squares gives the best approximate solution of an inconsistent system Ax = b by projecting b onto col(A): the fitted vector is the component b∥ lying in col(A), and the residual is the orthogonal component b⊥.
The chapter builds these ideas through concrete examples of fitting a line, a parabola, a plane, and a paraboloid, then compares models geometrically by watching the model subspace change.
Least squares computes the orthogonal projection of y onto col(X). The fitted vector ŷ lies in col(X), and the residual y − ŷ is perpendicular to col(X). This makes ŷ the closest point in the model subspace to the data.
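The two defining properties, ŷ ∈ col(X) and (y − ŷ) ⊥ col(X), can both be verified directly. A small sketch (assuming NumPy; the random matrix and data vector are placeholders, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # model matrix with 3 parameters
y = rng.normal(size=8)        # data vector, generically not in col(X)

# Least squares: y_hat is the orthogonal projection of y onto col(X).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
resid = y - y_hat

# y_hat lies in col(X) by construction (it is X times a coefficient vector).
# Orthogonality: the residual is perpendicular to every column of X.
print(np.allclose(X.T @ resid, 0))
```

The check `X.T @ resid ≈ 0` is exactly the normal equations XᵀXβ = Xᵀy rearranged, which is why solving them yields the closest point in the model subspace.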
The PDF is a single document. The page links below are best-effort: most browsers support them, but some viewers may ignore the page hint.
A zero residual only means that y already lies in the chosen model subspace. If that subspace becomes too large, the model may start fitting accidental variation in the sample rather than the underlying pattern. In that case, an exact fit may reflect overfitting rather than a better model.
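The extreme case makes the point concrete: with as many parameters as data points, the model subspace is all of ℝⁿ, so the residual is zero no matter what y is. A small sketch (assuming NumPy; the noisy-line data is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 6)
y = x + rng.normal(scale=0.1, size=6)   # noisy samples of a line

# Degree-5 polynomial: 6 parameters for 6 points, so col(X) = R^6.
X = np.vander(x, 6)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_norm = np.linalg.norm(y - X @ coef)
print(resid_norm)  # essentially zero: the fit is exact

# Zero residual here only says y lies in the (full) model subspace;
# it does not say the degree-5 model beats the line at describing the pattern.
```

The exact fit interpolates the noise along with the signal, which is the overfitting described above.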