GraphMath

Least squares, best solution and projection

Best-fit models as projections onto model subspaces

Why does adding more parameters reduce the residual in least squares?

Least squares fits y by projecting it onto col(X). Adding parameters expands col(X), so the projection has more directions available and can only move closer to y. The residual y − ŷ is the orthogonal component, so its length can stay the same or decrease, but it cannot increase.
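As a concrete check (not taken from the chapter itself), here is a minimal NumPy sketch with made-up data: the same y is fit with a two-column design matrix (a line) and a three-column one (a parabola), and the residual norm of the larger model is never bigger.

    import numpy as np

    # Made-up data: a noisy line sampled at 8 points.
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 8)
    y = 2.0 + 3.0 * x + rng.normal(scale=0.3, size=x.size)

    # Design matrices: col(X_line) is a 2-dimensional subspace of col(X_parab).
    X_line = np.column_stack([np.ones_like(x), x])
    X_parab = np.column_stack([np.ones_like(x), x, x**2])

    for name, X in [("line", X_line), ("parabola", X_parab)]:
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        print(name, "residual norm:", np.linalg.norm(resid))
    # The parabola's residual norm is <= the line's, because expanding
    # col(X) can only move the projection closer to y.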

Key ideas

Least squares gives the best solution of an inconsistent system by projecting b onto col(A). The fitted vector is b∥ and the residual is b⊥.

  • A model is the choice of a subspace, and regression is projection onto that subspace
  • Increasing model complexity expands col(A), so the residual can only stay the same or shrink
  • The residual is the component of b orthogonal to col(A) (made concrete in the sketch after this list)
  • When col(A) = ℝᵐ, the projection becomes the identity and the residual is zero
  • An exact fit is not always a better model because it may reflect overfitting
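The numeric sketch below uses arbitrary values rather than an example from the chapter: it builds the projection matrix onto col(A) for a small inconsistent system and checks that the residual b⊥ is orthogonal to every column of A (this assumes A has full column rank).

    import numpy as np

    # A small inconsistent system: 4 equations, 2 unknowns (values are arbitrary).
    A = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0]])
    b = np.array([1.0, 2.0, 2.0, 5.0])

    # Projection matrix onto col(A); valid because A has full column rank.
    P = A @ np.linalg.inv(A.T @ A) @ A.T
    b_par = P @ b           # fitted vector b∥, the projection of b onto col(A)
    b_perp = b - b_par      # residual b⊥, the orthogonal component

    print("A.T @ b_perp (should be ~0):", A.T @ b_perp)
    print("||b_perp||:", np.linalg.norm(b_perp))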

The chapter builds these ideas through concrete examples of fitting a line, a parabola, a plane and a paraboloid, then compares models geometrically by watching the model subspace change.

What does least squares compute geometrically?

Least squares computes the orthogonal projection of y onto col(X). The fitted vector ŷ lies in col(X), and the residual y − ŷ is perpendicular to col(X). This makes ŷ the closest point in the model subspace to the data.
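In symbols, that perpendicularity is exactly the normal equations. The short derivation below uses standard notation and is not quoted from the chapter; it assumes X has full column rank, so that XᵀX is invertible.

    % Residual orthogonal to every column of X  =>  normal equations.
    % Assumes X has full column rank, so X^T X is invertible.
    \[
      X^{\top}(y - X\hat{\beta}) = 0
      \;\Longrightarrow\;
      X^{\top}X\,\hat{\beta} = X^{\top}y,
      \qquad
      \hat{y} = X\hat{\beta} = X\,(X^{\top}X)^{-1}X^{\top}\,y .
    \]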

Chapter contents

The PDF is a single document. The page links below are best-effort: most browsers support them, but some viewers may ignore the page hint.

Topic (pages)
OLS example 1: fitting a line (1–4)
OLS example 2: fitting a parabola with 2 parameters (5–9)
Two models for same dataset (9–12)
Higher-dimensional set (12–14)
Summary: models with one input variable (14–16)
Model with 2 input variables (17–18)
Non-linear model with 2 input variables (18–21)
Summary: models with > 1 input variable (22–24)
Two ways to change a regression model (25–32)
Overfitted model (33–34)

Why can a model fit the data exactly and still be a bad model?

A zero residual only means that y already lies in the chosen model subspace. If that subspace becomes too large, the model may start fitting accidental variation in the sample rather than the underlying pattern. In that case, an exact fit signals overfitting rather than a better model.
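A minimal NumPy sketch with made-up data: six noisy samples of a line are fit with a degree-5 polynomial, so col(X) is all of ℝ⁶ and the residual is exactly zero, yet the fitted curve can wander away from the underlying line between the samples.

    import numpy as np

    # Made-up data: six noisy samples of the line y = 1 + 2x.
    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 1.0, 6)
    y = 1.0 + 2.0 * x + rng.normal(scale=0.2, size=x.size)

    # Degree-5 polynomial: 6 columns, so col(X) is all of R^6 here,
    # the projection is the identity, and the residual is (numerically) zero.
    X = np.vander(x, 6)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print("residual norm:", np.linalg.norm(y - X @ beta))   # essentially zero

    # But between the sample points the fitted curve chases the noise:
    x_new = np.linspace(0.0, 1.0, 50)
    pred = np.vander(x_new, 6) @ beta
    print("max gap from the true line:", np.max(np.abs(pred - (1.0 + 2.0 * x_new))))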
