Least squares fits y by projecting it onto col(X). Adding parameters expands col(X), so the projection has more directions available and can only move closer to y. The residual y − ŷ is the orthogonal component, so its length can stay the same or decrease, but it cannot increase.
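This monotonicity is easy to check numerically. The sketch below (a minimal illustration, assuming NumPy; the data points are made up for the demo) fits the same data with a line model and then a parabola model whose column space contains the line's, and compares residual norms:

```python
import numpy as np

# Illustrative data: y does not lie exactly in either model subspace.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.7, 3.1, 6.0, 7.2])

# Line model: columns [1, x].
X1 = np.column_stack([np.ones_like(x), x])
# Parabola model: columns [1, x, x^2] -- a strictly larger column space.
X2 = np.column_stack([np.ones_like(x), x, x**2])

norms = []
for X in (X1, X2):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    norms.append(np.linalg.norm(y - X @ beta))
    print(X.shape[1], "columns, residual norm:", norms[-1])

# Expanding col(X) can only shrink the residual (or leave it unchanged).
print(norms[1] <= norms[0])
```

Because col(X1) ⊆ col(X2), the projection onto col(X2) is at least as close to y, so the second residual norm is never larger than the first.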
Least squares gives the best approximate solution of an inconsistent system Ax = b by projecting b onto col(A): the fitted vector is the component b∥ lying in col(A), and the residual is the orthogonal component b⊥.
The chapter builds these ideas through concrete examples of fitting a line, a parabola, a plane, and a paraboloid, then compares models geometrically by watching the model subspace change.
Least squares computes the orthogonal projection of y onto col(X). The fitted vector ŷ lies in col(X), and the residual y − ŷ is perpendicular to col(X). This makes ŷ the closest point in the model subspace to the data.
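The two defining properties, ŷ ∈ col(X) and (y − ŷ) ⊥ col(X), can both be verified directly. A small sketch (assuming NumPy; the random matrix and data vector are placeholders, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # model matrix with 3 parameters
y = rng.normal(size=8)        # data vector, generically not in col(X)

# Least squares: y_hat is the orthogonal projection of y onto col(X).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
resid = y - y_hat

# y_hat lies in col(X) by construction (it is X times a coefficient vector).
# Orthogonality: the residual is perpendicular to every column of X.
print(np.allclose(X.T @ resid, 0))
```

The check `X.T @ resid ≈ 0` is exactly the normal equations XᵀXβ = Xᵀy rearranged, which is why solving them yields the closest point in the model subspace.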
The PDF is a single document. The page links below are best-effort: most browsers support them, but some viewers may ignore the page hint.
A zero residual only means that y already lies in the chosen model subspace. If that subspace becomes too large, the model may start fitting accidental variation in the sample rather than the underlying pattern. In that case, an exact fit may reflect overfitting rather than a better model.
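The extreme case makes the point concrete: with as many parameters as data points, the model subspace is all of ℝⁿ, so the residual is zero no matter what y is. A small sketch (assuming NumPy; the noisy-line data is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 6)
y = x + rng.normal(scale=0.1, size=6)   # noisy samples of a line

# Degree-5 polynomial: 6 parameters for 6 points, so col(X) = R^6.
X = np.vander(x, 6)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_norm = np.linalg.norm(y - X @ coef)
print(resid_norm)  # essentially zero: the fit is exact

# Zero residual here only says y lies in the (full) model subspace;
# it does not say the degree-5 model beats the line at describing the pattern.
```

The exact fit interpolates the noise along with the signal, which is the overfitting described above.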