Cost Function and Equation Derivation
Since we already have the linear regression equation

$$\hat{y} = w^T x$$

and a corresponding cost function

$$J(w) = \sum_{i=1}^{n} (\hat{y}_i - y_i)^2,$$

and all of the internal terms $x_i$ and $y_i$ are just elements of the matrix $X$ (whose rows are the input vectors $x_i^T$) and the vector $y$, we can solve for the weights $w$ that minimize $J(w)$. For any dataset, a closed-form solution for the ideal weights can be obtained via direct estimation. For labels $y_1, \dots, y_n$ and constraints (input vectors) $x_1, \dots, x_n$, our model is a set of $n$ equations

$$\hat{y}_i = w^T x_i, \qquad i = 1, \dots, n.$$
All of these equations can be rewritten as a single vector equation

$$\hat{y} = Xw.$$
Therefore, we can compute the sum-squared error directly, by taking the squared distance between the vector of predictions and the vector of true labels:

$$J(w) = \|\hat{y} - y\|^2.$$
The equation can be rewritten as

$$J(w) = (Xw - y)^T (Xw - y).$$
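As a quick sanity check of this vector form, the sketch below (a toy NumPy example with randomly generated $X$, $w$, and $y$ standing in for real data) confirms that the per-example sum of squared errors matches $(Xw - y)^T (Xw - y)$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # 5 examples, 3 features (toy stand-in data)
w = rng.normal(size=3)        # arbitrary weight vector
y = rng.normal(size=5)        # arbitrary labels

y_hat = X @ w                                   # vector of predictions, y_hat = Xw

sse_elementwise = np.sum((y_hat - y) ** 2)      # sum_i (y_hat_i - y_i)^2
sse_vector_form = (X @ w - y).T @ (X @ w - y)   # (Xw - y)^T (Xw - y)

assert np.isclose(sse_elementwise, sse_vector_form)
```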
In this form, the function can be easily differentiated with respect to $w$ and solved in closed form as follows. First, we expand and simplify the expression (using the fact that $y^T X w = w^T X^T y$, since both are scalars):

$$J(w) = w^T X^T X w - 2 w^T X^T y + y^T y.$$
When we take the derivative with respect to $w$, we get

$$\frac{\partial J}{\partial w} = 2 X^T X w - 2 X^T y.$$
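To see that this gradient formula is consistent with the expanded cost, here is a small finite-difference check, a sketch using the same kind of toy NumPy data as above rather than anything from a real dataset:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))
y = rng.normal(size=6)
w = rng.normal(size=3)

def J(w):
    """Sum-squared error cost (Xw - y)^T (Xw - y)."""
    r = X @ w - y
    return r @ r

analytic_grad = 2 * X.T @ X @ w - 2 * X.T @ y   # 2 X^T X w - 2 X^T y

# Central finite differences, one coordinate at a time.
eps = 1e-6
numeric_grad = np.zeros_like(w)
for i in range(len(w)):
    e = np.zeros_like(w)
    e[i] = eps
    numeric_grad[i] = (J(w + e) - J(w - e)) / (2 * eps)

assert np.allclose(analytic_grad, numeric_grad, atol=1e-4)
```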
We set the derivative to zero to find the set of optimal weights:

$$2 X^T X w^* - 2 X^T y = 0 \quad\Longrightarrow\quad w^* = (X^T X)^{-1} X^T y,$$
where $w^*$ is the vector of optimal regression coefficients. Notice that this method is an application of linear least squares optimization. Recall that the pseudoinverse of the matrix $X$ (assuming $X$ has full column rank, so that $X^T X$ is invertible) is

$$X^+ = (X^T X)^{-1} X^T.$$
Therefore, this form can also be expressed as the pseudoinverse of $X$ multiplied by $y$:

$$w^* = X^+ y.$$
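As a quick numerical check of this identity, a sketch assuming a full-column-rank $X$ shows that NumPy's `np.linalg.pinv` agrees with the explicit formula $(X^T X)^{-1} X^T$:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))   # tall random matrix; full column rank with probability 1

pinv_builtin = np.linalg.pinv(X)             # Moore-Penrose pseudoinverse (via SVD)
pinv_formula = np.linalg.inv(X.T @ X) @ X.T  # (X^T X)^{-1} X^T

assert np.allclose(pinv_builtin, pinv_formula)
```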
This means that for any dataset, we simply take the pseudoinverse of the features (concatenated with a vector of ones, to account for the intercept!) and multiply it by the labels, and this gives us the set of ideal fitting parameters for our training data.
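Putting it all together, here is a minimal sketch of this recipe (using NumPy and synthetic data generated from a known line, so all names and values here are purely illustrative), confirming that the pseudoinverse fit matches the normal-equations solution:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic 1-D data from a known line y = 2x + 1, plus noise.
x = rng.uniform(-1, 1, size=50)
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=50)

# Design matrix: features concatenated with a column of ones for the intercept.
X = np.column_stack([x, np.ones_like(x)])

# w* = X^+ y  (pseudoinverse of the design matrix times the labels)
w_pinv = np.linalg.pinv(X) @ y

# Equivalent: solve the normal equations X^T X w = X^T y directly.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

assert np.allclose(w_pinv, w_normal)
print(w_pinv)   # approximately [2.0, 1.0]: recovered slope and intercept
```

In practice, `np.linalg.lstsq(X, y, rcond=None)` computes the same least-squares solution without explicitly forming $X^T X$, which is better behaved numerically when the features are nearly collinear.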