Polynomial Regression
We can generalize the ideas developed previously to create a model that maps the features onto non-linear surfaces. For that purpose, we add additional terms to our model equation. Let's assume that the number of features is still one; however, we can also add transformed versions of $x$ into the model, where each transformed version is $x$ raised to a different integer power. Our prediction for the $i$-th observation, constructed using a degree-$d$ polynomial, is

$$\hat{y}_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \dots + \beta_d x_i^d.$$

The whole system of $n$ equations can be rewritten in matrix notation as

$$\hat{\mathbf{y}} = X \boldsymbol{\beta}, \qquad
X = \begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^d \\
1 & x_2 & x_2^2 & \cdots & x_2^d \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^d
\end{pmatrix}.$$
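As an illustration of this construction, here is a minimal NumPy sketch that builds the polynomial design matrix; the feature values and the degree are hypothetical, chosen only to show the shape of $X$:

```python
import numpy as np

# Hypothetical single-feature data and degree, used only for illustration.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
d = 3

# Design matrix with columns 1, x, x^2, ..., x^d (a Vandermonde matrix).
X = np.vander(x, N=d + 1, increasing=True)

print(X.shape)  # (5, 4): n rows, d + 1 columns
```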
Notice that the problem is essentially unchanged from the linear case, and the expression for the predicted values is still

$$\hat{\mathbf{y}} = X \boldsymbol{\beta}.$$

Therefore, similarly to before, we define a loss function

$$L(\boldsymbol{\beta}) = \lVert \mathbf{y} - X \boldsymbol{\beta} \rVert^2,$$

and through linear least squares optimization we arrive at the vector of optimal regression coefficients

$$\hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top \mathbf{y},$$

or, equivalently, the solution of the normal equations

$$X^\top X \, \boldsymbol{\beta} = X^\top \mathbf{y}.$$

In this way, polynomial regression is algorithmically extremely similar to linear regression. The key difference is not in the solution but in the formulation of the problem: if we define our $X$ matrix to include higher-order powers of some features, then we can treat the problem as a linear regression over variables that encode those higher-order features. Beyond this simple extension, the solution and the methods of approximation are completely identical.
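To make this equivalence concrete, here is a minimal sketch, assuming NumPy and hypothetical noisy cubic data (not from the text). It computes the closed-form solution above and compares it against `np.linalg.lstsq`, which solves the same least-squares problem by a numerically safer route than forming $(X^\top X)^{-1}$ explicitly:

```python
import numpy as np

# Hypothetical data: a noisy cubic, used only to exercise the formulas above.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(scale=0.05, size=x.size)

d = 3
X = np.vander(x, N=d + 1, increasing=True)

# Closed-form solution via the normal equations: (X^T X) beta = X^T y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Same least-squares problem solved by np.linalg.lstsq.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal)  # coefficients beta_0 ... beta_d
print(beta_lstsq)   # agrees with the closed-form result
```

Solving the normal equations directly mirrors the closed-form expression, but it squares the condition number of $X$, which is why factorization-based routines like `lstsq` are generally preferred for higher polynomial degrees.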