In the series “The Mathematics Behind” I will explain mathematical concepts behind commonly used technologies. In this post, I will explain the mathematics behind polynomial curve fitting.
First of all, what is polynomial curve fitting and where is it used for? Suppose we are trading on a stock market. The stock price is going up and down (see the figure) and we want to discover patterns in the price chances if any exists. Polynomial curve fitting tries to fit a model (here: a polynomial) on the given datapoints as good as possible.
I will tell you a little secret. The “hidden” curve was generated by using the following function: and a little noise was added, but our algorithm will not know this secret. The goal of the algorithm is to find this hidden structure. And that is what curve fitting is all about.
The model we will use to approximate this hidden function is a polynomial of degree 3. That means that the polynomial is of the form . Notice that we have 4 degrees of freedom here since we can change the 4 weights in the model .
Notice that the polynomial can be expressed compactly: . Our goal is to make this function as close as possible to the true datapoints. Suppose we have datapoints where the datapoint is a pair . There are many ways to define an error metric. One possibility is the Mean Squared Error (MSE) which is defined as follows: .
In order to make the distance (the MSE) between the true datapoints and the polynomial as close as possible, we have to differentiate . Note that . Now we can define some quantities, and we will do. These quantities seems weird at the first sight, but they will eventually solve the problem. Define: and .
Now we have all the mathematical tools, we can solve this problem! First note that in order to minimize the MSE, we must have that all partial derivatives are zero: . From this, we derive: and this is the same as and here is where our weird quantities are needed, since this is equivalent to and this is equivalent to . In matrix form, this is equivalent to and this can be solved easily by left-muliplying by : .
By implementing these quantities in MatLab, the following weights can be found: . In other words, the “hidden” structure is: , which is shown in the following image (and the result is surprisingly close to the function which we want to approximate!).
How can we bring the theory into practice? First, we need to calculate the weights, and this can be done by using the following piece of MatLab code.
function w = calculate_weights(M, x, t) % Calculate the weights for a polynomial of degree M, given % x: a vector containing all the x values and t: a vector % containing all target values such that for all n, 1 <= n <= N % it holds that (x(n), t(n)) is a datapoint. N = size(x, 2); % Calculate T T = zeros(M, 1); for m = 1:M T(m) = x.^(m-1)*t'; end % Calculate A A = zeros(M, M); for i = 1:M for j = 1:M for n = 1:N A(i, j) = A(i, j) + x(n).^(i + j - 2); end end end % Calculate weights w = A\T; end
With the found weights, the complete found structure is encoded.
Are there disadvantages to the previously discussed method? Yes – The weights grow hard as the dimensionality (here: the degree of the polynomial) grows. This is solved by adding the size of the weight vector as additional error term and this technique is called shrinkage. Another problem is that the dimensionality is specified explicitely. There are other methods which do not have the dimensionality as parameter.
What are the applications now we have found a fitting curve? Now we can for example calculate the expected minimum and maximum of the dataset. We can also find out where the function is steep (and where not) and many more questions can be answered. Although some mathematics were required to find the answer, the answer is generalizable and works for specific datasets. Good luck with your curve fitting projects! And if you have any questions, feel free to contact me.
As an exercise for you: suppose we have two datapoints: and . Find a polynomial of degree 1 () which fits the data as good as possible. Implement it in your favorite programming language. (Hint: the answer is ). If this was too easy for you: try to generate datapoints by a function you know and add some noise to it. Find a polynomial of degree that fits your data.