Piecewise Linear Regression
Real-world data is not always linear. In many cases it is difficult to fit a single line and get a good model on non-linear or non-monotonic datasets. While one can resort to more complex models like SVMs, trees, or even neural networks, that comes at the cost of interpretability and explainability.
Is there a middle ground for cases where the decision boundary is not very complex? The answer is in the title.
Piecewise regression breaks the data into individual segments and fits a linear regression within each segment. The locations where one segment ends and another begins are called breakpoints.
Let’s take a very simple dataset for illustration and visualize the output of linear and piecewise linear regression.
Refer to my repo for the code behind the piecewise regression and the plots above – https://github.com/srivatsan88/piecewise-regression/blob/master/piecewise_linear_regression.ipynb
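To make the idea concrete without the notebook, here is a minimal sketch (not the repo's code) that generates a hypothetical non-monotonic dataset with two known breakpoints, then compares the sum of squared errors of a single global line against three per-segment line fits:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic illustrative data: three linear pieces with breakpoints at x=10 and x=20
x = np.linspace(0, 30, 300)
y = np.piecewise(
    x,
    [x < 10, (x >= 10) & (x < 20), x >= 20],
    [lambda t: 2 * t, lambda t: 20 - 1.5 * (t - 10), lambda t: 5 + 3 * (t - 20)],
) + rng.normal(0, 1, x.size)

def sse(xs, ys):
    """Sum of squared errors of a least-squares line fit on (xs, ys)."""
    slope, intercept = np.polyfit(xs, ys, 1)
    return float(np.sum((ys - (slope * xs + intercept)) ** 2))

# One global line vs. three per-segment lines at the known breakpoints
global_sse = sse(x, y)
edges = [x.min(), 10, 20, x.max() + 1]
piecewise_sse = sum(
    sse(x[(x >= lo) & (x < hi)], y[(x >= lo) & (x < hi)])
    for lo, hi in zip(edges[:-1], edges[1:])
)

print(f"global linear SSE:  {global_sse:.1f}")
print(f"piecewise SSE:      {piecewise_sse:.1f}")
```

On data like this the piecewise error collapses to roughly the noise level, while the single line is forced to average over segments with very different slopes.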
If you check the plot above, the linear fit results in a larger standard error than the piecewise fit. The piecewise fit might look like it is overfitting, but it is not; this technique generalizes well to new data points. In this case we segment the data into 3 buckets and fit a regression line within each segment.
Piecewise regression works by finding the optimal set of breakpoints, i.e. the set that minimizes the sum of squared errors across segments. Within each segment, an ordinary least-squares fit is used. For problems with a large number of segments, a multi-start gradient-based search can be used to speed up the detection of optimal breakpoints.
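The breakpoint search above can be sketched with a brute-force version: try candidate breakpoint combinations, fit least squares within each resulting segment, and keep the combination with the lowest total SSE. This is only an illustration of the objective; real implementations (and large segment counts) use smarter optimizers, as noted above. All names and the synthetic data here are assumptions for the sketch:

```python
import itertools
import numpy as np

def segment_sse(xs, ys):
    # SSE of the least-squares line on one segment; needs at least 2 points
    if xs.size < 2:
        return np.inf
    m, b = np.polyfit(xs, ys, 1)
    return np.sum((ys - (m * xs + b)) ** 2)

def best_breakpoints(x, y, n_segments, candidates):
    """Exhaustive search over candidate breakpoints minimizing total SSE."""
    best_total, best_bp = np.inf, None
    for bp in itertools.combinations(candidates, n_segments - 1):
        edges = (x.min(),) + bp + (x.max() + 1,)
        total = sum(
            segment_sse(x[(x >= lo) & (x < hi)], y[(x >= lo) & (x < hi)])
            for lo, hi in zip(edges[:-1], edges[1:])
        )
        if total < best_total:
            best_total, best_bp = total, bp
    return best_bp, best_total

# Synthetic check: true breakpoints at x=10 and x=20
rng = np.random.default_rng(1)
x = np.linspace(0, 30, 300)
y = np.piecewise(
    x,
    [x < 10, (x >= 10) & (x < 20), x >= 20],
    [lambda t: 2 * t, lambda t: 20 - 1.5 * (t - 10), lambda t: 5 + 3 * (t - 20)],
) + rng.normal(0, 1, x.size)

bp, total_sse = best_breakpoints(x, y, n_segments=3, candidates=range(2, 29))
print("recovered breakpoints:", bp)
```

With a strong slope change at each true breakpoint, the search recovers locations very close to 10 and 20 even in the presence of noise.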
Piecewise linear functions can reduce model bias by segmenting on key decision variables, and they are used in highly regulated business cases like credit decisions and risk-based simulation, where model explainability is mandatory.
By: Srivatsan Srinivasan