Understanding the Bias-Variance Trade-off
George Box once said, “All models are wrong, but some are useful.” From a supervised machine learning perspective, all models have errors, and to make our models useful, we have to minimize such errors. More specifically, we have to minimize two major sources of error: bias and variance.
Prior to applying a machine learning algorithm, we have to make certain assumptions about our model; for instance, linear regression assumes a linear relationship between the features and the target. In most cases, the actual relationship is much more complicated, so this simplifying assumption leads to the first source of error, known as "bias." To remember the concept, think of how bias in real life means prejudice in favor of a preconceived idea; similarly, bias in machine learning is the error introduced by committing to a prior assumption (linearity, for example) that rules out other possibilities (such as non-linearity).
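A minimal sketch of this idea, using synthetic data (the function and noise level are illustrative assumptions, not from the article): the true relationship below is quadratic, so a model built on the linearity assumption carries an error it can never train away, while a model whose form matches the truth does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# True relationship is quadratic, but the linear model assumes linearity --
# that simplifying assumption is the source of bias.
x = np.linspace(-3, 3, 100)
y = x**2 + rng.normal(scale=0.5, size=x.size)

# Least-squares straight line: y ~ a*x + b.
a, b = np.polyfit(x, y, deg=1)
mse_linear = np.mean((y - (a * x + b)) ** 2)

# A quadratic fit matches the true functional form, so its error is
# dominated by the noise alone.
c2, c1, c0 = np.polyfit(x, y, deg=2)
mse_quad = np.mean((y - (c2 * x**2 + c1 * x + c0)) ** 2)

print(mse_linear, mse_quad)  # the large gap between the two is the bias
```

No matter how much data we collect, the linear model's error stays high here, because the error comes from the assumption itself, not from the sample.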
While bias stems from our assumptions, variance concerns what would happen if we used a different training set. Variance is, therefore, defined as the amount by which our model's predictions would change if the model were trained on a different training set.
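This definition can be simulated directly. In the sketch below (the data-generating function, noise level, and polynomial degrees are illustrative choices), we repeatedly draw fresh training sets from the same process, refit two models each time, and measure how much each model's prediction at a fixed point spreads across training sets; that spread is the variance.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_training_set(n=20):
    """Draw a fresh training set from the same underlying process."""
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(scale=0.3, size=n)
    return x, y

X_QUERY = 0.5  # fixed point where we compare predictions

def prediction_at_query(deg):
    """Fit a degree-`deg` polynomial to a fresh training set, predict at X_QUERY."""
    x, y = sample_training_set()
    coeffs = np.polyfit(x, y, deg)
    return np.polyval(coeffs, X_QUERY)

# Refit each model on 200 independent training sets.
preds_simple = [prediction_at_query(deg=1) for _ in range(200)]
preds_complex = [prediction_at_query(deg=12) for _ in range(200)]

# The spread of predictions across training sets is the model's variance:
# the flexible degree-12 model swings far more from sample to sample.
print(np.std(preds_simple), np.std(preds_complex))
```

The simple model barely notices which training set it saw, while the flexible model chases the noise in each particular sample, so its prediction changes substantially from one training set to the next.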
In the end, our job is to build a robust model; therefore, we want a model with the lowest possible bias and variance. This, however, is not an easy task, because there is a trade-off between the two: a model with low bias tends to be complex, while a model with low variance tends to be simple. Since minimizing both errors simultaneously is a conflict, we have to find the point where the bias and variance curves meet in the middle, where both errors are correspondingly reduced, giving the lowest possible error on the test set, which is what we desire.
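One common way to see the trade-off empirically (a sketch under the same illustrative setup as above, with polynomial degree standing in for model complexity) is to sweep complexity and watch the held-out error fall and then rise again:

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    """Draw n samples from a noisy non-linear process."""
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)  # held-out set approximates the test error

# Sweep model complexity: low degrees underfit (high bias),
# high degrees overfit (high variance).
test_mse = {}
for deg in range(1, 13):
    coeffs = np.polyfit(x_train, y_train, deg)
    pred = np.polyval(coeffs, x_test)
    test_mse[deg] = np.mean((y_test - pred) ** 2)

best_deg = min(test_mse, key=test_mse.get)
print(best_deg)  # the lowest test error lands at an intermediate complexity
```

The degree that minimizes the test error sits between the overly simple and overly flexible extremes, which is exactly the middle point the trade-off asks us to find.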
By: Nguyen Cao