Bias-Variance Tradeoff
What do Bias and Variance mean?
Bias: how strongly a model's built-in assumptions pull its predictions away from the truth, regardless of the data.
Variance: how much a model's predictions change with small changes in the data it sees.
Things to remember:
High Bias: underfitting
High Variance: overfitting
What do High Bias and High Variance look like?
Assume there are two examiners trained to forecast rain, in a world where it rains only when it is humid, and never when it is hot, windy, or freezing.
You ask Examiner A (despite his training, he is heavily biased towards rain because he loves rain) and Examiner B (a bookworm who remembers his training material perfectly):
|Question (condition)|Examiner A|Examiner B|
|---|---|---|
|Sir, it’s extremely hot today. Will it rain?|Yes|No|
|Sir, it’s a little windy out here. Will it rain?|No|No|
|Sir, it’s freezing. Will it rain?|Yes|No|
|Sir, it’s humid out here. Will it rain?|No|Yes|
During the test, Examiner A gets most predictions wrong because he is biased towards rainfall. This condition is called underfitting.
Examiner B successfully predicts whether it will rain or not. But, being a bookworm, he knows nothing about conditions that were not described in the book during training.
Now, we ask Examiner B:
Me: Sir, there is a giant sitting on the cloud who lost his candy. Will it rain?
Examiner B: Not sure. Since the answer is “No” for most of the conditions, there is a high chance that it will not rain.
Although Examiner B’s decisions track the training conditions perfectly, he cannot predict for new, unseen conditions (general conditions beyond the specific ones given during training). This condition is called overfitting.
What do we need?
We want neither high bias nor high variance. We want our algorithm to perform well on the training set and also give the best possible results on unseen data. In general, high bias hurts performance on the training set, while high variance hurts performance on unseen data. Balancing the two is known as the Bias-Variance Tradeoff.
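The same story can be sketched in code. Below is a minimal illustration (not from the original article) using NumPy polynomial fits on made-up noisy data: a degree-1 polynomial plays Examiner A (high bias, poor even on training data), a degree-14 polynomial plays Examiner B (high variance, near-perfect on training data but worse on unseen points), and a moderate degree sits in between. The data, degrees, and noise level are all arbitrary choices for the sketch.

```python
import numpy as np

# Toy data: the true signal is sin(2x); training labels carry a little noise.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 3.0, 15)
y_train = np.sin(2 * x_train) + rng.normal(0.0, 0.2, x_train.size)
x_test = np.linspace(0.0, 3.0, 200)
y_test = np.sin(2 * x_test)  # noise-free targets for evaluating on "unseen" inputs

errors = {}
for degree in (1, 4, 14):
    # np.polyfit does a least-squares polynomial fit of the given degree.
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    errors[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

Running this, the degree-1 model has a large training error (underfitting, like Examiner A), while the degree-14 model drives its training error near zero yet does worse than its own training score on the test points (overfitting, like Examiner B). The middle degree is the tradeoff we actually want.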
By: Aditya Pal