I feel bias and variance are some of the trickiest concepts to get a solid understanding of. It was explained to me a bunch of times and every single time after a couple of weeks I found myself thinking “Which one was which again?”
The reason it took me a while to grasp them correctly was that I was trying to learn them by heart. I didn’t really understand the logic of it.
So today, I’m on a mission to explain it to you in a practical way, in the hope that this is the last time you’ll need someone to explain it to you!
Bias is the amount of built-in assumptions, or prejudice, your model brings to the problem you’re trying to solve. The more assumptions your model makes while trying to fit your data, the higher the bias.
This can be caused by the type of model you’re using or some of the hyperparameters you might have set. For example, linear modelling techniques like linear regression have high bias because they assume the relationship between the inputs and the target is linear.
Because of these assumptions, your model might entirely miss the pattern of your data/problem. That’s when underfitting happens.
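To make this concrete, here is a minimal sketch of a high-bias model on made-up synthetic data: a plain linear regression fit to points that follow a sine curve. The straight line can’t follow the pattern, so the error stays high even on the training data itself.

```python
# High bias / underfitting sketch: a linear model fit to data generated
# from a nonlinear (sine) relationship. Synthetic data, illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# A straight line can't capture the sine shape, so even the training
# error stays high -- the model has missed the pattern (underfitting).
model = LinearRegression().fit(X, y)
print("train MSE:", mean_squared_error(y, model.predict(X)))
```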
Variance, on the other hand, is the sensitivity of the model to the data itself. It reflects how much the model would change if it were trained on a different dataset drawn from the same problem.
Highly flexible models, such as decision trees, tend to have high variance. When left to fit the data as much as they can without any control mechanism, they fit every data point, including the noise. You can think of these algorithms as overestimating how well the data at hand represents the real world, which leaves them unable to generalize to data they haven’t seen before.
As you might figure out from this definition, when you have high variance, it means your model is overfitting the data.
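Here is the mirror image of the previous sketch, again on made-up synthetic data: an unconstrained decision tree memorizes the noisy training points and then does much worse on data it hasn’t seen.

```python
# High variance / overfitting sketch: an unconstrained decision tree
# fits every training point, noise included. Synthetic data, illustration only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# No depth limit, no minimum leaf size: the tree memorizes the training set...
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
print("train MSE:", mean_squared_error(y_train, tree.predict(X_train)))  # close to 0
# ...but fails to generalize to points it hasn't seen before.
print("test MSE:", mean_squared_error(y_test, tree.predict(X_test)))
```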
Let’s say the correct predictions we want to make land inside the innermost of these three circles. If we have high bias and low variance, we get the predictions marked with the orange pluses on the left: tightly grouped (little scatter), but not centered on the right spot.
Whereas in the high variance and low bias case, the predictions are centered roughly where they’re supposed to be, but scattered widely around it, as in the diagram on the right.
With traditional machine learning algorithms, there has always been a tug-of-war between lowering the bias and lowering the variance: you want to fit the data well enough, but not so well that you end up overfitting it. This is called the bias-variance trade-off, because lowering bias tends to push variance up and vice versa.
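One way to see the trade-off for yourself is to turn a single complexity knob and watch the training and validation errors move in opposite directions. The sketch below uses the polynomial degree of a regression model as that knob; the data and all the numbers are made up purely for illustration.

```python
# Bias-variance trade-off sketch: sweep the polynomial degree and compare
# training vs. validation error. Low degree = high bias (both errors high);
# very high degree = high variance (training error low, validation error rises).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(40, 1))                       # deliberately small dataset
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.3, size=40)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for degree in (1, 3, 9, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  val MSE={val_mse:.3f}")
```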
This does not have to be the case anymore though, especially with deep learning algorithms.
Here are some solutions to high variance or high bias problems.
When working with deep learning models, if you have a model that is underfitting, you can increase the model complexity (add hidden layers/neurons) and make sure to use regularization. That way you would lower the bias without causing the variance to shoot up.
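As a rough sketch of what that looks like in code, here is a hypothetical Keras model; the input shape of 20 features, the layer sizes, and the regularization strengths are all made-up numbers, not a recipe.

```python
# Sketch: lower bias by adding capacity (more/larger hidden layers) while
# using regularization (L2 penalties + dropout) to keep variance from shooting up.
# The input shape and every hyperparameter here are illustrative guesses.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(128, activation="relu", kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.3),
    layers.Dense(128, activation="relu", kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.3),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```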
You can also introduce more data to your network to tackle overfitting without causing your model to have high bias.
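When collecting more real data isn’t an option, data augmentation is a common way to get a similar effect. Here is a minimal sketch for an image task; the specific transforms and their strengths are just assumptions.

```python
# Sketch: synthetically enlarge an image dataset with random transforms,
# which acts like "more data" and helps reduce overfitting (high variance).
import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),   # rotate by up to ~10% of a full turn
    layers.RandomZoom(0.1),
])

# Typically applied on the fly to each training batch, e.g.:
# train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))
```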
In my upcoming course on deep learning, we will talk more about these solutions to high bias and high variance cases. We won’t stop there, though: we’ll go ahead and implement those solutions so you get a sense of how to spot overfitting when it happens and what to do about it.
If you'd like to learn more about the course and stay up to date on it, you can sign up here.