Secrets of Regression Evaluation - A Guide to Common Metrics
Harsimranjit Singh
Posted on March 25, 2024
Yesterday, we discussed the inner workings of Ordinary Least Squares (OLS) in simple Linear Regression. Today, we will shift our focus to evaluating the performance of these models.
Today's Focus: Regression Metrics - Evaluating Model Performance
Why Evaluation Metrics Matter
When we build a model, we aim to predict outcomes accurately. Evaluation metrics help us quantify how well our model is performing.
Understanding Regression Metrics
The Big Picture: These metrics offer a statistical assessment of the difference between the predicted values (Yi_hat) and the actual values (Yi) in our data. By analyzing these metrics, we gain valuable knowledge about the strengths and weaknesses of our model.
Choosing the right metric: There's no "one size fits all" answer. The ideal metric depends on your specific problem and context.
Multiple metrics for a holistic view: Often, it's beneficial to use a combination of metrics to get a better view of the model's performance.
Regression Metrics:
Now, let's explore some of the most common regression metrics.
1. Mean Absolute Error (MAE):
- Formula: MAE = (1/n) Σ |Yi - Yi_hat|
Interpretation: MAE measures the average absolute difference between the actual and predicted values. It gives a direct idea of how wrong the predictions are, on average.
Advantage:
- The output is in the same units as the target variable, so it is easy to interpret.
- Robust to outliers (less sensitive than squared-error metrics).
Disadvantage:
- The absolute-value loss is not differentiable at 0, which complicates gradient-based optimization.
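To make the definition concrete, here is a minimal sketch of MAE in plain Python, using a small made-up dataset (the values are illustrative, not from the post):

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical actual and predicted values for illustration
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

print(mean_absolute_error(y_true, y_pred))  # 0.5
```

In practice you would typically reach for a library implementation such as `sklearn.metrics.mean_absolute_error`, which does the same computation on NumPy arrays.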
2. Mean Squared Error (MSE):
Formula: MSE = (1/n) Σ (Yi - Yi_hat)^2
Interpretation: MSE squares each error before averaging, giving more weight to larger errors. This makes it sensitive to outliers.
Advantages:
- The squared-error loss is differentiable everywhere (including at 0), which makes it convenient for gradient-based optimization.
- Useful for further mathematical analysis.
Disadvantages:
- Sensitive to outliers.
- Units are not the same as the target variable
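A minimal sketch of MSE, using the same illustrative dataset as before (values are made up for demonstration):

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

print(mean_squared_error(y_true, y_pred))  # 0.375
```

Notice the unit problem: if the target is in dollars, MSE is in dollars squared, which is hard to reason about directly.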
3. Root Mean Squared Error (RMSE):
Formula: RMSE = sqrt( (1/n) Σ (Yi - Yi_hat)^2 )
Interpretation: RMSE is the square root of MSE, bringing the error back to the original units of the target variable. This makes RMSE easier to interpret compared to MSE.
Advantages:
- Easier to interpret than MSE because its units match those of the target variable.
Disadvantages:
- Sensitive to outliers.
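Since RMSE is just the square root of MSE, the sketch is a one-line extension (again on made-up data):

```python
import math

def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error, in the target's original units."""
    mse = sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

print(root_mean_squared_error(y_true, y_pred))  # ≈ 0.612
```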
4. R2 Score:
Formula: R2 = 1 - (SS_res / SS_tot), where SS_res = Σ (Yi - Yi_hat)^2 and SS_tot = Σ (Yi - Y_mean)^2
Interpretation: R-squared is a statistical measure that represents the proportion of variance in the dependent variable (target) that is explained by the independent variables (inputs) in the model.
Understanding Variance:
Think of it as spread: variance measures how much values differ from their mean. Here, R2 compares the spread of the model's errors (SS_res) against the total spread of the actual values around their mean (SS_tot).
- The higher the R2 score, the more of the target's variance the model explains, and hence the better its performance.
Advantages:
- Easy to interpret as proportion.
Disadvantages:
- Can be misleading for complex models with many input columns: R2 never decreases when you add more features, even useless ones, which is why adjusted R2 is often preferred in that setting.
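Putting the formula above into code, here is a minimal R2 sketch on the same illustrative dataset used earlier (values are made up):

```python
def r2_score(y_true, y_pred):
    """Proportion of variance in y_true explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)             # total sum of squares
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

print(r2_score(y_true, y_pred))  # ≈ 0.88
```

An R2 of about 0.88 here means the model explains roughly 88% of the variance in the target. A model that always predicts the mean scores 0, and a perfect model scores 1.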
Conclusion
Today, we talked about different evaluation metrics, equipping ourselves with valuable tools to assess the performance of our linear regression models. Remember, there's no single "perfect" metric – the best choice depends on your problem and data.
Stay tuned for the next topics.