Linear Regression, Regression: Supervised Machine Learning
Harsh Mishra
Posted on July 13, 2024
What is Regression?
Definition and Purpose
Regression is a statistical method used in machine learning and data science to understand relationships between variables. It involves modeling the relationship between a dependent variable (target) and one or more independent variables (predictors). The main purpose of regression is to predict or estimate the value of the dependent variable based on the values of the independent variables.
Key Objectives:
- Prediction: Forecasting future values based on historical data.
- Estimation: Determining the strength and form of the relationship between variables.
- Understanding Relationships: Identifying which independent variables are significant predictors of the dependent variable.
Types of Regression
1. Linear Regression
- Simple Linear Regression: Models the relationship between two variables by fitting a linear equation to observed data.
  - Equation: y = mx + b
  - Purpose: Predicts the dependent variable y based on the independent variable x.
- Multiple Linear Regression: Extends simple linear regression to include multiple independent variables.
  - Equation: y = b0 + b1x1 + b2x2 + ... + bnxn
  - Purpose: Predicts the dependent variable y based on several independent variables x1, x2, ..., xn.
2. Polynomial Regression
- Description: Models the relationship between the dependent and independent variables as an nth degree polynomial.
- Equation: y = b0 + b1x + b2x^2 + ... + bnx^n
- Purpose: Captures the non-linear relationship between variables.
Ordinary Least Squares (OLS) Method
OLS is a method for estimating the unknown parameters in a linear regression model. It minimizes the sum of the squared differences between observed and predicted values.
Equation: The linear model for OLS can be represented as:
y = w0 + w1x1 + w2x2 + ... + wnxn
where:
- y is the dependent variable
- x1, x2, ..., xn are the independent variables
- w0, w1, w2, ..., wn are the coefficients (parameters) to be estimated
Objective: Minimize the cost function:
Cost(OLS) = Σ(yi - ŷi)^2
where:
- yi is the actual value
- ŷi is the predicted value
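As a rough illustration of the OLS objective, the sketch below estimates the coefficients with NumPy's least-squares solver. The toy data are hypothetical and noise-free, chosen only so that the recovered coefficients are easy to check; this is one possible way to solve the problem, not the only one.

```python
import numpy as np

# Hypothetical toy data: y is exactly 1 + 2*x, so OLS should recover w0 = 1, w1 = 2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

# Design matrix with a leading column of ones so that w0 acts as the intercept
X = np.column_stack((np.ones_like(x), x))

# lstsq returns the coefficients that minimize sum((y - X @ w) ** 2)
w, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ w
cost = np.sum((y - y_hat) ** 2)  # Cost(OLS) = Σ(yi - ŷi)^2

print("w0, w1:", w)
print("Cost(OLS):", cost)
```

Because the toy data lie exactly on a line, the cost is zero up to floating-point error; with noisy data it would be the smallest value any line can achieve.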
Cost Function and Loss Minimization in Linear Regression
Cost Function
The cost function in linear regression quantifies the error between the predicted values and the actual values of the dependent variable. It measures how well the model's predictions align with the actual data. The most commonly used cost function in linear regression is the Mean Squared Error (MSE), but there are other cost functions that can also be applied.
1. Mean Squared Error (MSE):
The MSE is the average of the squared differences between the actual and predicted values. It is defined as:
MSE = (1/n) * Σ (yi - ŷi)^2
where:
- n is the number of data points,
- yi is the actual value,
- ŷi is the predicted value.
The MSE penalizes larger errors more significantly due to the squaring of the differences, making it sensitive to outliers. The goal of linear regression is to find the model parameters (coefficients) that minimize this cost function.
2. Root Mean Squared Error (RMSE):
The RMSE is the square root of the MSE, providing an error metric in the same units as the dependent variable. It is defined as:
RMSE = √(MSE)
This metric is also sensitive to outliers and is commonly used for model evaluation.
3. Mean Absolute Error (MAE):
The MAE measures the average magnitude of the errors in a set of predictions, without considering their direction (i.e., whether the predictions are above or below the actual values). It is defined as:
MAE = (1/n) * Σ |yi - ŷi|
The MAE is less sensitive to outliers compared to MSE and RMSE, making it a robust alternative for certain datasets.
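To make the three definitions concrete, here is a small sketch that computes MSE, RMSE, and MAE directly from their formulas. The four actual and predicted values are made up for easy arithmetic.

```python
import numpy as np

# Hypothetical actual and predicted values, chosen so the arithmetic is easy to follow
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 13.0])

errors = y_true - y_pred            # [1, 0, -1, -4]

mse = np.mean(errors ** 2)          # (1 + 0 + 1 + 16) / 4 = 4.5
rmse = np.sqrt(mse)                 # square root of 4.5
mae = np.mean(np.abs(errors))       # (1 + 0 + 1 + 4) / 4 = 1.5

print(f"MSE:  {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"MAE:  {mae:.2f}")
```

The single large error of -4 contributes 16 of the 18 squared units behind the MSE but only 4 of the 6 absolute units behind the MAE, which is the outlier sensitivity described above.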
Loss Minimization (Optimization)
Loss minimization involves finding the values of the model parameters that result in the lowest possible cost function value. This process is also known as optimization. Ordinary least squares has a closed-form solution, but a widely used iterative method for loss minimization is the Gradient Descent algorithm, which also scales to models where no closed form exists.
Gradient Descent
Gradient Descent is an iterative optimization algorithm used to minimize the cost function. It adjusts the model parameters in the direction of the steepest descent of the cost function.
Steps of Gradient Descent:
1. Initialize Parameters: Start with initial values for the model parameters (e.g., coefficients b0, b1, ..., bn).
2. Calculate Gradient: Compute the gradient of the cost function with respect to each parameter. The gradient is the vector of partial derivatives of the cost function.
3. Update Parameters: Adjust the parameters in the opposite direction of the gradient. The adjustment is controlled by the learning rate (α), which determines the size of the steps taken towards the minimum.
4. Repeat: Iterate the process until the cost function converges to a minimum value (or a pre-defined number of iterations is reached).
Parameter Update Rule:
For each parameter bj:
bj = bj - α * (∂/∂bj) MSE
where:
- α is the learning rate
- (∂/∂bj) MSE is the partial derivative of the MSE with respect to bj
The partial derivative of the MSE with respect to bj is calculated as:
(∂/∂bj) MSE = -(2/n) * Σ (yi - ŷi) * xij
where:
- xij is the value of the jth independent variable for the ith data point (with xi0 = 1 for the intercept b0)
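The four steps and the update rule can be sketched in a few lines of NumPy. This is a minimal illustration, not a production optimizer: the function name `gradient_descent`, the learning rate, the iteration count, and the toy data are all assumptions made for the example.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=2000):
    """Minimize the MSE of y_hat = X @ b; X must include a column of ones for b0."""
    n_samples, n_params = X.shape
    b = np.zeros(n_params)                       # 1. initialize parameters
    for _ in range(n_iters):                     # 4. repeat for a fixed number of iterations
        y_hat = X @ b
        # 2. gradient: -(2/n) * Σ (yi - ŷi) * xij, computed for every j at once
        gradient = -(2.0 / n_samples) * (X.T @ (y - y_hat))
        b = b - alpha * gradient                 # 3. step against the gradient
    return b

# Hypothetical toy data lying exactly on y = 1 + 2*x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
X = np.column_stack((np.ones_like(x), x))        # column of ones carries the intercept

b = gradient_descent(X, y)
print("Estimated b0, b1:", b)
```

The learning rate matters: on this toy problem a value much above roughly 0.15 makes the updates overshoot and diverge, while a very small value converges slowly. Unscaled features such as raw square footage would typically need standardizing before gradient descent behaves well.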
Overfitting vs. Underfitting
Overfitting
- Definition: Overfitting occurs when a model learns the training data too well, capturing noise and outliers rather than the underlying pattern. As a result, the model performs exceptionally well on training data but poorly on unseen or validation data.
- Characteristics:
  - High accuracy on training data.
  - Poor generalization to new data.
  - Complexity of the model is too high (e.g., too many parameters or a very flexible model).
- Causes:
  - Too many features relative to the number of observations.
  - Excessive model complexity (e.g., high-degree polynomial regression).
  - Insufficient training data.
- Solutions:
  - Use simpler models (reduce complexity).
  - Employ regularization techniques (e.g., Lasso, Ridge).
  - Use cross-validation to tune hyperparameters.
  - Increase training data if possible.
Underfitting
- Definition: Underfitting occurs when a model is too simple to capture the underlying trend in the data. This leads to poor performance on both training and validation datasets.
- Characteristics:
  - Low accuracy on both training and validation data.
  - The model fails to learn the relationships in the data.
- Causes:
  - Insufficient model complexity (e.g., linear model for a non-linear relationship).
  - Too few features used in the model.
  - Poor feature selection or engineering.
- Solutions:
  - Increase model complexity (e.g., use a higher-degree polynomial or more sophisticated algorithms).
  - Add relevant features or perform feature engineering.
  - Remove overly simplistic assumptions in the model.
Bias-Variance Trade-Off
The bias-variance trade-off is a fundamental concept in machine learning that describes the trade-off between two sources of error that affect the performance of predictive models: bias and variance.
Bias
- Definition: Bias refers to the error due to overly simplistic assumptions in the learning algorithm. It represents the model's inability to capture the underlying patterns of the data.
- Characteristics:
  - High bias can lead to underfitting, where the model is too simple to capture the complexity of the data.
  - Models with high bias tend to have consistent errors across different datasets.
- Examples: Linear regression on non-linear data.
Variance
- Definition: Variance refers to the error due to excessive sensitivity to fluctuations in the training data. It captures how much the model's predictions would vary if it were trained on different datasets.
- Characteristics:
  - High variance can lead to overfitting, where the model learns noise and outliers in the training data instead of the underlying distribution.
  - Models with high variance perform well on training data but poorly on unseen data.
- Examples: High-degree polynomial regression on a small dataset.
The Trade-Off
- Balancing Act: The challenge in machine learning is to find a model that minimizes both bias and variance. A model with low bias and low variance is ideal but often hard to achieve.
- Effect of Complexity:
  - As model complexity increases, bias decreases and variance increases.
  - Conversely, as model complexity decreases, bias increases and variance decreases.
The goal is to achieve a balance where the total expected error (made up of squared bias, variance, and irreducible error due to noise in the data) is minimized. This often involves techniques such as cross-validation, regularization, and careful feature selection to tune the model appropriately for the given dataset.
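The effect of model complexity can be explored with a small experiment. The sketch below fits polynomials of degree 1, 2, and 8 to hypothetical data generated from a quadratic trend plus noise, then reports training and test MSE for each. The data, the noise level, and the chosen degrees are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    # Hypothetical underlying relationship: a quadratic trend
    return 1.0 + 2.0 * x - 3.0 * x ** 2

# Small training and test sets drawn from the same noisy process
x_train = np.linspace(0.0, 1.0, 12)
x_test = np.linspace(0.04, 0.96, 12)
y_train = true_fn(x_train) + rng.normal(0.0, 0.2, x_train.size)
y_test = true_fn(x_test) + rng.normal(0.0, 0.2, x_test.size)

train_mse, test_mse = {}, {}
for degree in (1, 2, 8):
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares polynomial fit
    train_mse[degree] = np.mean((y_train - np.polyval(coeffs, x_train)) ** 2)
    test_mse[degree] = np.mean((y_test - np.polyval(coeffs, x_test)) ** 2)
    print(f"degree {degree}: train MSE = {train_mse[degree]:.4f}, "
          f"test MSE = {test_mse[degree]:.4f}")
```

Training error can only go down as the degree grows, because each higher-degree polynomial contains the lower-degree ones as special cases. The test error is the number to watch: a gap that widens as complexity increases is the usual sign of high variance.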
Simple Linear Regression
Simple linear regression is a statistical method that models the relationship between two variables by fitting a linear equation to observed data. This example uses a simulated dataset to represent the relationship between the size of a house (in square feet) and its price (in thousands of dollars), incorporating natural variations to reflect real-life scenarios.
Python Code Example
1. Import Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
This block imports the necessary libraries for data manipulation, plotting, and machine learning.
2. Generate Sample Data
np.random.seed(42) # For reproducibility
square_footage = np.array([1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700])
price = np.array([300, 320, 340, 360, 380, 400, 420, 440, 460, 480, 500, 520, 540]) + np.random.normal(0, 20, 13) # Adding noise
This block generates sample data for house sizes and prices, introducing random noise to simulate real-world pricing variations.
3. Prepare Features and Target Variables
X = square_footage.reshape(-1, 1) # Square footage
y = price # Price in thousands
This block prepares the features (square footage) and the target variable (house price).
4. Print Features and Target Variables
print("Square Footage (X):", X)
print("House Price (y):", y)
Output:
Square Footage (X): [[1500]
[1600]
[1700]
[1800]
[1900]
[2000]
[2100]
[2200]
[2300]
[2400]
[2500]
[2600]
[2700]]
House Price (y): [309.93428306 317.23471398 352.95377076 390.46059713 375.31693251
395.31726086 451.58425631 455.34869458 450.61051228 490.85120087
490.73164614 510.68540493 544.83924543]
5. Split the Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
This block splits the dataset into training and testing sets for model evaluation.
6. Create and Train the Model
model = LinearRegression()
model.fit(X_train, y_train)
This block initializes the linear regression model and trains it using the training dataset.
7. Make Predictions
y_pred = model.predict(X_test)
This block uses the trained model to make predictions on the test set.
8. Evaluate the Model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse:.2f}')
print(f'R-squared: {r2:.2f}')
Output:
Mean Squared Error: 57.99
R-squared: 0.99
9. Plot the Results
plt.scatter(X, y, color='blue', label='Actual Prices')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Fitted Line')
plt.title('Simple Linear Regression: House Price Prediction')
plt.xlabel('Square Footage (sq ft)')
plt.ylabel('Price (in thousands)')
plt.legend()
plt.grid()
plt.show()
This block plots the actual prices against square footage and overlays the model's predictions on the test set as a fitted line.
Output: [Figure: scatter plot of actual prices (blue) with the fitted line (red)]
This structured approach shows how to implement and evaluate simple linear regression on a simulated dataset in which house prices vary with square footage.
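Beyond the error metrics, it can help to read the fitted slope and intercept directly. The sketch below is an illustration rather than part of the original walkthrough: it refits the model on all 13 simulated points instead of the training split, cross-checks the coefficients against `np.polyfit`, and predicts the price of a hypothetical 2,050 sq ft house.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same simulated data as in the example above
np.random.seed(42)
square_footage = np.arange(1500, 2800, 100)                      # 1500, 1600, ..., 2700
price = np.arange(300, 560, 20) + np.random.normal(0, 20, 13)    # thousands of dollars

X = square_footage.reshape(-1, 1)
model = LinearRegression().fit(X, price)

slope = model.coef_[0]          # change in price (thousands) per extra square foot
intercept = model.intercept_    # predicted price at 0 sq ft (an extrapolation)

# Cross-check with NumPy's degree-1 polynomial fit
slope_check, intercept_check = np.polyfit(square_footage, price, 1)

# Predict for a hypothetical house size that is not in the data
new_house = np.array([[2050]])
predicted = model.predict(new_house)[0]

print(f"Slope: {slope:.4f} thousand dollars per sq ft")
print(f"Intercept: {intercept:.2f}")
print(f"Predicted price for 2050 sq ft: {predicted:.2f}")
```

The noise-free prices rise by 20 (thousand) per 100 sq ft, so a slope near 0.2 is what a good fit should recover. The intercept has no practical meaning here because no house in the data is anywhere near 0 sq ft.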
Multiple Linear Regression
Multiple linear regression is a statistical technique that models the relationship between a dependent variable and multiple independent variables. This example incorporates two features: the size of a house (in square feet) and the number of bathrooms. We analyze how both factors influence house prices.
Python Code Example
1. Import Libraries
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
This block imports the necessary libraries for data manipulation and machine learning.
2. Generate Sample Data
np.random.seed(42) # For reproducibility
square_footage = np.array([1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700])
num_bathrooms = np.array([1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5])
price = np.array([300, 320, 340, 360, 380, 400, 420, 440, 460, 480, 500, 520, 540]) + np.random.normal(0, 20, 13) # Adding noise
This block generates sample data for house sizes, number of bathrooms, and prices, introducing random noise to simulate real-world pricing variations.
3. Prepare Features and Target Variables
X = np.column_stack((square_footage, num_bathrooms)) # Features: square footage and number of bathrooms
y = price # Price in thousands
This block prepares the features (square footage and number of bathrooms) and the target variable (house price).
4. Print Features and Target Variables
print("Features (X):", X)
print("House Price (y):", y)
Output:
Features (X): [[1500 1]
[1600 1]
[1700 2]
[1800 2]
[1900 2]
[2000 3]
[2100 3]
[2200 3]
[2300 4]
[2400 4]
[2500 4]
[2600 5]
[2700 5]]
House Price (y): [309.93428306 317.23471398 352.95377076 390.46059713 375.31693251
395.31726086 451.58425631 455.34869458 450.61051228 490.85120087
490.73164614 510.68540493 544.83924543]
5. Split the Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
This block splits the dataset into training and testing sets for model evaluation.
6. Create and Train the Model
model = LinearRegression()
model.fit(X_train, y_train)
This block initializes the multiple linear regression model and trains it using the training dataset.
7. Make Predictions
y_pred = model.predict(X_test)
This block uses the trained model to make predictions on the test set.
8. Evaluate the Model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse:.2f}')
print(f'R-squared: {r2:.2f}')
Output:
Mean Squared Error: 64.16
R-squared: 0.99
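This structured approach demonstrates how to implement and evaluate multiple linear regression on a simulated dataset with two features, square footage and the number of bathrooms. On this split the test MSE (64.16) is slightly higher than that of the simple model (57.99), so adding the bathroom count does not improve the predictions here.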
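When a second feature adds little, one thing worth checking is how strongly it is correlated with the first. The short sketch below computes the Pearson correlation between the two simulated features; it is a diagnostic added for illustration, not a step from the original walkthrough.

```python
import numpy as np

# The two simulated features from the example above
square_footage = np.arange(1500, 2800, 100)
num_bathrooms = np.array([1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5])

# Pearson correlation between the features (off-diagonal entry of the 2x2 matrix)
corr = np.corrcoef(square_footage, num_bathrooms)[0, 1]
print(f"Correlation between square footage and bathrooms: {corr:.3f}")
```

The correlation is about 0.98, so the bathroom count carries almost the same information as square footage. Highly correlated predictors (multicollinearity) tend to make individual coefficients unstable without improving predictions.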
Polynomial Regression
Polynomial regression is a regression analysis technique where the relationship between the independent variable and the dependent variable is modeled as an nth degree polynomial. In this example, we will model the relationship between the size of a house (in square feet) and its price (in thousands of dollars) using a 3rd degree polynomial.
Python Code Example
1. Import Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
This block imports the necessary libraries for data manipulation, plotting, and machine learning.
2. Generate Sample Data
np.random.seed(42) # For reproducibility
square_footage = np.array([1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700])
price = np.array([300, 320, 340, 360, 380, 400, 420, 440, 460, 480, 500, 520, 540]) + np.random.normal(0, 20, 13) # Adding noise
This block generates sample data for house sizes and prices, introducing random noise to simulate real-world pricing variations.
3. Prepare Features and Target Variables
X = square_footage.reshape(-1, 1) # Reshape for sklearn
y = price # Price in thousands
This block prepares the features (square footage) and the target variable (house price).
4. Split the Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
This block splits the dataset into training and testing sets for model evaluation.
5. Create Polynomial Features
poly = PolynomialFeatures(degree=3)
X_poly_train = poly.fit_transform(X_train)
X_poly_test = poly.transform(X_test)
print("Polynomial Features (X_poly_train):", X_poly_train)
print("Polynomial Features (X_poly_test):", X_poly_test)
Output:
Polynomial Features (X_poly_train): [[1.0000e+00 2.3000e+03 5.2900e+06 1.2167e+10]
[1.0000e+00 2.0000e+03 4.0000e+06 8.0000e+09]
[1.0000e+00 1.7000e+03 2.8900e+06 4.9130e+09]
[1.0000e+00 1.6000e+03 2.5600e+06 4.0960e+09]
[1.0000e+00 2.7000e+03 7.2900e+06 1.9683e+10]
[1.0000e+00 1.9000e+03 3.6100e+06 6.8590e+09]
[1.0000e+00 2.2000e+03 4.8400e+06 1.0648e+10]
[1.0000e+00 2.5000e+03 6.2500e+06 1.5625e+10]
[1.0000e+00 1.8000e+03 3.2400e+06 5.8320e+09]
[1.0000e+00 2.1000e+03 4.4100e+06 9.2610e+09]]
Polynomial Features (X_poly_test): [[1.0000e+00 2.6000e+03 6.7600e+06 1.7576e+10]
[1.0000e+00 2.4000e+03 5.7600e+06 1.3824e+10]
[1.0000e+00 1.5000e+03 2.2500e+06 3.3750e+09]]
6. Create and Train the Model
model = LinearRegression()
model.fit(X_poly_train, y_train)
This block initializes the polynomial regression model and trains it using the transformed training dataset.
7. Make Predictions
y_pred = model.predict(X_poly_test)
This block uses the trained model to make predictions on the test set.
8. Evaluate the Model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse:.2f}')
print(f'R-squared: {r2:.2f}')
Output:
Mean Squared Error: 300.06
R-squared: 0.96
9. Plot the Results
plt.scatter(X, y, color='blue', label='Actual Prices')
X_grid = np.arange(X.min(), X.max() + 1, 1).reshape(-1, 1) # X.min()/X.max() return scalars; built-in min/max on a 2-D array return 1-element arrays
y_grid = model.predict(poly.transform(X_grid))
plt.plot(X_grid, y_grid, color='red', linewidth=2, label='Fitted Polynomial Curve')
plt.title('Polynomial Regression: House Price Prediction')
plt.xlabel('Square Footage (sq ft)')
plt.ylabel('Price (in thousands)')
plt.legend()
plt.grid()
plt.show()
This block plots the actual prices against square footage and overlays the fitted polynomial curve.
Output: [Figure: scatter plot of actual prices (blue) with the fitted polynomial curve (red)]
This structured approach demonstrates how to implement and evaluate polynomial regression on the simulated dataset. Because the simulated prices follow a linear trend plus noise, the 3rd degree polynomial does not improve on the straight-line fit here: its test MSE (300.06) is higher than that of simple linear regression (57.99). Polynomial features are most useful when the underlying relationship is genuinely non-linear and simple linear regression does not suffice.
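A scikit-learn pipeline can bundle the feature expansion and the regression into a single estimator, which makes it easy to try several polynomial degrees on the same split. The sketch below is an alternative arrangement of the steps above, not the original code; it reuses the same simulated data and compares degree 1 with degree 3.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same simulated data and split as in the example above
np.random.seed(42)
square_footage = np.arange(1500, 2800, 100)
price = np.arange(300, 560, 20) + np.random.normal(0, 20, 13)
X = square_footage.reshape(-1, 1)
y = price
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

test_mse = {}
for degree in (1, 3):
    # The pipeline applies PolynomialFeatures, then fits LinearRegression
    pipeline = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    pipeline.fit(X_train, y_train)
    test_mse[degree] = mean_squared_error(y_test, pipeline.predict(X_test))
    print(f"degree {degree}: test MSE = {test_mse[degree]:.2f}")
```

Because the transformer lives inside the pipeline, it is fitted on the training data only and reused on the test data automatically, which removes the need for separate `fit_transform` and `transform` calls.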
Combined Multiple Linear and Polynomial Regression
In this example, we combine the two approaches in a single model: the size of the house (in square feet) enters as a plain linear term, while the number of bathrooms is expanded into polynomial features up to degree 3. A linear regression is then fitted on the combined features to predict the price (in thousands of dollars).
Python Code Example
1. Import Libraries
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
This block imports the necessary libraries for data manipulation and machine learning.
2. Generate Sample Data
np.random.seed(42) # For reproducibility
square_footage = np.array([1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700])
bathrooms = np.array([1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5])
price = np.array([300, 320, 340, 360, 380, 400, 420, 440, 460, 480, 500, 520, 540]) + np.random.normal(0, 20, 13) # Adding noise
This block generates sample data for house sizes, number of bathrooms, and prices, introducing random noise to simulate real-world pricing variations.
3. Prepare Features and Target Variables
X = np.column_stack((square_footage, bathrooms)) # Combine features
y = price # Price in thousands
print("Features (X):", X)
Output:
Features (X): [[1500 1]
[1600 1]
[1700 2]
[1800 2]
[1900 2]
[2000 3]
[2100 3]
[2200 3]
[2300 4]
[2400 4]
[2500 4]
[2600 5]
[2700 5]]
4. Split the Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
This block splits the dataset into training and testing sets for model evaluation.
5. Create Polynomial Features for Bathrooms
poly = PolynomialFeatures(degree=3, include_bias=False)
X_poly_bathrooms_train = poly.fit_transform(X_train[:, 1].reshape(-1, 1)) # Only bathrooms
X_poly_bathrooms_test = poly.transform(X_test[:, 1].reshape(-1, 1))
# Combine square footage with polynomial features of bathrooms
X_poly_train = np.column_stack((X_train[:, 0], X_poly_bathrooms_train))
X_poly_test = np.column_stack((X_test[:, 0], X_poly_bathrooms_test))
print("Polynomial Features (X_poly_train):", X_poly_train)
print("Polynomial Features (X_poly_test):", X_poly_test)
Output:
Polynomial Features (X_poly_train): [[2.30e+03 4.00e+00 1.60e+01 6.40e+01]
[2.00e+03 3.00e+00 9.00e+00 2.70e+01]
[1.70e+03 2.00e+00 4.00e+00 8.00e+00]
[1.60e+03 1.00e+00 1.00e+00 1.00e+00]
[2.70e+03 5.00e+00 2.50e+01 1.25e+02]
[1.90e+03 2.00e+00 4.00e+00 8.00e+00]
[2.20e+03 3.00e+00 9.00e+00 2.70e+01]
[2.50e+03 4.00e+00 1.60e+01 6.40e+01]
[1.80e+03 2.00e+00 4.00e+00 8.00e+00]
[2.10e+03 3.00e+00 9.00e+00 2.70e+01]]
Polynomial Features (X_poly_test): [[2.60e+03 5.00e+00 2.50e+01 1.25e+02]
[2.40e+03 4.00e+00 1.60e+01 6.40e+01]
[1.50e+03 1.00e+00 1.00e+00 1.00e+00]]
6. Create and Train the Model
model = LinearRegression()
model.fit(X_poly_train, y_train)
This block initializes the combined regression model and trains it using the transformed training dataset.
7. Make Predictions
y_pred = model.predict(X_poly_test)
This block uses the trained model to make predictions on the test set.
8. Evaluate the Model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse:.2f}')
print(f'R-squared: {r2:.2f}')
Output:
Mean Squared Error: 199.75
R-squared: 0.98
This structured approach effectively combines multiple features with polynomial transformations, providing a comprehensive understanding of how to implement and evaluate the model.
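The manual slicing and `np.column_stack` calls can also be expressed with scikit-learn's `ColumnTransformer`. The sketch below is an alternative to the original code, shown under the assumption that column 0 is square footage and column 1 is the bathroom count; the step names `"sqft"` and `"bath_poly"` are arbitrary labels chosen for this example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import PolynomialFeatures

# Same simulated features as in the example above
square_footage = np.arange(1500, 2800, 100)
bathrooms = np.array([1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5])
X = np.column_stack((square_footage, bathrooms))

transformer = ColumnTransformer([
    # Column 0 (square footage) passes through unchanged
    ("sqft", "passthrough", [0]),
    # Column 1 (bathrooms) becomes bathrooms, bathrooms^2, bathrooms^3
    ("bath_poly", PolynomialFeatures(degree=3, include_bias=False), [1]),
])
X_combined = transformer.fit_transform(X)

# The same feature matrix built by hand, for comparison
manual = np.column_stack((square_footage, bathrooms, bathrooms ** 2, bathrooms ** 3))

print(X_combined[:3])
```

In a full workflow the transformer would be fitted on the training split only, for instance by placing it in a pipeline ahead of `LinearRegression`.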
Evaluating Linear Regression Model
Evaluating a linear regression model involves assessing how well it predicts the dependent variable using various metrics and techniques. Here are some key methods for evaluation:
1. Performance Metrics
- Mean Squared Error (MSE): Measures the average squared difference between predicted and actual values. Lower values indicate better model performance.
  - Formula: MSE = (1/n) * Σ (yi - ŷi)^2
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
- Root Mean Squared Error (RMSE): The square root of MSE, providing an error metric in the same units as the dependent variable. It is also sensitive to outliers.
  - Formula: RMSE = √(MSE)
import numpy as np
rmse = np.sqrt(mse)
print(f'Root Mean Squared Error: {rmse}')
- Mean Absolute Error (MAE): Measures the average absolute differences between predicted and actual values. It is less sensitive to outliers than MSE.
  - Formula: MAE = (1/n) * Σ |yi - ŷi|
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, y_pred)
print(f'Mean Absolute Error: {mae}')
2. Cross-Validation
Cross-validation is a robust technique for assessing the performance of a machine learning model by splitting the dataset into multiple parts and validating the model on different subsets of the data. Here are common cross-validation techniques:
- K-Fold Cross-Validation: The dataset is split into k subsets. The model is trained on k-1 subsets and validated on the remaining subset. This process is repeated k times, each time with a different subset as the validation set. The average performance metric over the k folds provides a more reliable evaluation.
from sklearn.model_selection import KFold, cross_val_score
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kf, scoring='neg_mean_squared_error')
print(f'Cross-Validation MSE: {np.mean(-scores)}')
- Leave-One-Out Cross-Validation (LOOCV): A special case of K-Fold cross-validation where k equals the number of data points. Each data point is used once as a validation set, and the remaining data points are used for training. This method is computationally intensive but useful for small datasets.
from sklearn.model_selection import LeaveOneOut
loo = LeaveOneOut()
scores = cross_val_score(model, X, y, cv=loo, scoring='neg_mean_squared_error')
print(f'Leave-One-Out Cross-Validation MSE: {np.mean(-scores)}')
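- Stratified K-Fold Cross-Validation: Similar to K-Fold cross-validation but ensures that each fold is representative of the overall class distribution. This method is particularly useful for imbalanced classification datasets. StratifiedKFold expects discrete class labels, so it cannot stratify a continuous regression target directly; one workaround, assumed here, is to stratify on a binned copy of the target.
from sklearn.model_selection import StratifiedKFold
# Bin the continuous target into two groups split at the median (an illustrative choice)
y_binned = np.digitize(y, np.quantile(y, [0.5]))
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# Stratify on the bins, but score the model against the original continuous y
scores = cross_val_score(model, X, y, cv=skf.split(X, y_binned), scoring='neg_mean_squared_error')
print(f'Stratified K-Fold Cross-Validation MSE: {np.mean(-scores)}')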
By using these evaluation methods and cross-validation techniques, practitioners can assess the effectiveness of their linear regression model, ensuring it generalizes well to unseen data.
Regularization in Regression
Regularization is a technique used in regression analysis to prevent overfitting and improve model generalization by adding a penalty term to the loss function. This penalty discourages overly complex models by constraining the size of the coefficients, which helps manage the bias-variance tradeoff. The two most common forms of regularization in regression are L1 regularization (Lasso) and L2 regularization (Ridge).
L2 Regularization (Ridge Regression)
Concept: L2 regularization adds a penalty equal to the sum of the squared coefficients to the loss function. This penalty is the squared L2 norm of the coefficient vector.
Loss Function: The modified loss function for Ridge regression can be represented as:
Loss = Σ(yi - ŷi)^2 + λ * Σ(wj^2)
Where:
- yi is the actual value.
- ŷi is the predicted value.
- wj are the model coefficients.
- λ is the regularization parameter that controls the strength of the penalty.
Effects:
- Ridge regression shrinks the coefficients towards zero but does not set them exactly to zero. As a result, all features remain in the model, making it suitable for situations with many predictors, especially when multicollinearity is present.
- The quadratic penalty means that larger coefficients are penalized more heavily, promoting stability in predictions.
Coefficient Plotting: When visualizing coefficients, Ridge regression shows a smooth decrease in coefficient values as the regularization parameter increases, resulting in more balanced coefficients without dropping any variables.
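The shrinking behaviour can be checked numerically. For the Ridge loss written above, and assuming no intercept term, the minimizing coefficients have the closed form w = (XᵀX + λI)⁻¹ Xᵀy. The sketch below applies that formula to hypothetical data for several values of λ; the helper name `ridge_fit` and the data are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data with three predictors and no intercept
X = rng.normal(size=(50, 3))
true_w = np.array([3.0, -2.0, 1.0])
y = X @ true_w + rng.normal(0.0, 0.1, 50)

def ridge_fit(X, y, lam):
    """Closed-form minimizer of Σ(yi - ŷi)^2 + lam * Σ(wj^2), without an intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

lambdas = [0.0, 1.0, 10.0, 100.0]
coefs = [ridge_fit(X, y, lam) for lam in lambdas]
norms = [np.linalg.norm(w) for w in coefs]

for lam, w, norm in zip(lambdas, coefs, norms):
    print(f"lambda = {lam:6.1f}  coefficients = {np.round(w, 3)}  L2 norm = {norm:.3f}")
```

With λ = 0 the formula reduces to ordinary least squares. As λ grows, the norm of the coefficient vector shrinks steadily, yet none of the coefficients becomes exactly zero. Library implementations such as scikit-learn's `Ridge` additionally handle the intercept, which is left unpenalized.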
L1 Regularization (Lasso Regression)
Concept: L1 regularization adds a penalty equal to the sum of the absolute values of the coefficients to the loss function. This penalty is the L1 norm of the coefficient vector.
Loss Function: The modified loss function for Lasso regression is expressed as:
Loss = Σ(yi - ŷi)^2 + λ * Σ|wj|
Where:
- yi is the actual value.
- ŷi is the predicted value.
- wj are the model coefficients.
- λ is the regularization parameter.
Effects:
- Lasso regression can shrink some coefficients to exactly zero, effectively performing variable selection. This is beneficial in creating simpler, more interpretable models.
- The linear penalty allows for certain coefficients to be excluded from the model, which can be especially useful when dealing with high-dimensional data.
Coefficient Plotting: In Lasso regression, as the regularization parameter increases, we typically observe that some coefficients drop to zero quickly, creating a sparse model where only the most significant features retain non-zero coefficients.
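The sparsity effect can be seen on a small synthetic problem. The sketch below uses hypothetical data in which only the first two of five features influence the target, then compares an unregularized fit with scikit-learn's `Lasso`. Note that scikit-learn scales the squared-error term by 1/(2n), so its `alpha` plays the role of λ but is not numerically identical to the λ in the formula above; the value 0.5 is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)

# Hypothetical data: five features, but only the first two affect y
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.1, 200)

ols = LinearRegression().fit(X, y)   # no penalty
lasso = Lasso(alpha=0.5).fit(X, y)   # L1 penalty with strength alpha

print("OLS coefficients:  ", np.round(ols.coef_, 3))
print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Features kept by Lasso:", np.flatnonzero(lasso.coef_))
```

The unregularized fit assigns small non-zero weights to the three irrelevant features, while Lasso sets them to exactly zero and shrinks the two relevant coefficients toward zero. In practice `alpha` is usually tuned with cross-validation rather than fixed by hand.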