Ridge Regression, Regression: Supervised Machine Learning

Higher Order Polynomial

Higher-order polynomial regression allows for modeling complex relationships between the independent variable and the dependent variable. This approach can capture nonlinear trends that linear regression might miss but also runs the risk of overfitting if the degree is too high.

Python Code Example

# Import Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Generate Sample Data
np.random.seed(42)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 3 * X.ravel() + np.sin(2 * X.ravel()) * 5 + np.random.normal(0, 1, 100)

# Split the Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build polynomial features, then fit and evaluate a linear model
degree = 3  # Change this value for different polynomial degrees
poly = PolynomialFeatures(degree=degree)
X_poly_train = poly.fit_transform(X_train)
X_poly_test = poly.transform(X_test)

model = LinearRegression()
model.fit(X_poly_train, y_train)

y_pred = model.predict(X_poly_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'\nDegree {degree}:')
print(f'Mean Squared Error: {mse:.2f}')
print(f'R-squared: {r2:.2f}')
print(f'Coefficients: {model.coef_}')

# Plot the Results
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='blue', alpha=0.5, label='Data Points')
X_grid = np.linspace(0, 10, 1000).reshape(-1, 1)
y_grid = model.predict(poly.transform(X_grid))
plt.plot(X_grid, y_grid, color='red', linewidth=2, label=f'Fitted Polynomial (Degree {degree})')
plt.title(f'Polynomial Regression (Degree {degree})')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.grid(True)
plt.show()

Polynomial Regression of Degree 3

In this example, we set the polynomial degree to 3. This allows us to model the relationship between our generated data points in a way that can capture some of the underlying trends effectively.

# Set the polynomial degree
degree = 3  # Degree of the polynomial

Output:
Coefficients: [ 0., 0.18306778, 0.51199233, -0.02726728 ]

All coefficients are small in magnitude (below 1).

Polynomial Regression of Degree 6

To explore a more complex relationship, we can increase the degree to 6.

# Set the polynomial degree
degree = 6  # Degree of the polynomial

Output:
Coefficients: [ 0.00000000e+00, -6.83739429e+00, 5.74000805e+00, -1.99306116e+00, 3.95724255e-01, -3.92559946e-02, 1.48453341e-03 ]

The largest coefficients are now roughly an order of magnitude larger than in the degree-3 fit.

Polynomial Regression of Degree 12

For an even more complex fit, we can use a polynomial degree of 12.

# Set the polynomial degree
degree = 12  # Degree of the polynomial

Output:
Coefficients: [ 0.00000000e+00, 6.75571349e+01, -1.72887982e+02, 2.52401492e+02, -2.23210296e+02, 1.21467320e+02, -4.18694825e+01, 9.38693129e+00, -1.38544413e+00, 1.33478833e-01, -8.07459293e-03, 2.78272006e-04, -4.16589951e-06 ]

The coefficients are now very large, reaching magnitudes in the hundreds, with alternating signs.

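By varying the polynomial degree, we can observe how the model fits the data, balancing the need for complexity with the risk of overfitting. The coefficients grow as the degree increases, from below 1 at degree 3 to the hundreds at degree 12. Large coefficients of alternating sign that nearly cancel each other are a common symptom of overfitting: the model is bending to follow noise in the training data rather than the underlying pattern.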
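The comparison across degrees can be run in a single loop instead of editing `degree` by hand. The sketch below reuses the synthetic data and split from the example above; the `results` dictionary and its keys are names introduced here for illustration. It reports training error, test error, and the largest coefficient magnitude for each degree.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Same synthetic data and split as in the example above
np.random.seed(42)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 3 * X.ravel() + np.sin(2 * X.ravel()) * 5 + np.random.normal(0, 1, 100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

results = {}  # hypothetical container: degree -> summary metrics
for degree in (3, 6, 12):
    poly = PolynomialFeatures(degree=degree)
    X_poly_train = poly.fit_transform(X_train)
    X_poly_test = poly.transform(X_test)

    model = LinearRegression().fit(X_poly_train, y_train)

    results[degree] = {
        "train_mse": mean_squared_error(y_train, model.predict(X_poly_train)),
        "test_mse": mean_squared_error(y_test, model.predict(X_poly_test)),
        "max_abs_coef": float(np.max(np.abs(model.coef_))),
    }

for degree, r in results.items():
    print(
        f"Degree {degree:2d}: train MSE = {r['train_mse']:.2f}, "
        f"test MSE = {r['test_mse']:.2f}, "
        f"max |coef| = {r['max_abs_coef']:.3g}"
    )
```

A training error that keeps falling while the test error stalls or rises is the usual sign that the extra degrees are fitting noise.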

Ridge Regression

Ridge regression is linear regression with L2 regularization: it adds a penalty term to the ordinary least squares (OLS) loss function. This penalty helps to prevent overfitting, especially when multicollinearity (correlation among independent variables) is present.

The ridge regression loss function can be expressed as:

Loss = Σ(yi - ŷi)^2 + λ * Σ(wj^2)

where:

yi is the actual value,
ŷi is the predicted value,
wj represents the coefficients,
λ (lambda) is the regularization parameter.

In this equation:

The term Σ(yi - ŷi)^2 is the Ordinary Least Squares (OLS) part, which represents the sum of squared residuals (the differences between observed and predicted values).
The term λ * Σ(wj^2) is the L2 penalty term, which adds the penalty for the size of the coefficients.
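To make the two terms concrete, the loss can be computed by hand for a few numbers. This is a minimal sketch: the function name `ridge_loss` and the toy values are assumptions chosen for illustration, not part of any library.

```python
import numpy as np

def ridge_loss(y_true, y_pred, weights, lam):
    """Sum of squared residuals plus lam times the sum of squared weights."""
    residual_term = np.sum((y_true - y_pred) ** 2)  # OLS part
    penalty_term = lam * np.sum(weights ** 2)       # L2 penalty
    return residual_term + penalty_term

# Toy values, chosen only for illustration
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 6.0])
weights = np.array([2.0, -1.0])

# Residuals are 0.5, -0.5, 1.0, so the OLS part is 0.25 + 0.25 + 1.0 = 1.5
print(ridge_loss(y_true, y_pred, weights, lam=0.0))  # 1.5
# The penalty adds 0.1 * (4 + 1) = 0.5
print(ridge_loss(y_true, y_pred, weights, lam=0.1))  # 2.0
```

With the predictions held fixed, only the penalty term changes with λ, which is why larger coefficients become more expensive as λ grows.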

Key Concepts

Ordinary Least Squares (OLS):
In standard linear regression, the goal is to minimize the sum of squared residuals. The loss function for OLS is the sum of squared errors.
Adding L2 Penalty:
Ridge regression modifies the OLS loss function by adding an L2 penalty term, which is the sum of the squares of the coefficients multiplied by the regularization parameter (lambda). This penalty stabilizes coefficient estimates.
Regularization Parameter (λ):
The value of lambda controls the strength of the penalty. A larger lambda increases the penalty on the size of the coefficients, leading to more regularization, while a smaller lambda allows for larger coefficients, approaching the OLS solution. When lambda is zero, ridge regression becomes equivalent to ordinary least squares.
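The effect of lambda can be checked numerically with the standard closed-form ridge solution, w = (XᵀX + λI)⁻¹ Xᵀy. The sketch below assumes no intercept term for simplicity; the helper `ridge_closed_form` and the random data are illustrative assumptions.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y. No intercept, for simplicity."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Small synthetic problem with known weights (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=50)

w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)  # plain least squares
w_zero = ridge_closed_form(X, y, lam=0.0)      # ridge with no penalty
w_small = ridge_closed_form(X, y, lam=1.0)
w_large = ridge_closed_form(X, y, lam=100.0)

print("lambda=0 matches OLS:", np.allclose(w_zero, w_ols))
for name, w in [("0", w_zero), ("1", w_small), ("100", w_large)]:
    print(f"lambda={name:>3}: ||w|| = {np.linalg.norm(w):.4f}")
```

The coefficient norm shrinks as lambda grows, and at lambda = 0 the solution coincides with ordinary least squares.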

Coefficients in L2 Regularization (Ridge Regression)

Penalty Term: The L2 penalty term is the sum of the squares of the coefficients.

Equation: Loss = Σ(yi - ŷi)^2 + λ * Σ(wj^2)
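Effect on Coefficients: L2 regularization shrinks all coefficients toward zero, preventing them from becoming excessively large. The amount of shrinkage is not the same for every coefficient; it depends on the data. However, it rarely drives any coefficient to exactly zero.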
Usage: This technique is useful for addressing multicollinearity and generally results in smaller, more stable coefficients.
Pattern in Coefficient Plotting: In coefficient plots for L2 regularization, all coefficients are reduced smoothly as the regularization parameter increases, without any coefficients dropping out entirely.
As λ Approaches Zero: As lambda approaches zero, the penalty vanishes and the model approaches ordinary least squares (OLS) regression, where coefficients can take on large values.
As λ Approaches Infinity: As lambda moves towards infinity, all coefficients approach zero, so the predictions flatten toward a constant and the model underfits the data by becoming overly simplistic.
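The pattern described above can be checked by fitting scikit-learn's `Ridge` over a range of `alpha` values and recording the coefficients. This is a sketch under assumptions: it uses the synthetic data from earlier in the article, a degree-5 polynomial, and a `StandardScaler` step (added here so the penalty treats all features on a comparable scale); the alpha grid is arbitrary.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same synthetic data as in the polynomial example
np.random.seed(42)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 3 * X.ravel() + np.sin(2 * X.ravel()) * 5 + np.random.normal(0, 1, 100)

alphas = [0.01, 1.0, 100.0, 10000.0]  # arbitrary grid for illustration
coef_norms = []
n_exact_zeros = []

for alpha in alphas:
    pipe = make_pipeline(
        PolynomialFeatures(degree=5, include_bias=False),
        StandardScaler(),  # assumption: scaling added for a fair penalty
        Ridge(alpha=alpha),
    )
    pipe.fit(X, y)
    coefs = pipe.named_steps["ridge"].coef_
    coef_norms.append(float(np.linalg.norm(coefs)))
    n_exact_zeros.append(int(np.sum(coefs == 0.0)))
    print(f"alpha={alpha:>8}: ||coef|| = {coef_norms[-1]:.4f}, "
          f"exact zeros = {n_exact_zeros[-1]}")
```

The norm falls steadily as `alpha` increases, while no coefficient is set to exactly zero, which is the behavior that distinguishes the L2 penalty in a coefficient plot.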

Ridge Regression Example

Ridge regression is a technique that applies L2 regularization to linear regression, which helps mitigate overfitting by adding a penalty term to the loss function. This example uses a polynomial regression approach with ridge regression to demonstrate how to model complex relationships while controlling for overfitting.

Python Code Example

1. Import Libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score

This block imports the necessary libraries for data manipulation, plotting, and machine learning.

2. Generate Sample Data

np.random.seed(42)  # For reproducibility
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 3 * X.ravel() + np.sin(2 * X.ravel()) * 5 + np.random.normal(0, 1, 100)

This block generates sample data representing a relationship with some noise, simulating real-world data variations.

3. Split the Dataset

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

This block splits the dataset into training and testing sets for model evaluation.

4. Create Polynomial Features

degree = 12  # Change this value for different polynomial degrees
poly = PolynomialFeatures(degree=degree)
X_poly_train = poly.fit_transform(X_train)
X_poly_test = poly.transform(X_test)

This block generates polynomial features from the training and testing datasets, allowing the model to capture non-linear relationships.

5. Create and Train the Ridge Regression Model

model = Ridge(alpha=1.0)  # alpha is the regularization strength (lambda in the loss function)
model.fit(X_poly_train, y_train)

This block initializes the ridge regression model and trains it using the polynomial features derived from the training dataset.

6. Make Predictions

y_pred = model.predict(X_poly_test)

This block uses the trained model to make predictions on the test set.
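The predictions can be scored with the `mean_squared_error` and `r2_score` functions imported in step 1. The sketch below repeats steps 2 to 6 so that it runs on its own, then prints both metrics. With unscaled degree-12 features, scikit-learn may emit an ill-conditioning warning during fitting; the fit still completes.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Steps 2 to 6, repeated so this block is self-contained
np.random.seed(42)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 3 * X.ravel() + np.sin(2 * X.ravel()) * 5 + np.random.normal(0, 1, 100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

degree = 12
poly = PolynomialFeatures(degree=degree)
X_poly_train = poly.fit_transform(X_train)
X_poly_test = poly.transform(X_test)

model = Ridge(alpha=1.0)
model.fit(X_poly_train, y_train)
y_pred = model.predict(X_poly_test)

# Evaluate on the held-out test set
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared: {r2:.2f}")
```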

7. Plot the Results

plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='blue', alpha=0.5, label='Data Points')
X_grid = np.linspace(0, 10, 1000).reshape(-1, 1)
y_grid = model.predict(poly.transform(X_grid))
plt.plot(X_grid, y_grid, color='red', linewidth=2, label=f'Fitted Polynomial (Degree {degree})')
plt.title(f'Ridge Regression (Polynomial Degree {degree})')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.grid(True)
plt.show()

This block plots the data points and overlays the ridge regression curve, evaluated on a dense grid of X values. The two outputs below correspond to rerunning step 5 with a different alpha.

Output with alpha = 0.1: [figure not included]

Output with alpha = 1000: [figure not included]
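The two alpha settings can also be compared numerically rather than by eye. The following sketch fits the degree-12 model at alpha = 0.1 and alpha = 1000 and reports training error, test error, and coefficient norm. It adds a `StandardScaler` step, which is an assumption not present in the steps above, so the numbers will not match the unscaled model exactly; the `summary` dictionary is likewise a name introduced here.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

np.random.seed(42)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 3 * X.ravel() + np.sin(2 * X.ravel()) * 5 + np.random.normal(0, 1, 100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

summary = {}  # hypothetical container: alpha -> metrics
for alpha in (0.1, 1000.0):
    pipe = make_pipeline(
        PolynomialFeatures(degree=12, include_bias=False),
        StandardScaler(),  # assumption: keeps the fit numerically stable
        Ridge(alpha=alpha),
    )
    pipe.fit(X_train, y_train)
    summary[alpha] = {
        "train_mse": mean_squared_error(y_train, pipe.predict(X_train)),
        "test_mse": mean_squared_error(y_test, pipe.predict(X_test)),
        "coef_norm": float(np.linalg.norm(pipe.named_steps["ridge"].coef_)),
    }

for alpha, s in summary.items():
    print(f"alpha={alpha:>6}: train MSE = {s['train_mse']:.2f}, "
          f"test MSE = {s['test_mse']:.2f}, ||coef|| = {s['coef_norm']:.3f}")
```

The larger alpha yields a smaller coefficient norm and a higher training error, which is the trade-off between flexibility and regularization that the two plots illustrate.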

This structured approach demonstrates how to implement ridge regression with polynomial features. The L2 penalty limits the size of the coefficients, so a high-degree polynomial can model complex relationships with less risk of overfitting, provided alpha is chosen sensibly: too small and the fit approaches unregularized least squares, too large and the model underfits.