Maximum Likelihood Estimation with Logistic Regression
Harsimranjit Singh
Posted on June 27, 2024
In our previous article, we introduced logistic regression, a fundamental machine learning technique for binary classification. Logistic regression predicts the probability of a binary outcome from input features.
This article dives into the mathematical foundation of logistic regression.
Understanding Likelihood
Likelihood refers to the chance of observing a specific outcome or event given a particular model or set of conditions.
To break this down:
- Focus on a Specific Outcome: Unlike probability, which deals with the general chance of an event happening, likelihood focuses on a specific, already-observed outcome under an assumed model.
- Model-Based: We use the model to calculate the likelihood of observing a specific data set, assuming particular values for the model's parameters.
- Higher Likelihood: A higher likelihood means the parameter values are a better fit for explaining the observed data.
Example
Imagine you have a coin that might be biased, and you flip it 5 times, getting the results:
Heads, Tails, Heads, Heads, Tails. You want to estimate the probability θ of getting heads.
1. Suppose θ = 0.5:
- The likelihood of the sequence is: L(0.5) = P(H) × P(T) × P(H) × P(H) × P(T) = 0.5 × 0.5 × 0.5 × 0.5 × 0.5 = 0.03125
2. Suppose θ = 0.7:
- The likelihood of the same sequence is: L(0.7) = P(H) × P(T) × P(H) × P(H) × P(T) = 0.7 × 0.3 × 0.7 × 0.7 × 0.3 = 0.7³ × 0.3² ≈ 0.0309
Note that L(0.5) ≈ 0.0313 is actually slightly higher than L(0.7) ≈ 0.0309. In fact, the likelihood is maximized at θ = 3/5 = 0.6, the observed fraction of heads, which is exactly the idea behind maximum likelihood estimation below.
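A few lines of Python make this comparison concrete. This is a minimal sketch: the flips array simply encodes the sequence above as 1 for heads and 0 for tails.

import numpy as np

# The observed sequence H, T, H, H, T encoded as 1 = heads, 0 = tails
flips = np.array([1, 0, 1, 1, 0])

def likelihood(theta, flips):
    # Multiply per-flip probabilities: theta for heads, 1 - theta for tails
    return np.prod(np.where(flips == 1, theta, 1 - theta))

for theta in [0.5, 0.6, 0.7]:
    print(f"L({theta}) = {likelihood(theta, flips):.5f}")
# Prints: L(0.5) = 0.03125, L(0.6) = 0.03456, L(0.7) = 0.03087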
Difference Between Likelihood and Probability
- Probability: Focuses on the general chance of an event happening in the long run.
- Likelihood: Focuses on the chance of observing a specific outcome given a particular scenario.
Maximum Likelihood Estimation (MLE)
Maximum Likelihood Estimation (MLE) is a method for estimating the parameters of a statistical model. The goal is to find the parameter values that maximize the likelihood function, i.e., the values under which the observed data are most probable.
Step-by-Step
Define the Likelihood Function: The likelihood function L(θ) represents the probability of observing the data as a function of the model parameters θ.
Log-Likelihood Function: For mathematical convenience, we usually work with the log-likelihood ℓ(θ), the natural logarithm of the likelihood function:
ℓ(θ) = log L(θ)
Maximize the Log-Likelihood: Find the parameter values that maximize ℓ(θ). This involves taking the derivative of the log-likelihood with respect to the parameters, setting it to zero, and solving for the parameters.
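Applying these steps to the coin example above (k = 3 heads in n = 5 flips), the log-likelihood and its derivative are:

ℓ(θ) = 3 log θ + 2 log(1 − θ)
dℓ/dθ = 3/θ − 2/(1 − θ) = 0  ⟹  3(1 − θ) = 2θ  ⟹  θ̂ = 3/5 = 0.6

So the MLE is simply the observed fraction of heads, matching the comparison of L(0.5), L(0.6), and L(0.7) above.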
MLE in Logistic Regression
Logistic regression models the probability of a binary outcome (success/failure) based on input features x:

P(yᵢ = 1 | xᵢ) = σ(xᵢᵀβ) = 1 / (1 + e^(−xᵢᵀβ))

where xᵢ are the input features for observation i, β are the parameters to be estimated, and yᵢ is the binary outcome.
Likelihood and Log-Likelihood Functions:
Writing pᵢ = σ(xᵢᵀβ), the likelihood of observing the given data under logistic regression is:

L(β) = ∏ᵢ pᵢ^yᵢ (1 − pᵢ)^(1 − yᵢ)

and taking logs gives the log-likelihood:

ℓ(β) = Σᵢ [ yᵢ log pᵢ + (1 − yᵢ) log(1 − pᵢ) ]
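As a quick numeric check, with made-up values: for a single observation with xᵢ = (1, 2), β = (0, 1), and yᵢ = 1,

pᵢ = σ(0·1 + 1·2) = σ(2) ≈ 0.881

so this observation contributes log(0.881) ≈ −0.127 to the log-likelihood; had yᵢ been 0, its contribution would instead be log(1 − 0.881) ≈ −2.13.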
Deriving the MLE for Logistic Regression
To find the MLE for β, we need to maximize the log-likelihood function. This involves:
- Calculating the Gradient: Compute the derivative of the log-likelihood with respect to β. For logistic regression this takes the simple form ∇ℓ(β) = Σᵢ (yᵢ − pᵢ) xᵢ.
- Optimization: Since setting this gradient to zero has no closed-form solution, use an iterative optimization algorithm (e.g., gradient descent) to find the parameter values that maximize the log-likelihood, as sketched below.
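Here is a minimal gradient-ascent sketch of that loop. The toy data, learning rate, and iteration count are illustrative assumptions, not tuned values:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fit_logistic_mle(X, y, lr=0.1, n_iters=5000):
    # Gradient ascent on the log-likelihood; its gradient is X^T (y - p)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        p = sigmoid(X @ beta)
        beta += lr * X.T @ (y - p)
    return beta

# Tiny illustrative dataset (first column = intercept); classes overlap,
# so the maximum likelihood estimate is finite
X = np.array([[1., 1], [1, 2], [1, 3], [1, 4]])
y = np.array([0., 1, 0, 1])
print(fit_logistic_mle(X, y))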
Practical Implementation
import numpy as np
import scipy.optimize as opt

# Toy data; the first column of ones is the intercept term. The classes are
# chosen to overlap: with perfectly separable data the unpenalized MLE does
# not exist, and the optimizer would push the coefficients toward infinity.
X = np.array([[1., 1], [1, 2], [1, 3], [1, 4], [1, 5], [1, 6]])
y = np.array([0., 0, 1, 0, 1, 1])

# Sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Negative log-likelihood: scipy minimizes, so we negate
def neg_log_likelihood(beta, X, y):
    p = sigmoid(X @ beta)
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

beta_init = np.zeros(X.shape[1])

# Optimization: minimizing the negative log-likelihood maximizes the likelihood
result = opt.minimize(neg_log_likelihood, beta_init, args=(X, y), method='BFGS')
beta_hat = result.x
print("Parameters:", beta_hat)
Conclusion
Understanding the mathematical foundations of logistic regression and maximum likelihood estimation is essential for applying these techniques effectively in machine learning. By maximizing the likelihood function, logistic regression identifies the parameters β that best fit the observed data, enabling accurate predictions of binary outcomes from input features.