Running a Lasso Regression Analysis
F.elicià
Posted on November 10, 2022
What is Lasso Regression
Lasso regression is a regularization technique. It is used over plain regression methods for more accurate prediction. This model uses shrinkage, where data values are shrunk towards a central point such as the mean. The lasso procedure encourages simple, sparse models (i.e. models with fewer parameters). This particular type of regression is well suited to models showing high levels of multicollinearity, or when you want to automate certain parts of model selection, such as variable selection and parameter elimination.
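To make the idea of shrinkage concrete, here is a minimal sketch using scikit-learn on synthetic data (the dataset and the alpha value are illustrative choices, not part of the original example):
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
#Synthetic data: 100 samples, 20 features, only 5 of them truly informative
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
#Fit a Lasso model; alpha controls how strongly the coefficients are shrunk
lasso = Lasso(alpha=1.0).fit(X, y)
#Several coefficients are shrunk exactly to zero, giving a sparse model
print("Non-zero coefficients:", np.sum(lasso.coef_ != 0), "out of", X.shape[1])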
Lasso Meaning
The word "LASSO" stands for Least Absolute Shrinkage and Selection Operator .It is a statical formula for the regularization of data models and feature selection.
Regularization
Regularization is an important concept used to avoid overfitting the data, especially when the training and test results differ widely. Regularization is implemented by adding a "penalty" term to the best fit derived from the training data, in order to achieve lower variance on the test data; it also restricts the influence of the predictor variables on the output variable by compressing their coefficients. In regularization we normally keep the same number of features but reduce the magnitude of the coefficients, using regression techniques that apply regularization to overcome this problem.
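As a rough illustration of this, the sketch below (again on synthetic data, with arbitrary parameter choices) compares an ordinary least-squares fit with a Lasso fit when there are many noisy features; the regularized model typically gives up a little training accuracy in exchange for better test performance:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.model_selection import train_test_split
#Many features, few of them truly informative, plus noise
X, y = make_regression(n_samples=80, n_features=60, n_informative=5,
                       noise=20.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
#Unregularized linear regression tends to overfit: high train score, much lower test score
ols = LinearRegression().fit(X_tr, y_tr)
print("OLS   train R^2:", round(ols.score(X_tr, y_tr), 3),
      "test R^2:", round(ols.score(X_te, y_te), 3))
#The L1 penalty restricts the coefficients and usually generalizes better here
lasso = Lasso(alpha=1.0, max_iter=10000).fit(X_tr, y_tr)
print("Lasso train R^2:", round(lasso.score(X_tr, y_tr), 3),
      "test R^2:", round(lasso.score(X_te, y_te), 3))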
Lasso Regularization Technique
There are two main regularization techniques, namely Ridge Regression and Lasso Regression. They differ in the way they assign a penalty to the coefficients. In this blog, we will try to understand more about the Lasso regularization technique.
L1 Regularization
If a regression model uses the L1 regularization technique, it is called Lasso regression. If it uses the L2 regularization technique, it is called Ridge regression. We will study these in more detail in the later sections. L1 regularization adds a penalty that is equal to the absolute value of the magnitude of the coefficients. Some coefficients may become zero and be eliminated from the model. Larger penalties result in coefficient values that are closer to zero (ideal for producing simpler models). On the other hand, L2 regularization does not eliminate coefficients and does not produce sparse models. This makes Lasso regression easier to interpret than Ridge.
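The difference is easy to see by fitting both models on the same synthetic data and counting how many coefficients end up exactly at zero (the penalty strength of 1.0 is an arbitrary illustrative choice):
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
#Same penalty strength for both models; only the type of penalty differs
lasso = Lasso(alpha=1.0).fit(X, y)   #L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   #L2 penalty
#Lasso sets some coefficients exactly to zero; Ridge only shrinks them
print("Lasso zero coefficients:", np.sum(lasso.coef_ == 0))
print("Ridge zero coefficients:", np.sum(ridge.coef_ == 0))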
Mathematical equation of Lasso Regression:
Residual Sum of Squares + λ * (sum of the absolute values of the coefficients)
Where,
λ denotes the amount of shrinkage.
λ = 0 implies all features are considered; this is equivalent to linear regression, where only the residual sum of squares is used to build the predictive model.
λ = ∞ implies no feature is considered, i.e. as λ approaches infinity it eliminates more and more features.
The bias increases with an increase in λ, and the variance increases with a decrease in λ.
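The sketch below illustrates this trade-off by sweeping the penalty strength and counting how many coefficients survive. Note that scikit-learn calls the λ parameter alpha; the data and the alpha values here are illustrative only:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
#As alpha (the λ above) grows, more coefficients are pushed to exactly zero
for alpha in [0.001, 0.1, 1.0, 10.0, 100.0]:
    lasso = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    print(f"alpha={alpha:>7}: {np.sum(lasso.coef_ != 0)} non-zero coefficients")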
Lasso Regression Example
import numpy as np
Creating new training and validation datasets
from sklearn.model_selection import train_test_split
data_train, data_val = train_test_split(new_data_train, test_size = 0.2, random_state=2)
Classifying Predictors and Target
#Classifying independent and dependent features
#_____________________________________________
#Dependent Variable
Y_train = data_train.iloc[:, -1].values
#Independent Variables
X_train = data_train.iloc[:,0 : -1].values
#Independent Variables for Test Set
X_test = data_val.iloc[:,0 : -1].values
Evaluating the Model with RMSLE
def score(y_pred, y_true):
    error = np.square(np.log10(y_pred + 1) - np.log10(y_true + 1)).mean() ** 0.5
    score = 1 - error
    return score
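For clarity, this score is 1 minus the root mean squared log error, computed here with base-10 logs. A quick sanity check with made-up values (hypothetical numbers, not taken from the dataset):
#Hypothetical check: a perfect prediction gives a score of exactly 1
print(score(np.array([9.0, 99.0]), np.array([9.0, 99.0])))    # 1.0
#Predictions that are off by an order of magnitude score much lower
print(score(np.array([9.0, 99.0]), np.array([99.0, 999.0])))  # 0.0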
actual_cost = list(data_val['COST'])
actual_cost = np.asarray(actual_cost)
Building the Lasso Regressor
#Lasso Regression
from sklearn.linear_model import Lasso
#Initializing the Lasso Regressor with Normalization Factor as True
lasso_reg = Lasso(normalize=True)
#Fitting the Training data to the Lasso regressor
lasso_reg.fit(X_train, Y_train)
#Predicting for X_test
y_pred_lass = lasso_reg.predict(X_test)
#Printing the score with RMSLE
print("\n\nLasso SCORE : ", score(y_pred_lass, actual_cost))
Output
0.7335508027883148
The Lasso regression attained a score of roughly 0.73 on the given dataset, using the RMSLE-based score defined above.
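A side note on compatibility: the normalize argument passed to Lasso above has been removed in recent scikit-learn releases. A rough replacement, not numerically identical to the old behaviour, is to standardize the features inside a Pipeline, for example:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso
#Standardize the features, then fit the Lasso; used like the plain estimator above
lasso_pipe = make_pipeline(StandardScaler(), Lasso())
lasso_pipe.fit(X_train, Y_train)
y_pred_lass = lasso_pipe.predict(X_test)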