Prediction using Supervised ML
yaswanthteja
Posted on July 7, 2022
- Predict the percentage of marks of a student based on the number of study hours.
- This is a simple linear regression task as it involves just 2 variables.
- The data can be found at the URL used in the data-reading step below.
- You can use R, Python, SAS Enterprise Miner or any other tool.
- What will be the predicted score if a student studies for 9.25 hours/day?
Demo
Prediction using Supervised Machine Learning
In this regression task I tried to predict the percentage of marks that a student is expected to score based upon the number of hours they studied.
This is a simple linear regression task as it involves just two variables.
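To make "simple linear regression" concrete: with one input x and one output y, the model is y ≈ intercept + slope * x, and the least-squares slope and intercept have a closed form. The sketch below is only an illustration on made-up numbers (the `hours` and `scores` arrays are hypothetical, not the dataset used in this post).

```python
import numpy as np

# Hypothetical toy data, NOT the dataset used in this post.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([12.0, 22.0, 32.0, 42.0, 52.0])  # exactly 10 * hours + 2

# Closed-form least-squares estimates for y = intercept + slope * x
x_dev = hours - hours.mean()
y_dev = scores - scores.mean()
slope = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)
intercept = scores.mean() - slope * hours.mean()

print(slope, intercept)
```

scikit-learn's `LinearRegression`, used later in this post, fits the same kind of line by least squares.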
Importing the required libraries
# Importing the required libraries
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
Step 1 - Reading the data from source
# Reading data from remote link
url = "https://raw.githubusercontent.com/AdiPersonalWorks/Random/master/student_scores%20-%20student_scores.csv"
s_data = pd.read_csv(url)
print("Data import successful")
s_data.head(10)
Step 2 - Input data Visualization
# Plotting the distribution of scores
s_data.plot(x='Hours', y='Scores', style='o')
plt.title('Hours vs Percentage')
plt.xlabel('Hours Studied')
plt.ylabel('Percentage Score')
plt.show()
The scatter plot suggests a positive linear relationship between the number of hours studied and the percentage score: students who study longer tend to score higher.
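A visual impression can be backed up with a number. One common choice is the Pearson correlation coefficient, where values close to +1 indicate a strong positive linear relationship. The sketch below uses a small made-up sample (the `hours` and `scores` arrays are hypothetical) rather than the real data.

```python
import numpy as np

# Hypothetical sample, NOT the dataset used in this post.
hours = np.array([1.5, 3.0, 4.5, 6.0, 8.0])
scores = np.array([20.0, 33.0, 41.0, 60.0, 78.0])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry is the Pearson correlation between the two variables.
r = np.corrcoef(hours, scores)[0, 1]
print(f"Pearson r = {r:.3f}")
```

On the post's DataFrame, the equivalent check would be `s_data['Hours'].corr(s_data['Scores'])`.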
Step 3 - Data Preprocessing
This step divides the data into "attributes" (inputs) and "labels" (outputs).
X = s_data.iloc[:, :-1].values
y = s_data.iloc[:, 1].values
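The two slices above return arrays of different shapes, which matters because scikit-learn estimators expect a 2-D feature array and a 1-D target array. A minimal sketch on a hypothetical three-row stand-in (the `toy` DataFrame is made up; only the column names match the post):

```python
import pandas as pd

# Hypothetical stand-in with the same column names as the post's dataset.
toy = pd.DataFrame({"Hours": [2.5, 5.1, 3.2], "Scores": [21, 47, 27]})

X = toy.iloc[:, :-1].values  # every column except the last -> 2-D array
y = toy.iloc[:, 1].values    # the second column only -> 1-D array

print(X.shape, y.shape)  # (3, 1) (3,)
```

Because `X` is already 2-D, it can be passed to `fit` and `predict` without any reshaping.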
Step 4 - Model Training
Splitting the data into training and testing sets, and training the algorithm.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
regressor = LinearRegression()
# X_train is already 2-D with shape (n_samples, 1), so no reshape is needed
regressor.fit(X_train, y_train)
print("Training complete.")
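To see what `test_size=0.2` and `random_state=0` do, here is a small sketch on ten made-up samples (the arrays are hypothetical). It checks the size of each part and that repeating the call with the same seed gives the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: ten samples with a single feature.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

# 20% of the rows are held out for testing; the rest are used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The same random_state reproduces the same split.
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.2, random_state=0)

print(len(X_train), len(X_test))
print(np.array_equal(X_test, X_test2))
```

Fixing `random_state` keeps the reported scores reproducible between runs; a different seed would hold out different rows and give slightly different metrics.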
Step 5 - Plotting the Line of regression
Now that the model is trained, it's time to visualize the best-fit regression line.
# Plotting the regression line
line = regressor.coef_*X+regressor.intercept_
# Plotting the full dataset together with the fitted line
plt.scatter(X, y)
plt.plot(X, line, color='red')
plt.show()
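The line drawn above is built by hand from `coef_` and `intercept_`. As a sanity check, the same values should come out of `predict`. A minimal sketch on made-up data (the arrays are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data, NOT the dataset used in this post.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.1, 4.9, 7.2, 8.8])

regressor = LinearRegression().fit(X, y)

# Manual line: slope * x + intercept, broadcast over the (n, 1) array X
line = regressor.coef_ * X + regressor.intercept_

# predict() returns a 1-D array, so flatten the manual line before comparing
same = np.allclose(line.ravel(), regressor.predict(X))
print(same)
```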
Step 6 - Making Predictions
Now that we have trained our algorithm, it's time to test the model by making some predictions.
For this we will use the test-set data, which the model did not see during training.
# Testing data
print(X_test)
# Model Prediction
y_pred = regressor.predict(X_test)
Step 7 - Comparing Actual result to the Predicted Model result
# Comparing Actual vs Predicted
df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
df
# Estimating the training and test scores (R-2)
print("Training Score:", regressor.score(X_train, y_train))
print("Test Score:", regressor.score(X_test, y_test))
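For scikit-learn regressors, `score` returns the coefficient of determination (R-2) of the predictions, so it is the same quantity that `metrics.r2_score` reports in Step 8. A short sketch on made-up data (the arrays and the name `model` are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical data, NOT the dataset used in this post.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([11.0, 19.5, 32.0, 41.0, 48.5])

model = LinearRegression().fit(X, y)

score = model.score(X, y)            # R-2 computed by the estimator
r2 = r2_score(y, model.predict(X))   # R-2 computed from the predictions

print(score, r2)
```

A training score far above the test score would hint at overfitting; with a single feature and a straight line the two are usually close.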
Plotting the Bar graph to depict the difference between the actual and predicted value
# Plotting the Bar graph to depict the difference between the actual and predicted value
df.plot(kind='bar',figsize=(5,5))
plt.grid(which='major', linewidth=0.5, color='red')
plt.grid(which='minor', linewidth=0.5, color='blue')
plt.show()
Testing the model with our own data
# Testing the model with our own data
hours = 9.25
test = np.array([hours])
test = test.reshape(-1, 1)
own_pred = regressor.predict(test)
print("No of Hours = {}".format(hours))
print("Predicted Score = {}".format(own_pred[0]))
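Under the hood, a prediction for a single value is just intercept + slope * hours. The sketch below checks this on made-up data that follows score = 10 * hours + 2 exactly (the arrays and the name `model` are hypothetical, so the number printed here is not the answer for the real dataset).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data following score = 10 * hours + 2 exactly.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([12.0, 22.0, 32.0, 42.0])

model = LinearRegression().fit(X, y)

# predict() expects a 2-D array: one row per sample, one column per feature
new_hours = np.array([[9.25]])
pred = model.predict(new_hours)[0]

# The same value computed by hand from the fitted parameters
manual = model.intercept_ + model.coef_[0] * 9.25

print(pred, manual)
```

Note that 9.25 hours may lie near or beyond the edge of the hours seen in training, and a straight line does not cap its output at 100, so predictions for large inputs deserve some caution.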
Step 8 - Evaluating the model
The final step is to evaluate the performance of the algorithm. This step matters most when comparing how well different algorithms perform on the same dataset. Here several error metrics are computed on the test set to quantify how far the predictions fall from the actual scores.
from sklearn import metrics
print('Mean Absolute Error:',metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
print('R-2:', metrics.r2_score(y_test, y_pred))
Mean Absolute Error: 4.183859899002975
Mean Squared Error: 21.598769307217406
Root Mean Squared Error: 4.647447612100367
R-2: 0.9454906892105355
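R-2 (the coefficient of determination) measures how much of the variance in the scores the model explains. Here R-2 is about 0.945 on the test set, so the fitted line accounts for roughly 94.5% of the variance in the held-out scores, which is a strong fit for a single-feature model. The mean absolute error of about 4.18 means the predictions are off by roughly 4 percentage points on average.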
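For reference, each of these metrics is a short formula over the prediction errors. The sketch below computes them by hand with NumPy on made-up values (the `y_true` and `y_hat` arrays are hypothetical, not the test set from this post).

```python
import numpy as np

# Hypothetical actual and predicted scores, NOT the post's test set.
y_true = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y_hat = np.array([12.0, 18.0, 33.0, 39.0, 50.0])

errors = y_true - y_hat

mae = np.mean(np.abs(errors))   # Mean Absolute Error
mse = np.mean(errors ** 2)      # Mean Squared Error
rmse = np.sqrt(mse)             # Root Mean Squared Error

# R-2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(mae, mse, rmse, r2)
```

MAE and RMSE are in the same units as the score (percentage points), which makes them easier to interpret than MSE; RMSE penalizes large errors more heavily than MAE.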
I successfully carried out the Prediction using Supervised ML task and evaluated the model's performance on several metrics.