End to End Deployment of Breast Cancer Prediction Through Machine Learning using Flask
Karan Choudhary
Posted on August 9, 2020
Artificial intelligence (AI) in healthcare is the use of complex algorithms and software to emulate human cognition in the analysis, interpretation, and comprehension of complicated medical and healthcare data. Specifically, AI is the ability of computer algorithms to approximate conclusions without direct human input.
The aim of health-related AI applications is to analyze relationships between prevention or treatment techniques and patient outcomes.
Before starting on the code and theory of the machine learning part, let us first learn about Flask and deployment.
Flask is a web application framework written in Python. It has multiple modules that make it easier for a web developer to write applications without having to worry about the details like protocol management, thread management, etc.
Flask gives us a variety of choices for developing web applications, along with the tools and libraries needed to build one.
Installing Flask on your Machine
Installing Flask is simple and straightforward. Here, I am assuming you already have Python 3 and pip installed. To install Flask, run either of the following commands:
sudo apt-get install python3-flask
pip install flask
That's it! You're all set to dive into the problem statement and take one step closer to deploying your machine learning model through Flask.
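To give a feel for how little code a Flask app needs, here is a minimal sketch. The route, the function name, and the message are illustrative assumptions, not the app built later in this article. It uses Flask's built-in test client so that no server has to be started.

```python
from flask import Flask

# Hypothetical minimal app, not the article's app.py
app = Flask(__name__)


@app.route("/")
def home():
    # A view function returns the response body for the matching URL
    return "Breast cancer prediction service is running"


# Call the route in-process with Flask's test client instead of starting a server
with app.test_client() as client:
    response = client.get("/")
    status_code = response.status_code
    body = response.get_data(as_text=True)
    print(status_code, body)

# To serve it locally instead, one would call: app.run(port=8500)
```

The port in the last comment is the one this article uses later; any free port would do.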
Starting the implementation
Here we have the folder structure of the machine learning deployment project with Flask.
We will implement this code in Jupyter and the Sublime Text editor. To implement the machine learning models, let's start by importing the libraries.
# import libraries
import pandas as pd # for data manipulation or analysis
import numpy as np # for numeric calculation
import matplotlib.pyplot as plt # for data visualization
import seaborn as sns # for data visualization
import pickle #for dumping the model or we can use joblib library
The next step is to load the data through pandas.
cancer_df = pd.read_csv('breast_cancer .csv')
The next step is to look at the DataFrame.
# Head of cancer DataFrame
cancer_df.head(6)
Info about the DataFrame (gives the non-null count and data type of each column)
# Information of cancer DataFrame
cancer_df.info()
Numerical description of the data (count, mean, standard deviation, minimum, quartiles including the median, and maximum of each feature)
# Numerical distribution of data
cancer_df.describe()
Heatmap
# heatmap of DataFrame
plt.figure(figsize=(16,9))
sns.heatmap(cancer_df)
Heatmap of a correlation matrix
cancer_df.corr() # gives the correlation between each pair of columns
# Heatmap of Correlation matrix of breast cancer DataFrame
plt.figure(figsize=(20,20))
sns.heatmap(cancer_df.corr(), annot = True, cmap ='coolwarm', linewidths=2)
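To make clear what each cell of the correlation matrix means, the sketch below computes the Pearson correlation by hand and compares it with what `corr()` returns. The toy values and column names are made up for illustration and are not taken from the breast cancer data.

```python
import numpy as np
import pandas as pd

# Toy data standing in for two feature columns and the target (hypothetical values)
toy_df = pd.DataFrame({
    "mean_radius": [10.0, 12.0, 14.0, 18.0, 20.0],
    "mean_texture": [15.0, 14.0, 19.0, 17.0, 21.0],
    "target": [1, 1, 1, 0, 0],
})


def pearson(a, b):
    """Pearson correlation: covariance divided by the product of the spreads."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    denominator = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    return float((a_centered * b_centered).sum() / denominator)


manual = pearson(toy_df["mean_radius"], toy_df["target"])
from_pandas = float(toy_df.corr().loc["mean_radius", "target"])
print(manual, from_pandas)
```

A value near -1 or +1 means the two columns move together strongly, while a value near 0 means there is little linear relationship.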
# create second DataFrame by dropping target
cancer_df2 = cancer_df.drop(['target'], axis = 1)
print("The shape of 'cancer_df2' is : ", cancer_df2.shape)
cancer_df2.corrwith(cancer_df.target)
# visualize correlation barplot
plt.figure(figsize = (16,5))
corr_with_target = cancer_df2.corrwith(cancer_df.target)
ax = sns.barplot(x = corr_with_target.index, y = corr_with_target.values)
ax.tick_params(labelrotation = 90)
Split DataFrame in Train and Test
Input variable
# input variable
X = cancer_df.drop(['target'], axis = 1)
X.head(6)
Output variable
# output variable
y = cancer_df['target']
y.head(6)
Split dataset for training and testing
# split dataset into train and test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state= 5)
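To see what this split produces, the sketch below runs the same call on scikit-learn's bundled Wisconsin breast cancer dataset. Using the bundled dataset is an assumption made because the CSV file is not included here. It may differ from the article's file.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Assumption: the bundled dataset stands in for the article's CSV file
X, y = load_breast_cancer(return_X_y=True)

# Same settings as above: 20% of the rows are held out for testing,
# and random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=5
)

print("train:", X_train.shape, "test:", X_test.shape)
```

With 569 rows, a 20% test split holds out 114 rows, which is the same test size as the support total in the classification report later in this article.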
Feature scaling of data
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train_sc = sc.fit_transform(X_train)
X_test_sc = sc.transform(X_test)
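The reason for calling `fit_transform` on the training data but only `transform` on the test data is that the mean and standard deviation should be learned from the training set alone. A small sketch with made-up numbers shows the effect:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales
X_train = np.array([[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]])
X_test = np.array([[2.0, 4000.0]])

sc = StandardScaler()
X_train_sc = sc.fit_transform(X_train)  # learns mean and std from the training data
X_test_sc = sc.transform(X_test)        # reuses the training mean and std

print("train means:", X_train_sc.mean(axis=0))
print("train stds:", X_train_sc.std(axis=0))
print("scaled test row:", X_test_sc)
```

After scaling, every training column has mean 0 and standard deviation 1, and the test row is expressed in the same units.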
Machine Learning Model Building
1.Support Vector Classifier
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score #for classification report
# SVM model
from sklearn.svm import SVC
svc_classifier = SVC()
svc_classifier.fit(X_train, y_train)
y_pred_scv = svc_classifier.predict(X_test)
accuarcy_svm=accuracy_score(y_test, y_pred_scv)
print(accuarcy_svm)
Output 0.5789473684210527
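Note that the SVC above is fit on the unscaled `X_train`, although scaled copies were prepared earlier. Support vector machines with the default RBF kernel are sensitive to feature scale, so this may be part of why the accuracy is low. The sketch below is one possible variant, not the article's method: it puts the scaler and the classifier in a pipeline. It runs on scikit-learn's bundled breast cancer dataset, which is an assumption standing in for the CSV file.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: the bundled dataset stands in for the article's CSV file
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=5
)

# The pipeline fits the scaler on the training data only,
# then applies the same scaling before every prediction
svc_scaled = make_pipeline(StandardScaler(), SVC())
svc_scaled.fit(X_train, y_train)

accuracy_scaled = accuracy_score(y_test, svc_scaled.predict(X_test))
print("SVC accuracy with scaling:", accuracy_scaled)
```

Keeping the scaler inside the pipeline also means the deployed model would scale user input the same way it scaled the training data.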
2.Logistic Regression
# Logistic Regression
from sklearn.linear_model import LogisticRegression
# the 'l1' penalty needs a solver that supports it, such as liblinear
lr_classifier = LogisticRegression(random_state = 51, penalty = 'l1', solver = 'liblinear')
lr_classifier.fit(X_train, y_train)
y_pred_lr = lr_classifier.predict(X_test)
accuarcy_lr=accuracy_score(y_test, y_pred_lr)
print(accuarcy_lr)
Output 0.9736842105263158
3.Decision Tree Classifier
# Decision Tree Classifier
from sklearn.tree import DecisionTreeClassifier
dt_classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 51)
dt_classifier.fit(X_train, y_train)
y_pred_dt = dt_classifier.predict(X_test)
accuarcy_dt=accuracy_score(y_test, y_pred_dt)
print(accuarcy_dt)
Output 0.9473684210526315
4.XGBoost Classifier
# XGBoost Classifier
from xgboost import XGBClassifier
xgb_classifier = XGBClassifier()
xgb_classifier.fit(X_train, y_train)
y_pred_xgb = xgb_classifier.predict(X_test)
accuarcy_xgb=accuracy_score(y_test, y_pred_xgb)
print(accuarcy_xgb)
Output 0.9823684210526315
Similarly, we evaluate each model on the test data to check that it is neither overfitting nor underfitting; a good model should have low bias and low variance.
The accuracy of all the classifiers on the test data, obtained with similar code:
Accuracy of Support Vector Classifier - 0.5789456522520
Accuracy of Decision Tree Classifier - 0.8473684210526315
Accuracy of Logistic Regression - 0.570456140350877
Accuracy of XGBoost Classifier - 0.982456140350877
From this we can conclude that the XGBoost classifier performs best on the test data, with low bias and low variance.
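One simple way to check for overfitting is to compare accuracy on the training data with accuracy on the test data: a large gap suggests high variance. The sketch below does this for a decision tree. It uses scikit-learn's bundled breast cancer dataset as a stand-in for the CSV file, which is an assumption.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled dataset stands in for the article's CSV file
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=5
)

dt = DecisionTreeClassifier(criterion="entropy", random_state=51)
dt.fit(X_train, y_train)

# score() returns the mean accuracy on the given data
train_accuracy = dt.score(X_train, y_train)
test_accuracy = dt.score(X_test, y_test)
gap = train_accuracy - test_accuracy

print("train:", train_accuracy, "test:", test_accuracy, "gap:", gap)
```

An unpruned tree usually fits the training data almost perfectly, so the size of the gap to the test accuracy is the number to watch.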
To improve further, we tune XGBoost with randomized search and grid search. We want the best possible accuracy while also checking the other metrics in the classification report (precision, recall, F-beta score, and support), which matter for keeping Type I and Type II errors low.
Randomized search
Randomized search tries a random sample of the parameter combinations rather than all of them, so it usually runs much faster than an exhaustive method such as grid search.
params={
"learning_rate" : [0.05, 0.10, 0.15, 0.20, 0.25, 0.30 ] ,
"max_depth" : [ 3, 4, 5, 6, 8, 10, 12, 15],
"min_child_weight" : [ 1, 3, 5, 7 ],
"gamma" : [ 0.0, 0.1, 0.2 , 0.3, 0.4 ],
"colsample_bytree" : [ 0.3, 0.4, 0.5 , 0.7 ]
}
# Randomized Search
from sklearn.model_selection import RandomizedSearchCV
random_search = RandomizedSearchCV(xgb_classifier, param_distributions=params, scoring= 'roc_auc', n_jobs= -1, verbose= 3)
random_search.fit(X_train, y_train)
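To see why sampling saves time, the sketch below counts the combinations in the parameter dictionary above and draws a handful at random. The values of `n_iter` and `cv_folds` are assumptions matching the defaults of recent scikit-learn releases (older releases used 3 folds). The sampling loop is a simplified illustration, not scikit-learn's internal sampler.

```python
import math
import random

# Same search space as the params dictionary above
params = {
    "learning_rate": [0.05, 0.10, 0.15, 0.20, 0.25, 0.30],
    "max_depth": [3, 4, 5, 6, 8, 10, 12, 15],
    "min_child_weight": [1, 3, 5, 7],
    "gamma": [0.0, 0.1, 0.2, 0.3, 0.4],
    "colsample_bytree": [0.3, 0.4, 0.5, 0.7],
}

# An exhaustive search has to try every combination of values
grid_size = math.prod(len(values) for values in params.values())

n_iter = 10   # assumption: default number of sampled candidates
cv_folds = 5  # assumption: default number of cross-validation folds

grid_fits = grid_size * cv_folds
random_fits = n_iter * cv_folds

# Simplified sampling: pick one value per parameter for each candidate
rng = random.Random(0)
sampled = [
    {name: rng.choice(values) for name, values in params.items()}
    for _ in range(n_iter)
]

print("combinations:", grid_size)
print("model fits, exhaustive:", grid_fits, "sampled:", random_fits)
print("first sampled candidate:", sampled[0])
```

The trade-off is that a random sample can miss the best combination, which is why the two approaches are compared in this article.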
Finding the best parameters.
random_search.best_params_
Output
{'min_child_weight': 1,
'max_depth': 12,
'learning_rate': 0.3,
'gamma': 0.3,
'colsample_bytree': 0.7}
random_search.best_estimator_
Output
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=0.7, gamma=0.3,
learning_rate=0.3, max_delta_step=0, max_depth=12,
min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
nthread=None, objective='binary:logistic', random_state=0,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
silent=None, subsample=1, verbosity=1)
# training XGBoost classifier with best parameters
xgb_classifier_pt = XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=0.4, gamma=0.2,
learning_rate=0.1, max_delta_step=0, max_depth=15,
min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
nthread=None, objective='binary:logistic', random_state=0,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
silent=None, subsample=1, verbosity=1)
xgb_classifier_pt.fit(X_train, y_train)
y_pred_xgb_pt = xgb_classifier_pt.predict(X_test)
Accuracy of the tuned model
accuracy_score(y_test, y_pred_xgb_pt)
Output 0.9824561403508771
Grid search
Grid search evaluates every combination in the parameter grid.
Training the model
from sklearn.model_selection import GridSearchCV
grid_search = GridSearchCV(xgb_classifier, param_grid=params, scoring= 'roc_auc', n_jobs= -1, verbose= 3)
grid_search.fit(X_train, y_train)
Now comes implementing the tuned model.
xgb_classifier_pt_gs = XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=0.3, gamma=0.0,
learning_rate=0.3, max_delta_step=0, max_depth=3,
min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
nthread=None, objective='binary:logistic', random_state=0,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
silent=None, subsample=1, verbosity=1)
xgb_classifier_pt_gs.fit(X_train, y_train)
y_pred_xgb_pt_gs = xgb_classifier_pt_gs.predict(X_test)
accuracy_score(y_test, y_pred_xgb_pt_gs)
Output 0.9824561403508771
As we are getting nearly the same accuracy from both tuning methods, we will use grid search here. Now comes the classification report and the types of error.
Confusion matrix
It gives the counts of true positives, true negatives, false positives, and false negatives, which help us judge how well our model predicts each class.
from sklearn.metrics import confusion_matrix, classification_report
cm = confusion_matrix(y_test, y_pred_xgb_pt)
plt.title('Heatmap of Confusion Matrix', fontsize = 15)
sns.heatmap(cm, annot = True)
plt.show()
The model makes zero Type II errors, which is the best case, and misclassifies only 2 of the 114 test samples, an error rate of about 0.018. This means the chance of a wrong prediction is very low, close to zero.
Classification report of the model
print(classification_report(y_test, y_pred_xgb_pt))
Output
              precision    recall  f1-score   support
         0.0       1.00      0.96      0.98        48
         1.0       0.97      1.00      0.99        66
   micro avg       0.98      0.98      0.98       114
   macro avg       0.99      0.98      0.98       114
weighted avg       0.98      0.98      0.98       114
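The numbers in this report can be reproduced by hand from the confusion matrix of the XGBoost model printed in this article, [[46 2] [0 66]]. The sketch below treats class 1 as the positive class, which is an assumption, and shows where precision, recall, the F-beta score, and the two error types come from.

```python
# Confusion matrix from this article: rows are true classes, columns are predictions
cm = [[46, 2],
      [0, 66]]

# Assumption: class 1 is treated as the positive class
tn, fp = cm[0]  # true class 0: correctly rejected, wrongly flagged positive
fn, tp = cm[1]  # true class 1: wrongly rejected, correctly flagged positive

type_1_errors = fp  # false positives
type_2_errors = fn  # false negatives

precision = tp / (tp + fp)  # of the predicted positives, how many were right
recall = tp / (tp + fn)     # of the real positives, how many were found


def f_beta(p, r, beta):
    """Weighted harmonic mean of precision and recall; beta=1 gives the F1 score."""
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


f1 = f_beta(precision, recall, beta=1.0)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print("Type I errors:", type_1_errors, "Type II errors:", type_2_errors)
print("precision:", round(precision, 2), "recall:", round(recall, 2))
print("f1:", round(f1, 2), "accuracy:", accuracy)
```

Choosing a beta above 1 weights recall more heavily, which matters when a missed positive case is costlier than a false alarm.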
Cross-validation of the ML model
# Cross validation
from sklearn.model_selection import cross_val_score
cross_validation = cross_val_score(estimator = xgb_classifier_pt, X = X_train_sc,y = y_train, cv = 10)
print("Cross validation accuracy of XGBoost model = ", cross_validation)
print("\nCross validation mean accuracy of XGBoost model = ", cross_validation.mean())
Output
Cross validation accuracy of XGBoost model = [0.9787234 0.97826087 0.97826087 0.97826087 0.93333333 0.91111111 1. 1. 0.97777778 0.88888889]
Cross validation mean accuracy of XGBoost model = 0.9624617124062083
Saving model for deployment
# save model
with open('breast_cancer_detector.pickle', 'wb') as f:
    pickle.dump(xgb_classifier_pt, f)
# load model
with open('breast_cancer_detector.pickle', 'rb') as f:
    breast_cancer_detector_model = pickle.load(f)
# predict the output
y_pred = breast_cancer_detector_model.predict(X_test)
# confusion matrix
print('Confusion matrix of XGBoost model: \n',confusion_matrix(y_test, y_pred),'\n')
# show the accuracy
print('Accuracy of XGBoost model = ',accuracy_score(y_test, y_pred))
Output
Confusion matrix of XGBoost model:
[[46 2]
[ 0 66]]
Accuracy of XGBoost model = 0.9824561403508771
Now our model is dumped into a pickle file, and it is time to deploy it with Flask. For the deployment we switch to the Sublime Text editor. The main aim is to use HTML and CSS with Flask.
The Flask code loads the dumped model and serves the index.html file as the home page, which we discuss further below. The predict function runs when the user enters the 30 input features in the boxes. An array of size 30 is passed to the model for prediction, and the function then renders the after.html file, which opens as a new webpage and reports the tumor as malignant or benign according to the values entered.
The index.html code sets the title of the webpage and the background image 144.jpg, then places a three-line heading in the center of the page in different sizes. Next, we make input fields whose placeholders display the name of each input on the webpage and which hold the values entered. Then we make the "Click here to predict" button for the prediction, followed by a header for the name.
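The flow described above can be sketched in a few lines of Flask. This is only an illustration under stated assumptions, not the article's app.py: the field names, the inline HTML, and the `StubModel` class are hypothetical, and the stub stands in for the model that would be loaded from the pickle file. The real app renders index.html and after.html instead of inline strings.

```python
import numpy as np
from flask import Flask, render_template_string, request

app = Flask(__name__)

N_FEATURES = 30
# Hypothetical form field names; the real index.html defines its own
FIELD_NAMES = [f"feature_{i}" for i in range(N_FEATURES)]


class StubModel:
    """Hypothetical stand-in for the model loaded from the pickle file."""

    def predict(self, features):
        # Placeholder rule for illustration only, not a medical model
        return np.where(features[:, 0] > 15.0, 0, 1)


model = StubModel()


@app.route("/")
def home():
    # Stand-in for index.html: one input box per feature plus a submit button
    inputs = "".join(
        f'<input name="{name}" placeholder="{name}">' for name in FIELD_NAMES
    )
    return (
        f'<form action="/predict" method="post">{inputs}'
        "<button>Click here to predict</button></form>"
    )


@app.route("/predict", methods=["POST"])
def predict():
    # Collect the 30 values from the form and shape them as one row
    values = [float(request.form[name]) for name in FIELD_NAMES]
    features = np.array(values).reshape(1, -1)
    prediction = int(model.predict(features)[0])
    # Stand-in for after.html
    return render_template_string(
        "<h3>PREDICTION: {{ label }}</h3>", label=prediction
    )


# Submit the form in-process with Flask's test client
with app.test_client() as client:
    form_data = {name: "1.0" for name in FIELD_NAMES}
    response = client.post("/predict", data=form_data)
    result_page = response.get_data(as_text=True)
    print(result_page)
```

In a real deployment the stub would be replaced by the unpickled classifier, and the input should be validated before it reaches the model.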
The code after that displays the social media icons with a set font size and color. Each icon is hyperlinked to my social media handle, and then the body ends.
Now comes the file that displays the output of the predict function. The after.html file sets an image as the webpage background and then shows the PREDICTION, which will be 0 or 1 according to the data entered, along with an image.
Now our model is ready to run. First change directory to the project folder, then run the command:
python app.py
It will then be served on the local system at http://127.0.0.1:8500/ and the prediction will appear at http://127.0.0.1:8500/predict.
We will deploy this on Heroku through the Heroku CLI. Register on Heroku. Then click the create new app icon, write the name of the app, and choose a region. Download the Heroku CLI to your system and then open cmd. Execute these commands in cmd from the project folder:
cd my-project/
git init
heroku git:remote -a my-project
git add .
git commit -am "make it better"
git push heroku master
Then go to settings; your webpage is created.
If you want to implement the code yourself, click the button and follow the code end to end with a brief explanation.
If you liked this article and share an interest in similar projects, we can grow our network and work on more real-time projects.
For more details connect with me on my Linkedin account!
THANKS!!!!