Real-time Phishing Attack Detection using ML 💻

abdulghani200

Abdul Ghani

Posted on May 21, 2020

My Final Project

I built this project, called RPAD-ML, in my final year. It is essentially an Android app coupled with a machine learning backend server that detects 🕵️ any link that may lead to a phishing site in REAL TIME ⚡. It can detect malicious/phishing links from any app: open any app that has external links 🔗, and RPAD-ML will check them in no time and give you a warning message ⚠️ right away.

Demo

Download RPAD-ML Demo APK

I know there are existing tools such as Google Safe Browsing, but that kind of protection is largely confined to web browsers such as Chrome and does not cover links opened from other apps. So what I've done is combine a machine learning model trained on phishing sites with Google Safe Browsing: given a URL, it predicts whether the URL is a phishing website or not.

Link to Code

abdulghanitech / rpad-ml

The repo contains the code for both the ML server and the Android app, which together detect phishing sites in real time. Below is a flow chart of it.

[Screenshot: flow chart]

How I built it

I've got a machine learning model built using a dataset of phishing sites.

DATA SELECTION

The dataset was downloaded from the UCI Machine Learning Repository. It contains 31 columns (30 features and 1 target) and 2456 observations.

MODELS

To fit the models, the dataset is split into training and testing sets. The split ratio is 75-25, with 75% going to the training set.
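Here is a minimal sketch of a 75-25 split in plain Python. The function name `split_train_test`, the seed, and the placeholder rows are assumptions for illustration; with scikit-learn the same step is usually done with `train_test_split(X, y, test_size=0.25)`.

```python
import random

def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder rows standing in for the 2456 observations (not the real data).
rows = list(range(2456))
train_rows, test_rows = split_train_test(rows)
print(len(train_rows), len(test_rows))  # prints: 1842 614
```

Shuffling before splitting matters when the file is ordered by class, otherwise the test set could end up with only one kind of site.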

Now the training set is used to train the classifiers. The classifiers chosen are:

* Logistic Regression

* Support Vector Machine

* Random Forest Classification

We will see which one fits our dataset best.

1. Logistic Regression

After fitting logistic regression and building a confusion matrix of predicted versus actual values, I got 92.3% accuracy, which is good for a logistic regression model.
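To make the accuracy figure concrete, here is a small sketch of how accuracy falls out of a confusion matrix. The labels (1 = phishing, 0 = legitimate) and the toy predictions are made up for illustration; they are not the project's data or its label encoding.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs for binary labels 0 and 1."""
    counts = Counter(zip(actual, predicted))
    return {
        "tn": counts[(0, 0)], "fp": counts[(0, 1)],
        "fn": counts[(1, 0)], "tp": counts[(1, 1)],
    }

def accuracy(cm):
    """Share of predictions that match the actual label."""
    total = sum(cm.values())
    return (cm["tp"] + cm["tn"]) / total if total else 0.0

# Toy labels: 1 = phishing, 0 = legitimate (illustrative only)
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
print(cm)            # prints: {'tn': 3, 'fp': 1, 'fn': 1, 'tp': 3}
print(accuracy(cm))  # prints: 0.75
```

For a phishing detector the false negatives (`fn`) deserve a separate look, since those are phishing links that would get through without a warning.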

2. Support Vector Machine

A support vector machine with an RBF kernel, using GridSearchCV to find the best parameters, was a really good choice. Fitting the model with the best parameters found, I got 96.47% accuracy, which is pretty good.

3. Random Forest Classification

The next model I wanted to try was random forest, which also gives me feature importances. Again using GridSearchCV to find the best parameters and fitting the model with them, I got a very good accuracy of 97.26%.
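The grid search step can be sketched roughly like this with scikit-learn. This is an illustration under assumptions: the data is synthetic (generated with `make_classification`, not the UCI dataset), and the parameter grid is hypothetical since the values actually searched are not listed here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data with 30 features; NOT the UCI phishing dataset.
X, y = make_classification(n_samples=400, n_features=30,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Hypothetical grid; the values used in the project are not given.
param_grid = {"n_estimators": [20, 50], "max_depth": [None, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# best_estimator_ is refit on the whole training set with the best parameters.
best_model = search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)
importances = best_model.feature_importances_  # one value per feature

# Indices of the five most important features, highest first.
top5 = sorted(range(len(importances)),
              key=lambda i: importances[i], reverse=True)[:5]

print(search.best_params_)
print(f"test accuracy: {test_accuracy:.3f}")
print(top5)
```

Swapping `RandomForestClassifier` for `SVC(kernel="rbf")` with a grid over `C` and `gamma` would give the equivalent search for the support vector machine, though `SVC` does not expose `feature_importances_`.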

Random forest gave the best accuracy of the three. We could also try an artificial neural network to see whether it improves accuracy further.

FEATURE IMPORTANCES

[Figure: feature importance]
ML Model: Phishcoop

Hosting online as a server

I used the Heroku platform (Hobby plan, provided through GitHub Education) to host this machine learning model online. I used pickle to save and load the model and served it using Flask.

The idea was to expose this as a service and then call it from the Android app.
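As a rough sketch of the save/load/serve idea, the snippet below pickles a model to disk, loads it back, and answers one prediction request. Everything specific here is an assumption: the stand-in "model" is a plain dict of made-up weights, and the request and response shapes (`features`, `phishing`) are hypothetical. In a Flask app, the body of `handle_predict` would sit inside a route function.

```python
import json
import os
import pickle
import tempfile

def save_model(model, path):
    """Serialize the model object to a file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)

def load_model(path):
    """Load a model saved by save_model.

    Only unpickle files you created yourself; pickle can run code on load.
    """
    with open(path, "rb") as f:
        return pickle.load(f)

def handle_predict(model, request_body):
    """Turn a JSON request body into a JSON response body."""
    features = json.loads(request_body)["features"]
    # Stand-in scoring. With a scikit-learn classifier this line would be
    # something like: label = model.predict([features])[0]
    score = sum(w * x for w, x in zip(model["weights"], features)) + model["bias"]
    return json.dumps({"phishing": score > 0})

# Stand-in model: hypothetical weights, not trained values.
model = {"weights": [0.8, -0.5, 0.3], "bias": -0.1}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    save_model(model, path)
    restored = load_model(path)

response = handle_predict(restored, json.dumps({"features": [1, 0, 1]}))
print(response)  # prints: {"phishing": true}
```

Loading the pickle once at server start, rather than on every request, keeps response times low, which matters for a real-time warning.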

Android App

Essentially, this is the front end that calls the service. I used Android's accessibility API to capture the URLs being opened in any app.

After getting a URL, I first call the Google Safe Browsing API to check whether it is a phishing site. If it is, I show a warning dialog. Otherwise, I call the machine learning backend server and, if its result says the URL is a phishing site, I show the warning dialog as well.
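The two-step check can be sketched as follows. The real app is an Android app, so this Python version only illustrates the decision logic; `check_url` and the stub checkers are hypothetical names, and the stubs stand in for the actual Safe Browsing and ML backend calls.

```python
def check_url(url, is_flagged_by_safe_browsing, is_flagged_by_ml_backend):
    """Return a verdict dict; the ML backend is queried only if Safe Browsing has no match."""
    if is_flagged_by_safe_browsing(url):
        return {"url": url, "warn": True, "source": "safe_browsing"}
    if is_flagged_by_ml_backend(url):
        return {"url": url, "warn": True, "source": "ml_backend"}
    return {"url": url, "warn": False, "source": None}

# Stub checkers for illustration only; they do not call any real service.
known_bad = {"http://bad.example/login"}
ml_calls = []

def fake_safe_browsing(url):
    return url in known_bad

def fake_ml_backend(url):
    ml_calls.append(url)  # record calls to show when the backend is hit
    return "verify-account" in url

urls = [
    "http://bad.example/login",
    "http://site.example/verify-account",
    "https://example.com/",
]
results = [check_url(u, fake_safe_browsing, fake_ml_backend) for u in urls]
for r in results:
    print(r)
```

Checking Safe Browsing first means the ML server is only contacted for URLs that are not already known to be bad, which saves a network round trip in the obvious cases.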

Additional Thoughts / Feelings / Stories

This was more of a prototype. It is not perfect, but hey, it works 🙌🏻. And the best thing is that I've learnt so much by working on this project 🤓
