Predicting customer churn in banking using ANN
Omkar Ajnadkar
Posted on August 17, 2018
Dataset
The dataset ‘Churn_Modelling.csv’ contains records of 10,000 bank customers with the following columns:
- RowNumber
- CustomerId
- Surname
- CreditScore
- Geography
- Gender
- Age
- Tenure
- Balance
- NumOfProducts
- HasCrCard
- IsActiveMember
- EstimatedSalary
- Exited
Using columns 1 to 13 as inputs, we want to predict column 14 (Exited), that is, whether the customer will leave the bank or not.
Data Preprocessing
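- Removing unnecessary features (identifier-like columns such as RowNumber, CustomerId and Surname)
- Label Encoder for the categorical text columns
- One Hot Encoder for categories with more than two values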
- Train Test Split
- Standard Scaler
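To make these steps concrete, here is a minimal, library-free sketch of what the encoders and the scaler compute. The post presumably uses scikit-learn's LabelEncoder, OneHotEncoder and StandardScaler for this. The sample rows, the choice of dropped columns and the helper names (label_encode, one_hot, standardize) are assumptions for illustration, and the train/test split itself is left out.

```python
from statistics import mean, pstdev

# Made-up rows for illustration; only a few of the dataset's columns are shown.
rows = [
    {"RowNumber": 1, "Surname": "A", "Geography": "France", "Gender": "Female", "Age": 42, "Exited": 1},
    {"RowNumber": 2, "Surname": "B", "Geography": "Spain", "Gender": "Male", "Age": 30, "Exited": 0},
    {"RowNumber": 3, "Surname": "C", "Geography": "Germany", "Gender": "Male", "Age": 36, "Exited": 0},
]

# 1. Remove identifier-like columns (assumed to be the "unnecessary" ones).
DROP = {"RowNumber", "CustomerId", "Surname"}
rows = [{k: v for k, v in r.items() if k not in DROP} for r in rows]


def label_encode(values):
    """Replace each category with its index in the sorted list of categories."""
    classes = sorted(set(values))
    return [classes.index(v) for v in values]


def one_hot(values, drop_first=True):
    """One 0/1 column per category; dropping the first removes a redundant column."""
    classes = sorted(set(values))
    kept = classes[1:] if drop_first else classes
    return [[1.0 if v == c else 0.0 for c in kept] for v in values]


def standardize(train_col, test_col):
    """Scale both splits with the mean and standard deviation of the training split."""
    mu, sigma = mean(train_col), pstdev(train_col)
    return ([(x - mu) / sigma for x in train_col],
            [(x - mu) / sigma for x in test_col])


gender = label_encode([r["Gender"] for r in rows])    # Female -> 0, Male -> 1
geography = one_hot([r["Geography"] for r in rows])   # remaining columns: Germany, Spain
# The value 39 stands in for a held-out test row (hypothetical).
age_train, age_test = standardize([r["Age"] for r in rows], [39])
```

If Geography has three distinct values, dropping one of its one-hot columns leaves 2, which together with the 9 other remaining features would give the 11 inputs used by the network.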
Simple ANN Model using Keras
Create an ANN with three fully connected layers on top of the 11-feature input (4 layers in total if the input is counted as a layer):
- First layer: 11 input features, 6 output units
- Hidden layer: 6 output units
- Final layer: 1 output unit
Activation Functions
- 1st layer: ReLU
- 2nd layer: ReLU
- 3rd layer: Sigmoid
Hyperparameters
- optimizer: adam
- loss: binary_crossentropy
- metrics: accuracy
- batch_size: 10
- epochs: 100
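The architecture, activations and hyperparameters listed above could be put together roughly as follows. This is a sketch, not necessarily the code in the repository: the names N_FEATURES, LAYERS, count_parameters and build_classifier are made up for illustration, and X_train / y_train are assumed to be the scaled training features and the Exited labels.

```python
# Layer sizes and activations from the lists above: 11 inputs -> 6 -> 6 -> 1.
N_FEATURES = 11
LAYERS = [(6, "relu"), (6, "relu"), (1, "sigmoid")]


def count_parameters(n_inputs, layers):
    """Weights plus biases for a stack of fully connected layers."""
    total = 0
    for units, _activation in layers:
        total += n_inputs * units + units
        n_inputs = units
    return total


def build_classifier(optimizer="adam"):
    """Build and compile the network with Keras."""
    # Imported here so the file still loads where TensorFlow is not installed.
    from tensorflow import keras

    model = keras.Sequential()
    model.add(keras.Input(shape=(N_FEATURES,)))
    for units, activation in LAYERS:
        model.add(keras.layers.Dense(units, activation=activation))
    model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])
    return model


# Training with the hyperparameters from the post (X_train, y_train are assumed):
# model = build_classifier()
# model.fit(X_train, y_train, batch_size=10, epochs=100)
```

With these sizes, count_parameters(N_FEATURES, LAYERS) comes to 72 + 42 + 7 = 121 trainable parameters, so the network is very small relative to 10,000 rows.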
Accuracy (subject to change between runs)
- Training Set: 0.8610
- Testing Set: 0.86
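The sigmoid output is a probability between 0 and 1, so an accuracy figure implies a cut-off for calling a customer "exited". A minimal sketch of that calculation, assuming a threshold of 0.5 (the post does not state one) and made-up probabilities:

```python
def accuracy(probabilities, labels, threshold=0.5):
    """Share of samples whose thresholded prediction matches the true label."""
    predictions = [1 if p > threshold else 0 for p in probabilities]
    correct = sum(pred == y for pred, y in zip(predictions, labels))
    return correct / len(labels)


# Made-up model outputs and true Exited values for four customers.
example = accuracy([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0])
```

Here three of the four predictions match, so example is 0.75.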
Improving ANN
- Use k-fold cross-validation: split the training set into, say, 10 parts, train on 9 of them and test on the remaining one, rotating the held-out part each time. Averaging the 10 scores reduces the fluctuation in accuracy you see every time you run the code.
- Use the Dropout technique with a chosen rate to decrease overfitting on the training set. Applying it to this dataset gives an accuracy of 0.8321; the score is lower, but the model should now be less overfitted to this training set.
- Use GridSearchCV to find the best parameters automatically. List all the hyperparameter values you want to test, and after trying every combination it reports the best accuracy and the parameters that produced it. I tried the following values:
batch_size: 25, 32
epochs: 100, 500
optimizer: adam, rmsprop
- After waking up in the morning (yes, it takes a long time…), this is what I found…
best_parameters
- batch_size: 25
- epochs: 500
- optimizer: rmsprop
accuracy: 0.8545
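The k-fold idea from the list above can be sketched without any library. In this illustration, k_fold_indices and cross_validate are made-up helper names, and train_and_score is a hypothetical callback that would train a fresh network on the training indices and return its accuracy on the held-out indices.

```python
import random
from statistics import mean, pstdev


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample is held out exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def cross_validate(train_and_score, n_samples, k=10):
    """Return the mean and standard deviation of the k held-out scores."""
    scores = [train_and_score(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return mean(scores), pstdev(scores)
```

The mean is the number to report, and the standard deviation shows how much the accuracy moves between folds.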
Further Improvements
You can further improve this model by changing the hyperparameters and trying other ranges of values in GridSearchCV. But note that every value you add to the grid multiplies the number of combinations to test, so the training time increases accordingly.
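A quick count shows why the search above ran overnight. The grid below is the one from the post; the 10 folds are an assumption (the post suggests 10 parts for k-fold but does not say which value was passed to GridSearchCV).

```python
from itertools import product

# Grid from the post: 2 x 2 x 2 values.
param_grid = {
    "batch_size": [25, 32],
    "epochs": [100, 500],
    "optimizer": ["adam", "rmsprop"],
}
n_folds = 10  # assumption, see the note above

# Every combination of one value per hyperparameter.
combinations = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]

total_fits = len(combinations) * n_folds
total_epochs = sum(c["epochs"] for c in combinations) * n_folds
```

That is 8 combinations and 80 separate training runs, 24,000 epochs in total. Adding a third value to any one hyperparameter would raise the count to 12 combinations and 120 runs.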
Code
blackbird71SR / Small-Deep-Learning-Projects
Small projects with Deep Learning magic! - Predicting Customer Churn in Banking, Predict tags on Stack Overflow, Sign Language Recognition
Send a pull request for any suggestions and errors…