Predicting Customer Churn with Artificial Neural Networks (ANN)

In this article, we’ll explore how an artificial neural network (ANN) can be used to predict churn among a bank’s credit card customers in a simple and understandable way. We’ll also see a basic implementation using the Keras library with TensorFlow.

Overview

Customer churn, or the loss of customers, can significantly impact a business’s bottom line. However, with the power of artificial neural networks (ANN), businesses can predict and potentially prevent customer churn before it happens. Firstly, let’s understand what customer churn is. Customer churn occurs when customers stop using a company’s products or services or leave the organization. This can happen for various reasons, such as dissatisfaction with the product, better offers from competitors, or changes in the customer’s needs or circumstances. Predicting churn involves analyzing historical data to identify patterns and indicators that signal when a customer is likely to churn.

Our goal here is to understand how a basic neural network works. Artificial neural networks are a type of machine learning model inspired by the structure and function of the human brain. They consist of interconnected nodes, or neurons, organized into layers. For customer churn prediction, an ANN is trained on historical customer data so that it learns which patterns precede a customer leaving. You can access the dataset from here.

Here’s how the process of customer churn prediction using ANN typically works:

Data Collection
Data Preprocessing
Model Training
Model Evaluation
Prediction and Action

Data Collection

import pandas as pd

df = pd.read_csv('Churn_Modelling.csv')

Show a preview of the data.

df.head()

See the number of rows and columns using shape. We have 10000 records and 14 columns.

df.shape

Check the data types and null values.

df.info()

Check for duplicates. Luckily we do not have any duplicate rows.

df.duplicated().sum()

Data Preprocessing

Drop the columns that do not help the analysis.

df.drop(columns=['RowNumber', 'CustomerId', 'Surname'], inplace=True)

Convert the categorical columns ‘Geography’ and ‘Gender’ into numerical data.

df = pd.get_dummies(df, columns=['Geography', 'Gender'], drop_first=True)
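To make the effect of drop_first=True concrete, here is a minimal sketch on a toy frame. The rows and category values below are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy frame with hypothetical rows (not from Churn_Modelling.csv)
toy = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France"],
    "Gender": ["Female", "Male", "Male", "Female"],
    "Age": [42, 35, 51, 29],
})

# One indicator column per category, minus the first category of each column
encoded = pd.get_dummies(toy, columns=["Geography", "Gender"], drop_first=True)
print(list(encoded.columns))
```

The first category of each column (here France and Female) is dropped because it is implied when all the remaining indicator columns are 0, which avoids a redundant column.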

Now let’s split the data into training and test sets and then scale the features, because some columns hold very large values and others very small ones.

X = df.drop(columns=['Exited'])
y = df['Exited']

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
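The idea behind fitting the scaler on the training set only can be sketched with plain NumPy. This is an illustration of standardization under assumed toy values, not a replacement for StandardScaler; the array names are hypothetical.

```python
import numpy as np

# Hypothetical feature matrices: two columns on very different scales
X_train_toy = np.array([[20.0, 1000.0], [30.0, 3000.0], [40.0, 5000.0]])
X_test_toy = np.array([[30.0, 9000.0]])

# "fit": learn the per-column mean and standard deviation from training rows only
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)

# "transform": apply the training statistics to both sets
X_train_std = (X_train_toy - mean) / std
X_test_std = (X_test_toy - mean) / std

print(X_train_std.mean(axis=0), X_train_std.std(axis=0))
print(X_test_std)
```

Reusing the training mean and standard deviation on the test rows keeps information about the test set from leaking into preprocessing, which is why the article calls fit_transform on X_train but only transform on X_test.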

Model Training

Now let’s use Keras from TensorFlow to create the neural network model.

import tensorflow
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

Keras offers two main ways to build a model: sequential, where layers are stacked one after another, and non-sequential, where layers can be connected more freely. We’ll build a sequential model here.

model = Sequential()

# first hidden layer with 11 nodes; input_dim declares the 11 input features
model.add(Dense(11, activation='relu', input_dim = 11))
# second hidden layer with 11 nodes
model.add(Dense(11, activation='relu'))
# output layer
model.add(Dense(1, activation='sigmoid'))

In the code above, we created a neural network with one input layer, two hidden layers with 11 nodes (perceptrons) each, and one output layer. The hidden layers use ‘relu’ as the activation function instead of ‘sigmoid’, and the input dimension is 11 because the data has 11 feature columns. The output layer keeps the sigmoid activation so that it produces a value between 0 and 1.

Let’s see the summary of the model.

model.summary()

As we can see above, the model has a total of 276 trainable parameters, made up of the weights and biases of each layer.
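The figure of 276 can be checked by hand: a dense layer has one weight per input-to-node connection plus one bias per node. The helper name dense_params below is hypothetical and only serves this arithmetic.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# (inputs, units) for the two hidden layers and the output layer
layer_sizes = [(11, 11), (11, 11), (11, 1)]

per_layer = [dense_params(n_in, n_out) for n_in, n_out in layer_sizes]
total = sum(per_layer)

print(per_layer, total)  # [132, 132, 12] 276
```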

Now let’s compile the model and train it using the fit() method.

model.compile(loss = 'binary_crossentropy', optimizer='Adam', metrics=['accuracy'])

history = model.fit(X_train_scaled, y_train, epochs=100, validation_split = 0.2)

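As we can see, we have trained our model and stored the result in an object called history, whose history attribute is a dictionary of the loss and accuracy values recorded per epoch. We train for 100 epochs, which means the model passes over the full training data 100 times.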
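To get a feel for what 100 epochs means in terms of weight updates, the row counts can be worked out by hand. This sketch assumes the 10000 rows mentioned earlier and Keras’s default batch size of 32, since fit() is called without a batch_size argument.

```python
import math

n_rows = 10000                    # records in the dataset
n_test = n_rows // 5              # test_size=0.2
n_train = n_rows - n_test         # rows passed to fit()

n_val = n_train // 5              # validation_split=0.2 holds out part of the training rows
n_fit = n_train - n_val           # rows actually used for weight updates

batch_size = 32                   # assumed Keras default
steps_per_epoch = math.ceil(n_fit / batch_size)
total_updates = steps_per_epoch * 100  # epochs=100

print(n_train, n_val, n_fit, steps_per_epoch, total_updates)
```

So each epoch is not a single update but a few hundred mini-batch updates, and the validation rows are never used to adjust the weights.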

Now that our model has been trained, let’s check the values of the weights and biases for each layer. For example, the weights and biases of layer 0 are:

model.layers[0].get_weights()
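How the network turns these weights and biases into a prediction can be sketched in NumPy. The arrays below are random stand-ins with the same shapes as the kernel and bias of each layer in this model, not the trained values, and the input batch is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Random stand-ins shaped like each layer's (kernel, bias) pair
layers = [
    (rng.normal(scale=0.3, size=(11, 11)), np.zeros(11)),  # hidden layer 1
    (rng.normal(scale=0.3, size=(11, 11)), np.zeros(11)),  # hidden layer 2
    (rng.normal(scale=0.3, size=(11, 1)), np.zeros(1)),    # output layer
]

x = rng.normal(size=(4, 11))  # 4 hypothetical scaled customers, 11 features each

h = x
for kernel, bias in layers[:-1]:
    h = relu(h @ kernel + bias)        # hidden layers: weighted sum, then relu

kernel, bias = layers[-1]
prob = sigmoid(h @ kernel + bias)      # output layer: value between 0 and 1

print(prob.shape)
```

Each layer is a matrix multiplication followed by an activation, and training only changes the numbers inside the kernels and biases.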

Model Evaluation

Now we can make predictions with our model and compute the accuracy score. The model outputs a probability for each customer, so we first need to convert the predictions into the category values 0 and 1.

import numpy as np

y_log = model.predict(X_test_scaled)

y_pred = np.where(y_log > 0.5, 1, 0)

from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)
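The thresholding and scoring step can be followed by hand on a few made-up values. The probabilities and labels below are hypothetical; the column shape of y_prob mirrors what predict() returns for a single output node.

```python
import numpy as np

# Hypothetical predicted probabilities, one row per customer
y_prob = np.array([[0.91], [0.12], [0.55], [0.40], [0.07]])
y_true = np.array([1, 0, 0, 1, 0])

# Probabilities above 0.5 become class 1 (churn), the rest class 0
y_hat = (y_prob > 0.5).astype(int).ravel()

# Accuracy is the fraction of predictions that match the true labels
accuracy = (y_hat == y_true).mean()

print(y_hat, accuracy)
```

The 0.5 cutoff is a choice rather than a fixed rule, and moving it trades missed churners against false alarms.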

The accuracy score is 0.857, which we can try to improve by tuning hyperparameters such as the number of layers, the number of nodes, or the number of epochs.

Let’s plot the training history to understand the network better and to check whether there is any overfitting.

import matplotlib.pyplot as plt

# How training and validation loss change over the 100 epochs.
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.legend()
plt.show()

In the graph above, we can see a small gap between the training loss and the validation loss, which suggests a little overfitting. We can check the accuracy the same way.

plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.legend()
plt.show()
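Besides reading the curves by eye, the same check can be done numerically on the history dictionary. The values below are invented to illustrate the pattern and are not the output of the model trained above.

```python
# Hypothetical per-epoch values shaped like history.history
hist = {
    "loss":     [0.60, 0.45, 0.38, 0.34, 0.31, 0.29],
    "val_loss": [0.61, 0.47, 0.41, 0.40, 0.41, 0.43],
}

# Epoch (0-indexed) where validation loss is lowest
best_epoch = min(range(len(hist["val_loss"])), key=hist["val_loss"].__getitem__)

# Gap between validation and training loss at each epoch
gaps = [val - train for train, val in zip(hist["loss"], hist["val_loss"])]

print(best_epoch, round(gaps[0], 2), round(gaps[-1], 2))
```

A training loss that keeps falling while the validation loss stops improving, so the gap widens, is the usual sign of overfitting.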

By leveraging the power of artificial neural networks, businesses can gain valuable insights into customer behavior and identify potential churn risks early on. This allows them to take proactive steps to retain customers and foster long-term loyalty, ultimately driving growth and success in the competitive marketplace.