Predicting Customer Behaviour in Response to Starbucks Promotional Offers

Bilgin Koçak
10 min read · May 5, 2021
Photo by TR on Unsplash

This post is about my Udacity Data Science Nanodegree capstone project.

Project Definition

Project Overview

Starbucks is expanding, venturing into more cities locally and more countries globally. With its growing range of products and increasingly broad customer demographics, the objective here is to find patterns in customers’ behaviour.

As the world’s largest coffeehouse chain, Starbucks is seen as the main representative of the United States’ third wave of coffee culture. As of September 2020, the company had 32,660 stores in 83 countries, including 16,637 company-operated stores and 16,023 licensed stores. Of these, 18,354 were in the United States, Canada, and Latin America. It is therefore valuable to predict how customers will respond to promotional offers based on various factors.

Problem Statement

The machine learning project below builds on this motivation: it is based on data from a simulated customer test provided by Starbucks in cooperation with Udacity.

We will explore the Starbucks dataset, which simulates how people make purchasing decisions and how those decisions are influenced by promotional offers. There are three offer types that can be sent: buy-one-get-one (BOGO), discount, and informational. We will create a model that can predict customer behaviour; this model can then be applied to a new customer in order to choose the promotional offer most likely to influence them.

Metrics

  • This is a simple classification problem, so we will use accuracy to evaluate the models.
  • Compare the number of correct predictions to the total number of predictions to determine the accuracy of each model and choose the best one.

We will answer the question: can we predict the best promotional offer for a given customer?

To compare several machine learning models we will use the lazypredict library. Lazy Predict builds many basic models with very little code and helps us understand which models work better without any parameter tuning. The documentation for the lazypredict library is here.
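As a quick illustration, here is a minimal sketch of how lazypredict is typically invoked. The toy scikit-learn dataset and variable names below are placeholders for illustration, not the project’s actual dataframes:

from lazypredict.Supervised import LazyClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Toy data just to show the API; the project uses the cleaned offer data instead
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a suite of baseline classifiers, all with default parameters
clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)
print(models)  # one row per model: Accuracy, Balanced Accuracy, F1 Score, ...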

Data Understanding

The data, provided by Starbucks, is simulated and mimics customer behaviour. Ten different offers were sent to customers via four channels: web, email, mobile, and social. The offers fall into three types: buy-one-get-one (BOGO), discount, and informational. The data contains records of 17,000 customers receiving offers, viewing offers, completing offers, and making transactions.

The data consists of three JSON files:

  1. portfolio.json — metadata about each offer (duration, type, etc.)
  2. profile.json — demographic data for each customer
  3. transcript.json — records for transactions, offers received, offers viewed, and offers completed

Here is the schema and explanation of each variable in the files:

portfolio.json — 10 rows, 6 columns

  • id (string) — offer id
  • offer_type (string) — type of offer, i.e. BOGO, discount, informational
  • difficulty (int) — minimum required spend to complete an offer
  • reward (int) — reward given for completing an offer
  • duration (int) — time for offer to be open, in days
  • channels (list of strings) — channels the offer was sent through (web, email, mobile, social)

profile.json — 17,000 rows, 5 columns

  • age (int) — age of the customer
  • became_member_on (int) — date when customer created an app account
  • gender (str) — gender of the customer (note some entries contain ‘O’ for other rather than M or F)
  • id (str) — customer id
  • income (float) — customer’s income

transcript.json — 306,534 rows, 4 columns

  • event (str) — record description (i.e. transaction, offer received, offer viewed, etc.)
  • person (str) — customer id
  • time (int) — time in hours since the start of the test; the data begins at time t=0
  • value (dict of strings) — either an offer id or a transaction amount depending on the record

Data Cleaning

In this part we will handle missing values and extreme values in the dataset.

Data Preprocessing

Profile Dataframe:

  • Drop rows containing null values.
  • Check the age column for the extreme value (118).
  • Drop rows containing this extreme value.
Fig1: profile dataframe after data cleaning
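A minimal pandas sketch of the cleaning steps above, assuming the file lives at profile.json (the exact path is project-specific):

import pandas as pd

# The project files are line-delimited JSON
profile = pd.read_json('profile.json', orient='records', lines=True)

# Drop rows with missing values
profile = profile.dropna()

# Age 118 acts as a missing-value placeholder; drop those rows too
profile = profile[profile['age'] != 118]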

Portfolio Dataframe:

  • One-hot encode the channels column
  • One-hot encode the offer_type column
Fig2: portfolio dataframe after data cleaning
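A sketch of the encoding above, assuming portfolio is loaded the same way:

import pandas as pd

portfolio = pd.read_json('portfolio.json', orient='records', lines=True)

# channels holds lists; explode to one row per channel, then pivot to binary columns
channel_dummies = pd.get_dummies(portfolio['channels'].explode()).groupby(level=0).max()

# offer_type is a plain categorical column
offer_type_dummies = pd.get_dummies(portfolio['offer_type'])

portfolio = pd.concat(
    [portfolio.drop(columns=['channels', 'offer_type']), channel_dummies, offer_type_dummies],
    axis=1)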

Transcript Dataframe:

  • Create separate amount, reward, and offer_id columns from the value column dictionary
  • Merge the three datasets on their common columns
  • Segregate the offer and transaction data in transcript
  • Label-encode the offer_id, offer_type, gender, and customer_id columns
  • Create an offers dataframe by separating out transactions from the event column
Fig3: transcript dataframe after data cleaning
Fig4: offer dataframe after data cleaning
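A sketch of unpacking the value dictionaries described above and separating transactions from offers. In this dataset the raw keys ('offer id', 'offer_id', 'amount', 'reward') vary by event type, which is why the lookup below tries both spellings:

import pandas as pd

transcript = pd.read_json('transcript.json', orient='records', lines=True)

# Pull each possible key out of the value dict into its own column
transcript['offer_id'] = transcript['value'].apply(lambda v: v.get('offer id', v.get('offer_id')))
transcript['amount'] = transcript['value'].apply(lambda v: v.get('amount'))
transcript['reward'] = transcript['value'].apply(lambda v: v.get('reward'))

# Segregate transactions from offer events
transactions = transcript[transcript['event'] == 'transaction']
offers = transcript[transcript['event'] != 'transaction']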

Data Exploration and Visualization

In this part we will answer the following questions:

  1. What is the Gender, Age and Income Distribution of Starbucks Customers?
  2. How many customers enrolled yearly?
  3. What is the average age of Starbucks Customers?
  4. What is the average Income of Starbucks Customers?
  5. Which gender has the highest yearly membership?
  6. Which gender has the highest annual income?
  7. What is the distribution of events in the transcripts?
  8. What is the percentage of transactions and offers in the events?
  9. What are the types of offers: received, viewed, completed?
  10. What is the Income Distribution for the Offer Events?
  11. What is the most completed offer?
  12. What is the least completed offer?

Q1: What is the Gender, Age and Income Distribution of Starbucks Customers?

Q2: How many customers enrolled yearly?

Fig5: Demographics of Customer Data of Starbucks

Q3: What is the average age of Starbucks Customers?

The average age of Starbucks customers: 54.39

Q4: What is the average Income of Starbucks Customers?

The average income of Starbucks customers: 65404.99

Q5: Which gender has the highest yearly membership?

Fig6: Gender distribution of yearly membership

Male customers have the highest yearly membership count in every year except 2016.

Q6: Which gender has the highest annual income?

gender
F 71306.41
M 61194.60
O 63287.74
Name: income, dtype: float64
Females have the highest average annual income.
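This output most likely comes from a simple groupby on the cleaned profile dataframe; a sketch:

# Mean income per gender, rounded to match the output above
profile.groupby('gender')['income'].mean().round(2)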

Q7: What is the distribution of events in the transcripts?

Fig7: Number of events in Transcripts

Q8: What is the percentage of transactions and offers in the events?

The percentage of transactions in all events: 45.45
The percentage of offers in all events: 54.55

Q9: What are the types of offers: received, viewed, completed?

Fig8: Offer types
0 : bogo
1 : discount
2 : informational

Q10: What is the Income Distribution for the Offer Events?

Fig9: Income distribution for the offer events

Q11: What is the most completed offer?

Number of completions: 5003
offer_id with maximum offers completed: 9
Original offer_id with maximum offers completed: ['fafdcd668e3743c1bb461111dcafc2a4']

Q12: What is the least completed offer?

Number of completions: 3310
offer_id with minimum offers completed: 4
Original offer_id with minimum offers completed: ['4d5c57ea9a6940dd891ad53e9dbe8da0']
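Both answers follow from counting completions per offer. A sketch, assuming the offers dataframe built during preprocessing:

# Count 'offer completed' events per offer
completed = offers[offers['event'] == 'offer completed']
counts = completed['offer_id'].value_counts()

print('Most completed offer: ', counts.idxmax(), '->', counts.max())
print('Least completed offer:', counts.idxmin(), '->', counts.min())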

Data Modelling

Drop the ‘income_groups’, ‘amount’, ‘became_member_on’, ‘event’, ‘reward_x’, ‘customer_id’, and ‘offer_id’ columns from the offers dataframe.

Supervised Learning (Classification)

The target column is offer_type. The model will help predict the correct offer_type to send to each customer.

Train Data Size: 119044
Test Data Size: 29761
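A sketch of the preparation and split. The dropped columns follow the list above, the target is the label-encoded offer_type, and the 80/20 split matches the reported sizes; random_state is an arbitrary choice here:

from sklearn.model_selection import train_test_split

# The target column; the one-hot bogo/discount/informational columns remain as features
y = offers['offer_type']
X = offers.drop(columns=['income_groups', 'amount', 'became_member_on',
                         'event', 'reward_x', 'customer_id', 'offer_id',
                         'offer_type'])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print('Train Data Size:', len(X_train))
print('Test Data Size: ', len(X_test))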

Model Selection

Lazypredict trains many different classification models and reports their accuracy values. Accuracies on the test data are shown below. Nearly all of the models reach 100% accuracy, so we can select any one of them; we selected AdaBoostClassifier.

Model                          Accuracy  Balanced Accuracy  F1 Score
AdaBoostClassifier                 1.00               1.00      1.00
BaggingClassifier                  1.00               1.00      1.00
XGBClassifier                      1.00               1.00      1.00
SVC                                1.00               1.00      1.00
SGDClassifier                      1.00               1.00      1.00
RidgeClassifierCV                  1.00               1.00      1.00
RidgeClassifier                    1.00               1.00      1.00
RandomForestClassifier             1.00               1.00      1.00
QuadraticDiscriminantAnalysis      1.00               1.00      1.00
Perceptron                         1.00               1.00      1.00
PassiveAggressiveClassifier        1.00               1.00      1.00
NuSVC                              1.00               1.00      1.00
NearestCentroid                    1.00               1.00      1.00
LogisticRegression                 1.00               1.00      1.00
LinearSVC                          1.00               1.00      1.00
LinearDiscriminantAnalysis         1.00               1.00      1.00
KNeighborsClassifier               1.00               1.00      1.00
GaussianNB                         1.00               1.00      1.00
ExtraTreesClassifier               1.00               1.00      1.00
ExtraTreeClassifier                1.00               1.00      1.00
DecisionTreeClassifier             1.00               1.00      1.00
CalibratedClassifierCV             1.00               1.00      1.00
BernoulliNB                        1.00               1.00      1.00
LGBMClassifier                     1.00               1.00      1.00
CheckingClassifier                 0.43               0.33      0.26
DummyClassifier                    0.38               0.33      0.38

Evaluate and Validate the Model Accuracy

Let’s evaluate the AdaBoostClassifier model. First we train the model; we obtain 100% accuracy.

An AdaBoost classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.

n_estimators is set to 100; the other model parameters are left at their default values.
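A sketch of this training and evaluation step:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score

# 100 boosting stages; everything else stays at the scikit-learn defaults
ada = AdaBoostClassifier(n_estimators=100)
ada.fit(X_train, y_train)

y_pred = ada.predict(X_test)
print('AdaBoost test accuracy:', accuracy_score(y_test, y_pred))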

Model Justification

The model correctly predicts the offer type a customer is likely to respond to, for example a discount, with an accuracy of 100%. Hence our model has good predictive accuracy and is ready to deploy.

Fine Tuning the Model

Since we obtain 100% accuracy on the test set, we do not need to fine-tune the selected model. In other words, we don’t need to optimize its hyperparameters.

Let’s create a feature importance plot. To produce it, we train another model, a DecisionTreeClassifier; after training we again obtain 100% accuracy. According to the decision tree model, the bogo and difficulty columns alone are enough to predict the target column, the promotional offer type.
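A sketch of how these importances can be computed and plotted (the matplotlib part is illustrative):

import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier()
tree.fit(X_train, y_train)

# feature_importances_ is aligned with the column order of X_train
for i, (name, score) in enumerate(zip(X_train.columns, tree.feature_importances_)):
    print(f'Feature: {i}, {name}, Score: {score}')

plt.bar(X_train.columns, tree.feature_importances_)
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()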

Decision Tree Model Accuracy : 1.0
Feature: 0, time, Score: 0.0
Feature: 1, gender, Score: 0.0
Feature: 2, age, Score: 0.0
Feature: 3, income, Score: 0.0
Feature: 4, start_year, Score: 0.0
Feature: 5, reward_y, Score: 0.0
Feature: 6, difficulty, Score: 0.36169
Feature: 7, duration, Score: 0.0
Feature: 8, email, Score: 0.0
Feature: 9, mobile, Score: 0.0
Feature: 10, social, Score: 0.0
Feature: 11, web, Score: 0.0
Feature: 12, bogo, Score: 0.63831
Feature: 13, discount, Score: 0.0
Feature: 14, informational, Score: 0.0

Here is another feature importance ranking, this time obtained from a RandomForestClassifier. For this classifier, the discount, bogo, and reward_y columns carry more importance than the others.

Random Forest Model Accuracy : 1.0
Feature: 0, time, Score: 8.9408e-05
Feature: 1, gender, Score: 3.6282e-06
Feature: 2, age, Score: 1.9731e-06
Feature: 3, income, Score: 8.3178e-06
Feature: 4, start_year, Score: 4.7092e-06
Feature: 5, reward_y, Score: 0.20442
Feature: 6, difficulty, Score: 0.073274
Feature: 7, duration, Score: 0.094554
Feature: 8, email, Score: 0.0
Feature: 9, mobile, Score: 0.025366
Feature: 10, social, Score: 0.0029461
Feature: 11, web, Score: 0.023813
Feature: 12, bogo, Score: 0.2019
Feature: 13, discount, Score: 0.28508
Feature: 14, informational, Score: 0.088532

Conclusion

Key Insights:

  • The count of male customers at the low-income level is slightly higher than that of female customers.
  • The average age of Starbucks customers is 54.39.
  • The average income of Starbucks customers is 65404.99.
  • The average income of female customers is greater than that of male customers, yet females spend less at Starbucks than males.
  • Starbucks has more young customers than older ones.
  • The offer_type was predicted successfully by training a classifier.
  • Starbucks has more male customers than female or other-gender customers.
  • According to the Decision Tree Classifier, the bogo and difficulty columns are the most important; according to the Random Forest Classifier, the discount, bogo, and reward_y columns are.

Customers are more attracted to BOGO and discount offers than to informational offers. The buying behaviour of a customer is independent of their annual income.

This post highlights the importance of data understanding and wrangling in the process. The project was completed successfully by training a supervised classifier, AdaBoostClassifier, and most of the classification models give equally strong results on this data. Moreover, since we obtain 100% accuracy on the test set, we do not need to fine-tune the selected model; in other words, we don’t need to optimize its hyperparameters.

We obtain 100% accuracy for most of the algorithms because the data is simulated and cleanly separable. In particular, the one-hot bogo, discount, and informational columns kept among the features encode the offer type directly, which the feature importances above reflect.

Improvement:

Theoretically, the model cannot be improved further, since it already gives 100% accuracy on the test set.

By using more data, or real data (not simulated data), we could meaningfully compare algorithms and find the best classification model for the problem. For the given data, nearly every model in the results table above reaches 100% accuracy on the test set, so any of them could be chosen; we simply chose AdaBoostClassifier.

To see more about this analysis, see my GitHub repository, linked here.

Thanks for reading my article! I’d really appreciate any feedback on my writing (criticisms, areas where I can improve, etc.). On that note, feel free to comment with questions or approach me directly on LinkedIn. You can also visit my website.
