24
Guide to model training: Part 5 - Reliable Remarketing
Got data? Build your own AI model for remarketing with machine learning!
- Recap
- Before we begin
- What is remarketing?
- Predicting success
- Model training
- Results
We finished all the processing of our data, so now we’ve made it to the final step of the model training series and will be training data. Based on our needs, we’ll continue with the remarketing campaign to get ready for the holiday rush! Tons of our customers will be on the lookout for holiday gifts and relevant sales, so now it’s up to us to connect them with their wishlist.
For those following along the guide, we’re going to take the resulting data from when we scaled the numbers, categories, and imputed data. Then, we’ll be doing a merge on the data. Read our data preparation guide if you’re not familiar with joining and merging datasets.
A lot of developers, myself included, weren’t familiar with the term remarketing. In fact, it’s often confused with or synonymously used with retargeting which is not remarketing. The core difference here is the audience. Retargeting looks for a new audience, while remarketing focuses on the old, or existing users. To define remarketing in the context of this guide, we’ll be looking at data of retired users of our product. By using the data to understand the previous users, we’ll focus our marketing campaign efforts to select individuals who will most likely come back, given a certain promo that meets their needs.
The most common form of day to day remarketing you observe but rarely think about are push notifications. Everyone has a phone and you’re often reminded to open an app you haven’t opened in a while. There are also cases where apps will entice you to come back with promotions like free gifts or credits for joining.
In this guide, since we have a list of customer emails and email_ids, we’ll be focusing on the more traditional form of remarketing. Email campaigning is essentially the same idea, but with different mediums. We’ll be focusing on using remarketing via email to increase customer activity.
The holidays are coming up, and the campaign is almost here. Everyone will be wanting to purchase luxury items such as wine or chocolates as gifts! We’ll be using machine learning to create more interaction and engagement with customers, to boost already active customers and aim to gain back inactive customers that have taken a hiatus.
To begin, we’ll start by combining finalized information from prior parts and putting them together. For more details on how to join data, read our guide on combining data. To simplify, I’ve prepared all the datasets from our previous parts into remarketing_campaign.csv, which originate from these files, categorical_campaign.csv, numerical_campaign.csv, encoder_campaign.csv, impute_campaign.csv, and datetime_campaign.csv. This contains everything from what a customer’s household is like, purchase history, and interactions with prior campaigns.
We’ll be offering sales on party items that make great gifts, such as wine and sweets for the holidays. Our campaign is a rerun of last year’s holiday sales. With an understanding of what’s in the campaign, we need to think about which portions of data influence a customer to action on an email campaign offering a sale on gifts. During the holidays, the sale of luxuries is hot, looking to sell wine and sweets. This is MntWine and MntSweet in our dataset.
With the data prepared, we can begin slicing them into portions for training and testing. These sets will be used to train data using itself and determine the metrics for model evaluation in the next part.
To start, we’ll identify the data we’ll want to use as our feature vector, aka input. Then we’ll determine what to look for as a result, label feature, or output. Here, we’ll use everything in our dataset, to try to predict whether they will act on another notification from us. This training set will contain the data, and confirm that the predictions will be correct.
The test set is data that will be tested with the output from training to determine how good a model is. For our model we’ll be using the Logistic Regression function from scikit-learn.
When training a classification model, we take note of the metrics at the end — accuracy, precision, recall. To learn more about what these metrics mean, check out our guide to accuracy, precision and recall. In our next series, we’ll explore what these and more metrics are, how they’re calculated, and what their values tell us about the dataset.
24