Design and Models for a Personalized Product Recommendations Engine

Matt Scruggs, Senior Software Engineer

Earlier this year, we introduced Recommendations Premium, an app that automates personalized product recommendations for busy marketers and boosts their one-to-one messaging capabilities. Since I began working here, I’ve spent much of my time researching and developing the models that allow Recommendations Premium to deliver personalized results seamlessly across a variety of customers and contexts.

In this article, I’ll discuss how personalized recommendations fit in with our existing product service infrastructure by leveraging the powerful search features of Apache Solr. I’ll also briefly describe the specific models that underpin Recommendations Premium.

Using Apache Solr to Support Personalized Recommendations

Before discussing specific models, it’s important to know a bit about how we store and serve product data for the Bronto Marketing Platform. Our product service uses Apache HBase as its primary data store and uses Solr to support efficient, scalable searches on arbitrary product fields.

The product service already uses Solr to allow customers to search for products (either in the Bronto Marketing Platform user interface or in the Recommendations Standard app), so it seemed like a strong choice for personalized recommendations.

Fortunately, a class of recommender systems exists that fits in quite elegantly with the search-based features of Solr: indicator-based recommenders. This type of recommender system detects which items are good indicators for a given item, meaning they co-occur with it in an interesting way (not simply due to popularity).

The table below provides a mock example of what this dataset could look like for our Bought This, Bought That model:

Product Indicators in Solr

Product            | Bought This, Bought That Indicators
Pinstriped shirt   | Bronto cufflinks
Bronto cufflinks   | Pinstriped shirt, shovel
Laundry detergent  | Pinstriped shirt, wheelbarrow
Shovel             | Wheelbarrow

How does this Solr dataset help the Bronto Marketing Platform generate personalized recommendations? The platform stores individual customers’ activities and searches the appropriate indicators field in Solr for the products they’ve recently interacted with – by browsing or buying, for example. Solr returns the set of products that are indicated by your customer’s recent activity, and the results are ordered by relevance, which Solr determines using the term frequency-inverse document frequency (TF-IDF) scoring algorithm.

For example, imagine that customer Amy has bought a pinstriped shirt and a wheelbarrow. Searching the indicators in the table for those two products matches laundry detergent, Bronto cufflinks, and the shovel, so these are the candidates for Amy’s personalized recommendation. Laundry detergent would rank highest because Amy has bought two indicators for that product, whereas she has bought only one indicator for each of the other two.
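To make Amy's example concrete, here is a minimal sketch that mimics the indicator lookup in plain Python, using the mock table above. It is not how the product service is implemented: a simple count of matching indicators stands in for Solr's TF-IDF relevance score, and the rule that skips products the customer already owns is an assumption added for illustration.

```python
# Toy stand-in for the Solr indicator lookup: rank products by how many of a
# customer's recent products appear in each product's indicator list.
# Assumption: a plain overlap count replaces Solr's TF-IDF relevance score.

# Mock "Bought This, Bought That" indicators from the table above.
INDICATORS = {
    "pinstriped shirt": ["bronto cufflinks"],
    "bronto cufflinks": ["pinstriped shirt", "shovel"],
    "laundry detergent": ["pinstriped shirt", "wheelbarrow"],
    "shovel": ["wheelbarrow"],
}


def recommend(history, indicators=INDICATORS):
    """Return (product, matches) pairs, best match first."""
    recent = {item.lower() for item in history}
    scored = []
    for product, product_indicators in indicators.items():
        if product in recent:
            continue  # hypothetical rule: skip items the customer already has
        matches = len(recent.intersection(product_indicators))
        if matches > 0:
            scored.append((product, matches))
    # Sort by match count (descending), then by name so ties are deterministic.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


amy = recommend(["Pinstriped shirt", "Wheelbarrow"])
print(amy)
```

With this history, laundry detergent comes first with two matching indicators. Bronto cufflinks and the shovel follow with one match each, the shovel because the wheelbarrow is one of its indicators.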

The Four Models

We use Apache Spark to run a nightly suite of models that each create a set of product indicators, which are uploaded to Solr. Here are brief descriptions for each of these models, from which customers can choose when configuring a new recommendation in our Recommendations Premium app.

1. Bought This, Bought That and Browsed This, Browsed That

These two models are presented together because they use the same underlying algorithm, but with different input datasets (order data and browse data, respectively). The Apache Mahout project includes a scalable and efficient Item-Item Collaborative Filtering algorithm that runs on Spark. This algorithm compares purchase (or browse) patterns among all the products in the input dataset, and detects which items are bought or browsed together in an interesting way.

In particular, Mahout’s implementation uses the log-likelihood ratio to quantify and find interesting relationships among products. Each of these two models produces its own set of indicators for each product; the two sets are stored in Solr and can be searched separately.
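For readers who want to see what the log-likelihood ratio measures, the sketch below computes it for a single pair of products from a 2x2 table of co-occurrence counts. It follows the commonly used entropy formulation of the test and is written from scratch for illustration; it is not Mahout's source code, and the function and parameter names are my own.

```python
import math


def x_log_x(x):
    """Return x * ln(x), treating 0 * ln(0) as 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """LLR for a 2x2 co-occurrence table of products A and B.

    k11: customers who bought both A and B
    k12: customers who bought A but not B
    k21: customers who bought B but not A
    k22: customers who bought neither
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))


# A and B are always bought together: a strong signal.
print(log_likelihood_ratio(10, 0, 0, 10))
# B is popular, but equally popular with and without A: no signal.
print(log_likelihood_ratio(100, 900, 100, 900))
```

The first call returns a score of about 27.7, while the second returns zero (up to round-off) even though the two products co-occur 100 times. That is the sense in which the score rewards interesting co-occurrence rather than raw popularity.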

2. Similar Content Products

Not every customer has a rich purchase or browse history with your website, and some products are too new to have indicators from the Bought This, Bought That and Browsed This, Browsed That models. In these situations – examples of the “cold start problem” – showing similar products can help fill in results automatically.

The Similar Content Products model uses the Word2Vec algorithm provided with Spark to convert each product into a high-dimensional feature vector based on select product fields: title, description, brand and category. The model then runs an approximate nearest-neighbors algorithm to efficiently find similar vectors for each product, and voila – these become content-based indicators!
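The idea can be sketched without Spark. The toy example below assumes a handful of hand-made three-dimensional word vectors in place of a trained Word2Vec model, averages them into one vector per product, and uses a brute-force cosine-similarity search in place of the approximate nearest-neighbors step. All vectors, product texts, and function names are hypothetical.

```python
import math

# Hypothetical 3-dimensional word vectors; a trained Word2Vec model would
# supply these, typically with many more dimensions.
WORD_VECTORS = {
    "shirt": (0.9, 0.1, 0.0),
    "pinstriped": (0.8, 0.2, 0.1),
    "cufflinks": (0.7, 0.3, 0.0),
    "shovel": (0.0, 0.1, 0.9),
    "wheelbarrow": (0.1, 0.0, 0.8),
    "garden": (0.0, 0.2, 0.9),
}


def product_vector(text):
    """Average the vectors of the known words in a product's text fields."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError(f"no known words in {text!r}")
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def most_similar(target, catalog, top_n=1):
    """Brute-force nearest neighbors by cosine similarity (exact, not approximate)."""
    target_vec = product_vector(catalog[target])
    scored = [
        (cosine(target_vec, product_vector(text)), name)
        for name, text in catalog.items()
        if name != target
    ]
    return [name for _, name in sorted(scored, reverse=True)[:top_n]]


# Product name -> concatenated text fields (title, description, and so on).
catalog = {
    "Pinstriped shirt": "pinstriped shirt",
    "Bronto cufflinks": "bronto cufflinks",
    "Shovel": "garden shovel",
    "Wheelbarrow": "garden wheelbarrow",
}
print(most_similar("Shovel", catalog))
```

A brute-force search compares every pair of products, which is fine for four items but grows quadratically with catalog size. That cost is the usual reason to reach for an approximate nearest-neighbors method on a real catalog.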

3. Frequently Bought Together

This model attempts to find products that are – you guessed it – frequently bought together within the same order. It calculates the log-likelihood ratio (LLR) for every pair of products that appear together in an order, then computes basic central tendency and variation statistics over those scores to identify a threshold LLR score above which a co-occurrence counts as interesting. The signed-root LLR score (a slight variation of the raw LLR score) is roughly normally distributed, making this kind of analysis relatively straightforward once the LLR scores are computed.
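As a rough illustration of that thresholding step, the sketch below converts an LLR score to its signed root and then keeps only the pairs whose score sits well above the rest. The specific cutoff (mean plus two standard deviations) is an assumption chosen for the example; the exact statistics used in production are not spelled out here, and the pair names are hypothetical.

```python
import math
import statistics


def signed_root_llr(llr, k11, k12, k21, k22):
    """Square root of an LLR score, negated when A and B co-occur less than expected.

    k11..k22 are the cells of the 2x2 co-occurrence table the LLR came from.
    Assumes both rows of the table are non-empty.
    """
    root = math.sqrt(llr)
    rate_with_a = k11 / (k11 + k12)      # share of A-orders that contain B
    rate_without_a = k21 / (k21 + k22)   # share of non-A-orders that contain B
    return -root if rate_with_a < rate_without_a else root


def interesting_threshold(scores, num_std=2.0):
    """Hypothetical cutoff: mean plus num_std population standard deviations."""
    return statistics.mean(scores) + num_std * statistics.pstdev(scores)


def interesting_pairs(pair_scores, num_std=2.0):
    """Keep the pairs whose signed-root LLR exceeds the cutoff."""
    cutoff = interesting_threshold(list(pair_scores.values()), num_std)
    return {pair for pair, score in pair_scores.items() if score > cutoff}


# Hypothetical signed-root LLR scores for ten product pairs.
pair_scores = {
    "pair-1": 0.1, "pair-2": -0.2, "pair-3": 0.3, "pair-4": 0.0, "pair-5": -0.1,
    "pair-6": 0.2, "pair-7": -0.3, "pair-8": 0.1, "pair-9": -0.1, "pair-10": 6.0,
}
print(interesting_pairs(pair_scores))
```

Because the signed-root score is roughly normal, a cutoff expressed in standard deviations has a familiar interpretation, which is what makes a simple mean-and-spread rule like this one plausible.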

4. Best Product

This model builds on the results of the Bought This, Bought That model, the Browsed This, Browsed That model, and the Similar Content Products model. No additional offline analysis is performed; rather, a special query searches Solr for a customer’s product history in all three indicator sets, one from each underlying model. This allows Bronto to simultaneously search multiple sets of model results with varied weights to find the product that is most likely to pique a customer’s interests.
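One way such a weighted, multi-field query could be expressed is with Solr's Extended DisMax (eDisMax) query parser, whose `qf` parameter accepts per-field boosts. The sketch below only builds the request parameters; the field names, weights, and product IDs are hypothetical, and the actual query Bronto runs may look quite different.

```python
from urllib.parse import urlencode

# Hypothetical field names and weights; the real schema and boosts are not
# described in this article.
DEFAULT_WEIGHTS = {
    "bought_indicators": 3.0,
    "browsed_indicators": 2.0,
    "content_indicators": 1.0,
}


def best_product_params(history_ids, weights=DEFAULT_WEIGHTS, rows=5):
    """Build eDisMax parameters that search several indicator fields at once."""
    if not history_ids:
        raise ValueError("customer history is empty")
    # Quote each product ID so it is treated as a single phrase.
    query = " ".join(f'"{pid}"' for pid in history_ids)
    # qf lists the fields to search; field^boost weights each field's score.
    query_fields = " ".join(f"{field}^{boost}" for field, boost in weights.items())
    return {
        "defType": "edismax",
        "q": query,
        "qf": query_fields,
        "rows": rows,
    }


params = best_product_params(["sku-123", "sku-456"])
print(urlencode(params))
```

Keeping the weights in one dictionary makes it easy to tune how much each underlying model contributes without touching the query-building code.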

Recommendations Premium has been a fun and rewarding challenge. Its recommender system is built to serve highly relevant results to new and existing customers, seamlessly and efficiently. You can choose from a variety of models to automatically place the right content in front of your customers, and now you know a little more about what they are and how they work. Happy modeling!