Correct Sampling Bias for Recommender Systems | by Thao Vu | Oct, 2023

What is sampling bias in recommendation, and how to correct it

Thao Vu
Towards Data Science
Photo by NordWood Themes on Unsplash

Recommender systems are ubiquitous in our digital lives, from e-commerce giants to other online services. However, hidden beneath every large recommender system lies a challenge that can significantly impact its effectiveness: sampling bias.

In this article, I will explain how sampling bias arises when training recommendation models and how we can correct it in practice.

Let’s dive in!

In general, we can formulate the recommendation problem as follows: given a query x (which can contain user information, context, previously clicked items, etc.), find the set of items {y1, ..., yk} that the user will likely be interested in.

One of the main challenges for large-scale recommender systems is the low-latency requirement. User and item pools are vast, so scoring every candidate and greedily picking the best ones is infeasible. Therefore, to meet the latency requirement, recommender systems are generally broken down into two main stages: retrieval and ranking.
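To make the two-stage split concrete, here is a minimal sketch in plain Python. The item fields, the two scoring functions, and the `recommend` helper are all hypothetical and chosen only for illustration. For simplicity the retrieval stage scans the whole pool with a cheap score; a production system would typically query an index instead.

```python
import heapq

def recommend(query, items, cheap_score, expensive_score, n_retrieve, k):
    """Two-stage recommendation: cheap retrieval, then costly ranking."""
    # Stage 1 (retrieval): keep only the n_retrieve best candidates
    # according to an inexpensive score.
    candidates = heapq.nlargest(
        n_retrieve, items, key=lambda item: cheap_score(query, item)
    )
    # Stage 2 (ranking): run the expensive scorer on the survivors only.
    return heapq.nlargest(
        k, candidates, key=lambda item: expensive_score(query, item)
    )

# Hypothetical toy catalogue and query.
items = [
    {"id": "a", "category": "books", "popularity": 0.9, "price": 12.0},
    {"id": "b", "category": "books", "popularity": 0.2, "price": 30.0},
    {"id": "c", "category": "games", "popularity": 0.8, "price": 60.0},
    {"id": "d", "category": "books", "popularity": 0.5, "price": 8.0},
    {"id": "e", "category": "music", "popularity": 0.7, "price": 10.0},
]
query = {"liked_category": "books", "budget": 15.0}

def cheap_score(query, item):
    # Assumed cheap signal: does the category match?
    return float(item["category"] == query["liked_category"])

expensive_calls = []  # records which items reach the ranking stage

def expensive_score(query, item):
    # Assumed costly signal: budget fit plus popularity.
    expensive_calls.append(item["id"])
    within_budget = float(item["price"] <= query["budget"])
    return within_budget + item["popularity"]

top = recommend(query, items, cheap_score, expensive_score, n_retrieve=3, k=2)
top_ids = [item["id"] for item in top]
print(top_ids, "ranked after scoring", len(expensive_calls), "candidates")
```

The point of the design is that the expensive scorer runs on `n_retrieve` items rather than on the full pool, which is what keeps latency bounded as the catalogue grows.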

Multi-stage recommender systems (Image by the author)

Retrieval is a cheap and efficient way to quickly capture the top item candidates (a few hundred) from the vast candidate pool (millions or billions). Retrieval optimization is mainly about 2 objectives:

  • During the training phase, we want to encode users and items into embeddings that capture the user’s behaviour and preferences.
  • During inference, we want to quickly retrieve relevant items through Approximate Nearest Neighbors (ANN) search.
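As a rough illustration of the inference objective, the sketch below runs an exact, brute-force nearest-neighbor search over a handful of made-up embeddings. It assumes a dot-product similarity, and the embedding values and the `exact_top_k` helper are hypothetical. An ANN index approximates the same result without comparing the query against every item.

```python
import heapq

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def exact_top_k(query_emb, item_embs, k):
    """Return the ids of the k items whose embeddings have the largest
    dot product with the query embedding (exact search, O(#items))."""
    return heapq.nlargest(
        k, item_embs, key=lambda item_id: dot(query_emb, item_embs[item_id])
    )

# Hypothetical 2-d embeddings, as if produced by a trained model.
item_embs = {
    "item_1": [0.9, 0.1],
    "item_2": [0.1, 0.9],
    "item_3": [0.7, 0.6],
    "item_4": [-0.5, 0.2],
}
query_emb = [1.0, 0.2]

nearest = exact_top_k(query_emb, item_embs, k=2)
print(nearest)
```

Exact search like this is a useful baseline for checking how much recall an approximate index gives up in exchange for speed.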

For the first objective, a common approach is the two-tower neural network. The model gained its popularity for tackling the cold-start problem by incorporating item features.

In detail, queries and items are encoded by corresponding DNN towers so that the relevant (query, item) embeddings stay…
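A minimal forward-pass sketch of the two-tower idea follows. It makes several assumptions that are not from the article: each tower is a single dense layer with ReLU, outputs are L2-normalized, and a (query, item) pair is scored by the dot product of the two embeddings. The weights are random stand-ins for a trained model, and all names are hypothetical.

```python
import math
import random

def make_tower(in_dim, out_dim, rng):
    """Random weights for a one-layer tower (stand-in for a trained DNN)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

def encode(weights, features):
    """Dense layer + ReLU, followed by L2 normalization of the output."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, features))) for row in weights
    ]
    # Fall back to 1.0 so an all-zero activation does not divide by zero.
    norm = math.sqrt(sum(h * h for h in hidden)) or 1.0
    return [h / norm for h in hidden]

def score(query_emb, item_emb):
    """Assumed similarity: dot product in the shared embedding space."""
    return sum(q * i for q, i in zip(query_emb, item_emb))

rng = random.Random(0)
# The towers take different inputs but map into the same 4-d space.
query_tower = make_tower(in_dim=3, out_dim=4, rng=rng)
item_tower = make_tower(in_dim=2, out_dim=4, rng=rng)

query_emb = encode(query_tower, [1.0, 0.0, 0.5])  # hypothetical query features
item_emb = encode(item_tower, [0.3, 0.8])         # hypothetical item features
pair_score = score(query_emb, item_emb)
print(round(pair_score, 3))
```

Because the item tower consumes item features rather than a learned per-item id, a brand-new item can be embedded without any interaction history, which is how this architecture helps with cold start.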
