
Rating: 6.4/10.
A book about preparing for machine learning interviews. It covers ~10 problems and walks through the typical machine learning system design interview process: model architecture, data collection, training, evaluation metrics, and deployment. The book has several major weaknesses, the most obvious being its overwhelming focus on search and recommendation systems, which makes most of the book extremely repetitive. Eight of the ten chapters cover some kind of recommendation system, and the same approach is applied with almost no differences whether you’re recommending videos, ads, or newsfeed items: extract features, do feature engineering, concatenate them into a large vector, and then apply classification, pointwise regression, or contrastive learning.
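The recipe the book repeats across chapters could be sketched roughly like this (all feature values and weights below are invented stand-ins, not the book's code):

```python
import numpy as np

rng = np.random.default_rng(0)

user_features = rng.normal(size=8)   # e.g. age bucket, past engagement counts
item_features = rng.normal(size=8)   # e.g. topic embedding, freshness
x = np.concatenate([user_features, item_features])  # one big input vector

# Weights a trained model would learn; random here just to show the shape.
w = rng.normal(size=x.shape[0])
p_click = 1.0 / (1.0 + np.exp(-(w @ x)))  # pointwise "relevance" score
```

Whether the output is a click probability, a relevance score, or an embedding for retrieval, the front half of the pipeline barely changes between chapters.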
While there is some variation in the specific models used, the model choices for similar problems feel arbitrary: there is no discussion of why different problems are solved with binary classification vs. pointwise ranking vs. embedding-based nearest-neighbor retrieval. When it comes to more specialized models like object detectors, factorization machines, or GNNs, the book avoids going into detail and declares them out of scope. This is disappointing because there are so many other data science / ML topics that could have been covered beyond recommendation systems, and overall the book is much worse than the system design interview book from the same series.
Chapter 1: How to approach an ML design question. First understand the business problem, translate it into an ML problem, and define the inputs and outputs, determining whether it’s supervised or unsupervised, regression or classification. Feature engineering involves handling missing and categorical values, standardization, etc. Discuss how to gather the data and label it, either automatically or with humans, how to deal with class imbalance, and how to define the evaluation metrics. For deployment, discuss monitoring accuracy on real traffic, distribution drift, and system metrics like latency.
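The feature engineering steps mentioned here (imputation, categorical encoding, standardization) can be sketched in a few lines; the columns and values are invented for illustration:

```python
import numpy as np

# Hypothetical raw columns: a numeric column with a missing value
# and a categorical column.
ages = np.array([25.0, np.nan, 40.0, 31.0])
colors = ["red", "blue", "red", "green"]

# 1. Impute missing numeric values with the column mean.
mean_age = np.nanmean(ages)
ages = np.where(np.isnan(ages), mean_age, ages)

# 2. One-hot encode the categorical column.
categories = sorted(set(colors))  # ['blue', 'green', 'red']
one_hot = np.array([[c == cat for cat in categories] for c in colors], dtype=float)

# 3. Standardize numeric features to zero mean, unit variance.
ages_std = (ages - ages.mean()) / ages.std()

# Final design matrix: 1 standardized numeric column + 3 one-hot columns.
features = np.column_stack([ages_std, one_hot])
```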
Chapter 2: Visual search, where the user uploads an image and you return visually similar images. They recommend an embedding model trained with contrastive learning. The evaluation metrics are retrieval-based ones like precision, recall, mAP, and DCG. At serving time, embeddings are looked up in a nearest-neighbor index, either exactly or approximately.
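The retrieval step reduces to a nearest-neighbor search over stored embeddings. A minimal exact-search sketch (random embeddings as stand-ins; a production system would use an approximate index such as FAISS):

```python
import numpy as np

# Stored image embeddings, normalized so dot products are cosine similarities.
index = np.random.default_rng(1).normal(size=(1000, 64))
index /= np.linalg.norm(index, axis=1, keepdims=True)

# A query embedding that happens to be close to item 42.
query = index[42] + 0.01
query /= np.linalg.norm(query)

scores = index @ query           # cosine similarity against every stored image
top_k = np.argsort(-scores)[:5]  # the 5 most visually similar images
```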
Chapter 3: Blurring license plates and faces in Street View photographs. They recommend treating it as an object detection problem using either one-stage or two-stage neural networks. The evaluation metrics include IoU and retrieval metrics such as mAP computed at IoU thresholds. Once objects are detected, they can be passed to a blurring service.
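IoU is just overlap area divided by union area for a predicted and ground-truth box; a small sketch with made-up coordinates:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Predicted plate box vs. ground truth, half-overlapping:
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # → 0.333... (50 / 150)
```

A detection then counts as a true positive for mAP only if its IoU with a ground-truth box exceeds the chosen threshold (commonly 0.5).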
Chapter 4: Search system for YouTube videos. They recommend generating search-query embeddings with bag of words and TF-IDF, generating video embeddings with a video encoder, and training the two with contrastive learning as part of a re-ranking system.
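A toy TF-IDF vectorizer for query text, to make the idea concrete (the corpus and query are invented; real systems would use a library implementation with smoothing):

```python
import math
from collections import Counter

docs = [["cat", "videos"], ["funny", "cat"], ["cooking", "videos"]]

def tf_idf(doc, corpus):
    """Vectorize `doc` over the corpus vocabulary: term frequency * log(N/df)."""
    vocab = sorted({w for d in corpus for w in d})
    n = len(corpus)
    vec = []
    tf = Counter(doc)
    for w in vocab:
        df = sum(w in d for d in corpus)
        idf = math.log(n / df) if df else 0.0
        vec.append(tf[w] / len(doc) * idf)
    return vec

query_vec = tf_idf(["cat", "videos"], docs)  # words absent from the query get 0
```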
Chapter 5: Harmful content detection on social media sites. They recommend multiclass classification over the different types of abuse: concatenate embeddings from text fields, image and video features, and author features, feed them into one big classifier, and evaluate using binary classification metrics.
Chapter 6: YouTube video recommendation. They propose a hybrid approach combining content and collaborative filtering with video features and user feature vectors through a two-tower network to bring user and video embeddings closer together when they are a good match (more flexible than matrix factorization which can only take user and video co-occurrence data); searching is a nearest neighbor query.
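A stripped-down two-tower sketch: each tower maps its own raw features into a shared embedding space, and the match score is a dot product. Random weights stand in for trained parameters, and single linear layers stand in for the real towers:

```python
import numpy as np

rng = np.random.default_rng(0)

W_user = rng.normal(size=(16, 32))   # user tower (one linear layer for brevity)
W_video = rng.normal(size=(24, 32))  # video tower; input dims can differ

def embed(x, W):
    h = x @ W
    return h / np.linalg.norm(h)     # normalize so scores are cosine similarities

user_emb = embed(rng.normal(size=16), W_user)
video_embs = np.stack([embed(rng.normal(size=24), W_video) for _ in range(100)])

scores = video_embs @ user_emb       # one score per candidate video
best = int(np.argmax(scores))        # at serving time: a nearest-neighbor lookup
```

Training pushes `user_emb` and `video_emb` together for positive (watched) pairs and apart for negatives, which is what makes the final retrieval a nearest-neighbor query.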
Chapter 7: Event ranking system. They propose pointwise ranking: feed event and user features into a model that predicts relevance, extracting features such as location, time of day, day of week, and event description, comparing these to the user’s previously visited events, and training a regression model to produce the ranking.
Chapter 8: Ad click prediction. They recommend binary classification models on concatenated features from the user, the ad, and their interactions. Some models, like factorization machines, can learn interaction effects automatically.
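The factorization-machine trick is to give each feature i a latent vector v_i and model the pairwise interaction weight as the inner product ⟨v_i, v_j⟩, computable in O(nk) rather than O(n²). All values below are illustrative; a real model learns w and V from click data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3                        # 6 features, latent dimension 3

x = rng.normal(size=n)             # concatenated user / ad / interaction features
w0, w = 0.1, rng.normal(size=n)    # global bias and linear weights
V = rng.normal(size=(n, k))        # latent factor vectors, one row per feature

# Efficient form: 0.5 * sum_f ((sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2)
interactions = 0.5 * np.sum((V.T @ x) ** 2 - (V.T ** 2) @ (x ** 2))

logit = w0 + w @ x + interactions
p_click = 1.0 / (1.0 + np.exp(-logit))
```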
Chapter 9: Recommending similar vacation listings in a rental site. They propose a model that takes a sequence of previously visited listings and predicts whether a future listing will be booked or not.
Chapter 10: Newsfeed recommendations for social media. The model concatenates user and post features, then uses multitask classification to predict different user engagement actions.
Chapter 11: Recommending people you may know on a social media site. Frame this as an edge prediction problem in a graph, determining whether two people will become friends, and then training a GNN. The prediction may be computationally expensive, so it’s best to process in batches and store the predictions in a database for retrieval.