
How to Design a Social Media Feed System for a System Design Interview

Introduction

A social media feed system is a classic system design interview question that tests your ability to handle massive scale, low latency, and real-time data delivery. The goal is to design a system that displays posts from users you follow, ordered by relevance or time. This guide breaks down the key components, strategies, and trade-offs you need to know to successfully design a social media feed system.


Requirements & Core Concepts

When designing a social media feed system, you must consider both functional and non-functional requirements. A clear understanding of these is the first step in a successful system design interview.

Functional Requirements:

  • Users can follow and unfollow others.
  • Users can post messages (text, images, etc.).
  • Users see a feed of posts from the people they follow.
  • The feed should be sorted by a key metric, such as a timestamp (latest first).

Non-Functional Requirements:

  • Scalability: The system must handle millions of users and billions of posts.
  • Low Latency: Feed retrieval should be fast, ideally within a few hundred milliseconds.
  • High Availability: The service must be available even if some components fail.

Data Model

A robust data model is the foundation of any system design. For a social media feed system, we need to model users, posts, and the follow relationship.

Entity | Fields
User   | user_id (Primary Key), name, email, profile_picture
Post   | post_id (Primary Key), user_id (Foreign Key to User), content, timestamp, media_url
Follow | follower_id, followee_id (together the composite Primary Key; both reference User)

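In a real-world system, these would likely be stored in a distributed database such as Cassandra or DynamoDB, which scale horizontally and sustain high read and write volumes. Such stores favor tables laid out around the access pattern, for example posts partitioned by user_id and ordered by timestamp, over joins at read time.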
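As a rough sketch, the three entities can be written as Python dataclasses. The field types (strings for IDs, a float epoch timestamp) are assumptions made for illustration; the article does not specify them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    user_id: str                      # primary key
    name: str
    email: str
    profile_picture: Optional[str] = None


@dataclass(frozen=True)
class Post:
    post_id: str                      # primary key
    user_id: str                      # author; references User.user_id
    content: str
    timestamp: float                  # assumed: seconds since the epoch
    media_url: Optional[str] = None


@dataclass(frozen=True)
class Follow:
    follower_id: str                  # (follower_id, followee_id) together form the key
    followee_id: str


# Frozen dataclasses are hashable, so a set enforces the composite key.
follows = {Follow("alice", "bob"), Follow("alice", "bob")}
print(len(follows))  # 1: the duplicate edge collapses
```

Treating the pair (follower_id, followee_id) as the key means following the same person twice is a no-op, which matches how the Follow entity is described above.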


Feed Generation Strategies

The choice of feed generation strategy is the most critical part of this system design problem. The two main approaches are fan-out on write and fan-out on read.

  • Fan-out on Write: When a user posts, the post is immediately pushed to a dedicated feed for each of their followers.
    • Pros: Feed retrieval is extremely fast (just a single read).
    • Cons: Scales poorly for accounts with very many followers (e.g., celebrities), because a single post can trigger millions of feed writes. It also spends storage and write capacity on the feeds of followers who rarely log in.
  • Fan-out on Read: When a user requests their feed, the system fetches the latest posts from all the users they follow, merges them, and sorts them before displaying.
    • Pros: Handles users with many followers efficiently, as no extra work is done on post creation.
    • Cons: Feed retrieval can be slow, especially if a user follows many people, requiring many database reads and a sorting operation.
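To make the contrast concrete, here is a minimal in-memory sketch of fan-out on write. The names `publish`, `read_feed`, and `FEED_LIMIT` are hypothetical, and the sketch assumes posts arrive in time order; it is meant only to show where the work happens, not how a production system would store feeds.

```python
from collections import defaultdict, deque

FEED_LIMIT = 100  # assumed cap on each precomputed feed

followers = defaultdict(set)                           # author -> follower ids
feeds = defaultdict(lambda: deque(maxlen=FEED_LIMIT))  # reader -> newest-first posts


def follow(follower, followee):
    followers[followee].add(follower)


def publish(author, timestamp, content):
    """Push the post into every follower's feed; return the number of writes."""
    for reader in followers[author]:
        # appendleft keeps the newest post first; a full deque drops its oldest
        feeds[reader].appendleft((timestamp, author, content))
    return len(followers[author])


def read_feed(reader, limit=20):
    """A read is a single lookup; no merging or sorting is needed."""
    return list(feeds[reader])[:limit]


follow("alice", "bob")
follow("carol", "bob")
writes = publish("bob", 1.0, "hello")
print(writes)              # 2
print(read_feed("alice"))  # [(1.0, 'bob', 'hello')]
```

The return value of `publish` is the cost of one post: it grows linearly with the author's follower count, which is exactly the celebrity problem listed under the cons.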

Most large-scale systems use a hybrid approach, combining both strategies to handle different types of users.
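One way a hybrid could look, as a sketch under stated assumptions: authors below a follower threshold are pushed to followers at write time, while authors above it are pulled at read time. `CELEBRITY_THRESHOLD` and every function name here are hypothetical, and the threshold value is deliberately tiny so the example is easy to follow.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 2  # assumed cutoff; a real system would tune this

followers = defaultdict(set)  # author -> follower ids
following = defaultdict(set)  # reader -> author ids
posts = defaultdict(list)     # author -> [(timestamp, author, content)], oldest first
pushed = defaultdict(list)    # reader -> posts delivered at write time


def is_celebrity(author):
    return len(followers[author]) >= CELEBRITY_THRESHOLD


def follow(reader, author):
    followers[author].add(reader)
    following[reader].add(author)


def publish(author, timestamp, content):
    post = (timestamp, author, content)
    posts[author].append(post)
    if not is_celebrity(author):      # push path: fan out on write
        for reader in followers[author]:
            pushed[reader].append(post)


def read_feed(reader, limit=20):
    items = set(pushed[reader])       # a set drops posts seen on both paths
    for author in following[reader]:
        if is_celebrity(author):      # pull path: fan out on read
            items.update(posts[author][-limit:])
    return sorted(items, reverse=True)[:limit]  # newest first


follow("alice", "star")
follow("bob", "star")
follow("alice", "carol")
publish("carol", 1.0, "hi from carol")  # carol has 1 follower: pushed
publish("star", 2.0, "hi from star")    # star has 2 followers: pulled on read
print(read_feed("alice"))
# [(2.0, 'star', 'hi from star'), (1.0, 'carol', 'hi from carol')]
```

The set in `read_feed` matters when an author crosses the threshold: posts pushed earlier would otherwise appear a second time once the author is also pulled.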


A Simplified Code Example (Python/Flask)

To illustrate the fan-out on read approach for a social media feed system, here is a simplified in-memory example using Python with the Flask framework.

Step 1: Set up the application and data stores

Python
from flask import Flask, request, jsonify
from collections import defaultdict
import time

app = Flask(__name__)

# In-memory data stores
users = set()
posts = defaultdict(list)  # user_id -> list of (timestamp, post)
follows = defaultdict(set)  # follower_id -> set of followee_ids

Step 2: Create a user and handle follow/unfollow

Python
@app.route('/user', methods=['POST'])
def create_user():
    user_id = request.json.get('user_id')
    if not user_id or user_id in users:
        return jsonify({'error': 'Invalid or existing user_id'}), 400
    users.add(user_id)
    return jsonify({'message': f'User {user_id} created'}), 201

@app.route('/follow', methods=['POST'])
def follow_user():
    follower = request.json.get('follower')
    followee = request.json.get('followee')
    if follower not in users or followee not in users:
        return jsonify({'error': 'Invalid users'}), 400
    follows[follower].add(followee)
    return jsonify({'message': f'{follower} followed {followee}'}), 200

@app.route('/unfollow', methods=['POST'])
def unfollow_user():
    follower = request.json.get('follower')
    followee = request.json.get('followee')
    if follower not in users or followee not in users:
        return jsonify({'error': 'Invalid users'}), 400
    follows[follower].discard(followee)
    return jsonify({'message': f'{follower} unfollowed {followee}'}), 200

Step 3: Create a post

Python
@app.route('/post', methods=['POST'])
def create_post():
    user_id = request.json.get('user_id')
    content = request.json.get('content')
    if user_id not in users or not content:
        return jsonify({'error': 'Invalid user or content'}), 400

    timestamp = time.time()
    posts[user_id].append((timestamp, content))
    return jsonify({'message': 'Post created'}), 201

Step 4: Fetch the feed (fan-out on read logic)

Python
@app.route('/feed/<user_id>', methods=['GET'])
def get_feed(user_id):
    if user_id not in users:
        return jsonify({'error': 'User not found'}), 404

    # .get() avoids creating empty defaultdict entries on a read-only request
    followees = follows.get(user_id, set())
    feed_items = []

    # Take as many posts per followee as the feed holds (20); taking fewer
    # could drop recent posts from a followee who posts frequently
    for f_id in followees:
        user_posts = posts.get(f_id, [])[-20:]
        feed_items.extend([(f_id, ts, content) for ts, content in user_posts])

    # Sort all posts by timestamp descending
    feed_items.sort(key=lambda x: x[1], reverse=True)

    # Limit feed size
    feed_items = feed_items[:20]

    # Format feed
    feed = [{'user_id': u, 'timestamp': ts, 'content': c} for u, ts, c in feed_items]

    return jsonify({'feed': feed}), 200
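Step 4 collects every candidate post and sorts the whole list. Because each followee's list is appended in time order, a k-way merge can produce the same newest-first result while stopping as soon as the feed is full. The sketch below uses the standard library's `heapq.merge`; the helper name `merge_feeds` and the (timestamp, author, content) tuple shape are assumptions for illustration, and it relies on timestamps within each list being non-decreasing.

```python
import heapq
from itertools import islice


def merge_feeds(per_followee_posts, limit=20):
    """Return the newest `limit` posts across several oldest-first lists.

    Each inner list holds (timestamp, author, content) tuples in append
    order, so reversing it yields that followee's posts newest first.
    """
    newest_first = (reversed(user_posts) for user_posts in per_followee_posts)
    # reverse=True tells merge that every input is sorted largest to smallest
    merged = heapq.merge(*newest_first, reverse=True)
    # merge is lazy, so islice stops after `limit` items are produced
    return list(islice(merged, limit))


bob = [(1.0, "bob", "a"), (4.0, "bob", "d")]
carol = [(2.0, "carol", "b"), (3.0, "carol", "c")]
print(merge_feeds([bob, carol], limit=3))
# [(4.0, 'bob', 'd'), (3.0, 'carol', 'c'), (2.0, 'carol', 'b')]
```

With k followees, the merge does work proportional to the feed size times log k, instead of sorting every fetched post.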

Step 5: Run the app

Python
if __name__ == '__main__':
    # debug=True enables the reloader and interactive debugger;
    # use it for local development only
    app.run(debug=True)

Usage Example

The following sequence exercises each endpoint once:

  • Create users Alice and Bob (POST /user, once per user).
  • Alice follows Bob (POST /follow).
  • Bob posts a message (POST /post).
  • Alice fetches her feed (GET /feed/alice) and sees Bob’s post.
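The same sequence can be scripted. The helper below is a hypothetical sketch: it accepts any client object that exposes `post(path, json=...)` and `get(path)`, which is the shape of Flask's built-in test client. The user IDs `alice` and `bob` and the message text are illustrative.

```python
def run_walkthrough(client):
    """Replay the four steps above and return the feed response for Alice."""
    client.post('/user', json={'user_id': 'alice'})
    client.post('/user', json={'user_id': 'bob'})
    client.post('/follow', json={'follower': 'alice', 'followee': 'bob'})
    client.post('/post', json={'user_id': 'bob', 'content': 'Hello from Bob'})
    return client.get('/feed/alice')


# With the Flask app from Steps 1 to 5 in scope:
#     response = run_walkthrough(app.test_client())
#     print(response.get_json())
```

Using the test client keeps the walkthrough in-process, so no server needs to be running.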

Limitations & Improvements for a Real-World System

The simplified example above has several limitations. In a real-world social media feed system, you would need to add the following to make it production-ready.

  • Persistent Data: Use a database like Cassandra or DynamoDB instead of in-memory data structures.
  • Caching: Implement a distributed cache like Redis to store and serve user feeds, significantly reducing database load and latency.
  • Hybrid Approach: Use a hybrid feed generation strategy to handle both regular users and “celebrity” accounts.
  • Asynchronous Processing: Use message queues (Kafka or RabbitMQ) to handle tasks like fan-out on write asynchronously.
  • Pagination: Implement pagination to fetch the feed in smaller, manageable chunks instead of all at once.
  • Ranking & Filtering: Introduce a ranking algorithm (e.g., based on engagement) to show more relevant posts at the top, and add filtering capabilities.
  • Security: Implement authentication and authorization to ensure only authorized users can access feeds.
  • Monitoring: Set up monitoring and alerting to track system performance and reliability.
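Of the items above, pagination is the easiest to illustrate in a few lines. The sketch below uses a timestamp cursor rather than page numbers, so new posts arriving between requests do not shift the pages. The helper name `paginate` and the `before` parameter are hypothetical; the sketch also assumes timestamps are unique, since posts sharing a cursor timestamp would be skipped.

```python
def paginate(feed_items, limit=20, before=None):
    """Return one page of a newest-first feed plus the cursor for the next page.

    feed_items: (timestamp, author, content) tuples sorted newest first.
    before:     only return posts strictly older than this timestamp.
    """
    if before is not None:
        feed_items = [item for item in feed_items if item[0] < before]
    page = feed_items[:limit]
    has_more = len(feed_items) > limit
    # The cursor is the timestamp of the last post on this page
    next_cursor = page[-1][0] if has_more else None
    return page, next_cursor


feed = [(5.0, "bob", "e"), (4.0, "bob", "d"), (3.0, "carol", "c"),
        (2.0, "bob", "b"), (1.0, "carol", "a")]

page1, cursor = paginate(feed, limit=2)
print(page1, cursor)  # [(5.0, 'bob', 'e'), (4.0, 'bob', 'd')] 4.0

page2, cursor = paginate(feed, limit=2, before=cursor)
print(page2, cursor)  # [(3.0, 'carol', 'c'), (2.0, 'bob', 'b')] 2.0
```

A client keeps passing the returned cursor as `before` until it comes back as `None`, which signals the end of the feed.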

Summary

Designing a social media feed system is a complex but manageable task that hinges on choosing the right strategy for your use case. Key takeaways include:

  • Fan-out on Read is ideal for users with very large follower counts.
  • Fan-out on Write is more efficient for typical users and provides a faster read experience.
  • A hybrid approach is often the best solution for a social media feed system that needs to support both types of users.
  • A robust design requires a scalable database, a caching layer, and asynchronous processing to handle the immense scale and real-time demands of a modern social media platform.

This article is part of our Interview Prep series.

