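SNS Feed System Design Guide

Conclusion

When designing an SNS feed system, consider the following six core areas:

Content filtering: relationship-based (friends/follows), location-based (country/region), visibility scope (public/friends-only/private), and content-type filtering
Ranking/sorting algorithms: time-based ordering, popularity-based ordering, personalization scores, time decay, deduplication
Performance/scalability: fan-out strategy (write vs. read), caching, pagination, database sharding
Real-time behavior: real-time updates, freshness guarantees, the consistency vs. availability trade-off
Personalization: collaborative filtering, content-based filtering, machine learning models, cold-start handling
Security/quality: spam prevention, harmful-content filtering, privacy protection, content quality control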

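1. Content Filtering Strategy

1.1 Relationship-Based Filtering

The most basic approach filters posts according to the relationships between users.

Key considerations:

First-degree connections: friends and followed users
N-degree connections: friends of friends (network expansion)
Groups/communities: membership-based content
Block/mute: exclude specific users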

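1.2 Location-Based Filtering

This answers the first item in the user's question: "Will you restrict by user location?"

Implementation options:

IP-based: country-level detection (the simplest option)
GPS-based: precise real-time location (mobile apps)
Profile setting: a place of residence set by the user
Geofencing: show only content within a given radius (e.g., 5 km)

Selection criteria:

Country-level restriction: legal regulation, language regions
Region-level restriction: local communities, location-based services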
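
The geofencing option can be sketched as a great-circle distance check. This is a minimal sketch, not a production implementation: the function names, the (latitude, longitude) tuple format, and the mean Earth radius constant are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant for this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_geofence(user_loc, post_loc, radius_km=5.0):
    """Return True when the post location lies within radius_km of the user."""
    return haversine_km(*user_loc, *post_loc) <= radius_km


# Hypothetical coordinates, roughly Seoul City Hall and Gangnam Station
city_hall = (37.5665, 126.9780)
gangnam = (37.4979, 127.0276)
print(within_geofence(city_hall, city_hall))  # same point, inside the fence
print(within_geofence(city_hall, gangnam))    # several km apart, outside a 5 km fence
```

At scale, a per-post distance check like this would normally sit behind a coarse spatial index so that only nearby posts are compared at all.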

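1.3 Visibility Settings

This answers the user's question: "Do you need visibility settings?"

Standard visibility levels:

Public: visible to all users
Friends/Followers Only: visible only to connected users
Custom: visible to specified users/groups
Private: visible only to the author

Implementation notes:

Store the visibility level in the post metadata
Always verify it at feed-generation time
Account for visibility when caching as well, so a cached feed never exposes a post to a viewer who should not see it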
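
The visibility check at feed-generation time could look like the sketch below. The `Post` dataclass, the lowercase level names, and the `friends_of_author` argument are assumptions for illustration; a real system would load them from the post metadata and the relationship store.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    author_id: int
    visibility: str  # 'public' | 'friends' | 'custom' | 'private'
    allowed_viewers: set = field(default_factory=set)  # used only for 'custom'


def can_view(viewer_id, post, friends_of_author):
    """Return True if viewer_id may see the post under its visibility level."""
    if viewer_id == post.author_id:
        return True  # authors always see their own posts
    if post.visibility == "public":
        return True
    if post.visibility == "friends":
        return viewer_id in friends_of_author
    if post.visibility == "custom":
        return viewer_id in post.allowed_viewers
    return False  # 'private' or an unknown level: deny by default


friends = {2, 3}
print(can_view(2, Post(author_id=1, visibility="friends"), friends))
print(can_view(9, Post(author_id=1, visibility="private"), friends))
```

Denying unknown levels by default keeps a newly added visibility value from being treated as public by older code.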

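1.4 Duplicate Exposure Limits

This answers the user's question: "Are there rules that limit duplicate exposure?"

Common rules:

Consecutive same-author limit: do not show three or more posts from one user in a row
Seen-content filtering: exclude posts the user has already seen
Similar-content removal: filter out duplicate or near-duplicate content
Time-based limit: do not re-show the same post within 24 hours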

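2. Ranking and Sorting Algorithms

2.1 Basic Sorting

Time-based sorting:

Reverse chronological (newest first): the default on most SNS platforms
Time decay: lower the score of older posts
```
score = engagement_score / (time_elapsed_hours + 2)^gravity
```
Here gravity is usually between 1.5 and 2.0; a larger value makes older posts sink faster.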
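
The decay formula above translates directly into code. In this sketch the function name and the default gravity of 1.8 are assumptions; only the formula itself comes from the text.

```python
def time_decayed_score(engagement_score, hours_elapsed, gravity=1.8):
    """score = engagement_score / (hours_elapsed + 2) ** gravity"""
    return engagement_score / (hours_elapsed + 2) ** gravity


# The same engagement is worth less as the post ages
for hours in (0, 1, 6, 24):
    print(hours, round(time_decayed_score(100, hours), 2))
```

The `+ 2` offset keeps the denominator away from zero for brand-new posts, so a post published seconds ago does not receive an unbounded score.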

2.2 Popularity-Based Sorting

Engagement score calculation:

engagement_score = (likes × 1) + (comments × 2) + (shares × 3)

Comments and shares receive higher weights because they signal stronger user engagement. (Source: Facebook News Feed Ranking)

Meta's official blog states that "multiple ML models assess the likelihood of different forms of engagement on any given post".

2.3 Personalization and Ranking System

Modern SNS feeds implement personalization as a three-stage pipeline. (Source: Candidate Generation - Google ML)

"Candidate generation is aimed at efficiently narrowing down a vast item pool (e.g., from millions to hundreds)"

2.3.1 Stage 1: Candidate Generation

This stage quickly narrows millions of posts down to a few hundred or a few thousand.

Goals:

Reduce the full post set to roughly 1,000 posts
Filter quickly (no ML)
Apply basic rule-based logic

Implementation:

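def generate_candidates(user_id, limit=1000):
    # Fetch recent posts from the users this user follows.
    # `db`, get_following_list, and get_blocked_users are assumed helpers.
    following_users = get_following_list(user_id)

    # Skip blocked/muted authors up front so no query is wasted on them
    blocked_users = set(get_blocked_users(user_id))

    candidates = []
    for author_id in following_users:
        if author_id in blocked_users:
            continue
        # Only posts from the last 7 days (MySQL-style interval syntax)
        recent_posts = db.query("""
            SELECT * FROM posts
            WHERE author_id = ?
            AND created_at > NOW() - INTERVAL 7 DAY
            AND visibility IN ('public', 'friends')
            ORDER BY created_at DESC
            LIMIT 10
        """, author_id)
        candidates.extend(recent_posts)

    # Sort globally by recency before truncating; otherwise the cut would
    # favor whichever authors happened to be queried first
    candidates.sort(key=lambda p: p.created_at, reverse=True)
    return candidates[:limit]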

Differences by fan-out strategy:

Fan-out on Write: read a list that is already prepared in Redis
Fan-out on Read: query the database at request time

(Source: Multi-Stage Recommendation Systems)

"This multi-stage architecture allows systems to efficiently process millions of items while maintaining recommendation quality"

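2.3.2 Stage 2: Ranking (Personalized Ranking)

This stage scores the 1,000 candidates with a machine learning model and sorts them.

Goals:

Narrow 1,000 posts down to 50 with precise scoring
Use ML models
Apply personalization

Modern SNS platforms mainly combine two approaches:

Approach A: Feature-based (primarily Meta/Facebook)

Combines hundreds of individual features to compute a score.

Input features: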

# User features
user_features = {
    'age': 25,
    'gender': 'F',
    'location': 'Seoul',
    'avg_session_time_minutes': 15.3,
    'posts_liked_last_week': 42,
    'interests': ['tech', 'travel', 'food'],  # multi-hot encoded
}

# Post features
post_features = {
    'post_id': 9876,
    'author_id': 555,
    'created_hours_ago': 2.5,
    'likes_count': 150,
    'comments_count': 23,
    'shares_count': 5,
    'is_video': 1,  # boolean
    'has_link': 0,
    'hashtags_count': 3,
    'text_length': 280,
}

# Relationship features
relationship_features = {
    'is_following': 1,
    'past_interactions_count': 5,  # past interactions with this author
    'social_distance': 1,  # 1st degree, 2nd degree, ...
    'last_interaction_days_ago': 3,
}

# Context features
context_features = {
    'time_of_day': 14,  # 14:00
    'day_of_week': 3,  # Wednesday (Monday = 1)
    'is_mobile': 1,
}

Model: Gradient Boosting Trees

(Source: Evaluating Boosted Decision Trees at Meta)

"Applied to several ranking models at Facebook, including feed ranking"

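# Combine all features into a single numeric vector.
# `flatten` is a placeholder for the encoding step: categorical values such as
# gender, location, and interests must be converted to numbers first.
feature_vector = flatten([
    user_features,
    post_features,
    relationship_features,
    context_features
])
# feature_vector = [25, 0, ..., 1]  # about 300 values

# Predict with gradient boosting (XGBoost/LightGBM).
# `load_model` is a placeholder for whatever model loader is in use.
model = load_model('feed_ranking_gbdt.pkl')
score = model.predict([feature_vector])[0]
# score = 0.76 (interaction probability; with scikit-learn-style classifiers,
# use predict_proba to get a probability rather than a class label)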

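Characteristics:

Does not use vector similarity
Each feature is an explicit, named input; the trees learn interactions between features through their splits
Relatively interpretable (for example, through feature importance)
A stable, well-proven approach

Approach B: Embedding-based (primarily TikTok/YouTube)

Converts users and posts into vectors and computes their similarity.

Two-Tower model architecture:

(Source: Two-Tower Embedding Model)

"Two-tower architectures map query and candidate entities to a shared embedding space such that semantically similar entities cluster closer together"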

# Pseudocode: NeuralNetwork, Embedding, Dense, etc. are illustrative layer names
# 1. User Tower: map a user to a 128-dimensional vector
user_tower = NeuralNetwork([
    Embedding(user_id, dim=64),
    Dense(past_liked_posts, dim=32),
    Dense(interests, dim=32),
    Concatenate(),  # 64 + 32 + 32 = 128
    Dense(128, activation='relu'),
])

user_embedding = user_tower.predict(user_data)
# user_embedding = [0.2, -0.5, 0.8, ..., 0.3]  # 128 dimensions

# 2. Item Tower: map a post to a 128-dimensional vector
item_tower = NeuralNetwork([
    Embedding(post_id, dim=64),
    TextEmbedding(content_text, dim=32),
    ImageEmbedding(content_image, dim=32),
    Concatenate(),  # 64 + 32 + 32 = 128
    Dense(128, activation='relu'),
])

post_embedding = item_tower.predict(post_data)
# post_embedding = [0.1, -0.3, 0.9, ..., 0.2]  # 128 dimensions

# 3. Cosine similarity
import numpy as np

def cosine_similarity(vec1, vec2):
    dot_product = np.dot(vec1, vec2)
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)
    if norm1 == 0 or norm2 == 0:
        return 0.0  # avoid division by zero for all-zero vectors
    return dot_product / (norm1 * norm2)

score = cosine_similarity(user_embedding, post_embedding)
# score = 0.82

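(Source: Snap's Two-Tower Embedding)

"Two-tower models combine multiple positive engagement signals into one classification head, generating meaningful and personalized embeddings"

Characteristics:

Uses vector similarity directly
User and post embeddings can be precomputed
Fast retrieval with ANN (Approximate Nearest Neighbors)
Mitigates cold start for new content, because the text and image towers can embed a post that has no interaction history yet

Approach C: Hybrid (most SNS platforms in practice)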

# Weighted average of the two scores
embedding_score = cosine_similarity(user_emb, post_emb)  # 0.82
feature_score = gbdt_model.predict(features)              # 0.76

final_score = 0.6 * embedding_score + 0.4 * feature_score
# final_score = 0.6 × 0.82 + 0.4 × 0.76 = 0.796

Full ranking process:

def rank_posts(user_id, candidates):
    scored_posts = []

    # The user embedding is the same for every candidate, so fetch it once
    user_emb = get_user_embedding(user_id)

    for post in candidates:  # iterates over ~1,000 candidates
        # Build features
        features = extract_features(user_id, post)

        # Embedding score
        post_emb = get_post_embedding(post.id)
        emb_score = cosine_similarity(user_emb, post_emb)

        # Feature score (predict takes a batch, so wrap and unwrap one row)
        feat_score = gbdt_model.predict([features])[0]

        # Final score
        final_score = 0.6 * emb_score + 0.4 * feat_score
        scored_posts.append((post, final_score))

    # Sort by score, descending
    scored_posts.sort(key=lambda x: x[1], reverse=True)

    # Return only the top 50
    return scored_posts[:50]

(Source: Instagram Feed Ranking)

"Multiple machine learning models work together to deliver your experience"

2.3.3 Stage 3: Re-ranking

This stage cuts the 50 posts down to the final 20 and applies business rules.

Goals:

Ensure diversity
Apply business rules
Optimize the user experience

(Source: Re-ranking - Google ML Guide)

"Re-ranking allows the system to assess factors beyond initial scoring, such as diversity, freshness, and fairness"

def rerank_feed(user_id, ranked_posts):
    final_feed = []

    for post, score in ranked_posts:
        # Rule 1: never show three posts from the same author in a row
        # (section 1.4), so check the two most recently accepted posts
        if len(final_feed) >= 2 and all(
            p.author_id == post.author_id for p in final_feed[-2:]
        ):
            continue

        # Rule 2: skip posts this user has already seen
        if user_has_seen(user_id, post.id):
            continue

        # Rule 3: minimum quality score (0.3 or higher)
        if score < 0.3:
            break  # input is sorted by score, so the rest is lower too

        final_feed.append(post)

        if len(final_feed) >= 20:
            break

    # Insert ads at the 5th and 15th slots (0-based indices 4 and 14)
    final_feed = insert_ads(final_feed, positions=[4, 14])

    return final_feed

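2.3.4 Why Three Stages?

The reason is compute-cost optimization:

Scenario: 100 million posts, ML model inference time = 0.3 ms per post

❌ ML on everything:
100 million × 0.3 ms = 30,000 s ≈ 8.3 hours (infeasible!)

✅ Three-stage pipeline:
├─ Stage 1 (SQL filtering): 100 million → 1,000 posts (0.001 s)
├─ Stage 2 (ML ranking): 1,000 × 0.3 ms = 0.3 s
└─ Stage 3 (re-ranking): rules applied to 50 posts (0.01 s)

Total time: ~0.31 s ✓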
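
The arithmetic in this scenario can be checked with a few lines of Python. The constants are the scenario's own figures; the variable and function names are assumptions, and the calculation assumes strictly sequential inference with no batching or parallelism.

```python
POSTS_TOTAL = 100_000_000  # posts in the scenario
INFERENCE_MS = 0.3         # ML inference time per post, in milliseconds


def ml_seconds(num_posts, per_post_ms=INFERENCE_MS):
    """Total sequential inference time in seconds for num_posts posts."""
    return num_posts * per_post_ms / 1000


# Running the model on every post
naive_hours = ml_seconds(POSTS_TOTAL) / 3600

# Three-stage pipeline: cheap filter, ML on 1,000 candidates, rule pass
STAGE1_SECONDS = 0.001
STAGE3_SECONDS = 0.01
pipeline_seconds = STAGE1_SECONDS + ml_seconds(1000) + STAGE3_SECONDS

print(f"ML on everything: {naive_hours:.1f} hours")
print(f"Three-stage pipeline: {pipeline_seconds:.3f} s")
```

Stage 2 dominates the pipeline total, which is why the candidate count, rather than the total post count, is the number that has to stay small.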

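2.3.5 Traditional Personalization Techniques

The following techniques are used alongside the three-stage pipeline:

Collaborative filtering:

User-based: recommend content liked by similar users
Item-based: recommend similar posts
Often implemented with matrix factorization

Content-based filtering:

Analyze the topics the user has liked or commented on in the past
Extract hashtag and keyword preferences

Contextual bandit:

Balance exploration and exploitation
Expose some new content to learn preferences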
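
The exploration/exploitation balance can be illustrated with epsilon-greedy selection. Note the simplification: epsilon-greedy is a plain bandit strategy that ignores user context, so this is a stand-in for the idea rather than a contextual bandit. The function name and the `stats` layout are assumptions.

```python
import random


def choose_post(stats, epsilon=0.1, rng=random):
    """Pick a post id from stats = {post_id: (clicks, impressions)}.

    With probability epsilon, explore a uniformly random post;
    otherwise exploit the post with the highest observed click rate.
    """
    if rng.random() < epsilon:
        return rng.choice(list(stats))
    return max(stats, key=lambda pid: stats[pid][0] / max(stats[pid][1], 1))


stats = {"post_a": (5, 100), "post_b": (30, 100), "new_post": (0, 0)}
rng = random.Random(42)  # seeded so the run is repeatable
picks = [choose_post(stats, epsilon=0.2, rng=rng) for _ in range(1000)]
print({pid: picks.count(pid) for pid in stats})
```

With epsilon at 0.2, most impressions go to the best-known post while the new post still collects enough exposure to estimate its click rate.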

Cold-start handling:

New users: show mostly popular content
New posts: test with a small group of users, then expand

3. Performance and Scalability Design

3.1 Choosing a Fan-out Strategy

This is the most important architectural decision in an SNS feed system. (Source: Fan-Out Design Pattern)

Fan-out on Write (Push model)

When a post is created, it is written ahead of time into every follower's feed.

Advantages:

Very fast reads (a prepared feed is returned)
Read load is spread out

Disadvantages:

Slow writes (1,000 followers means 1,000 writes)
High storage usage
Inefficient for celebrities (millions of followers)

Baseline System Design — Facebook Newsfeed states: "When a user posts a tweet, the tweet is inserted into the timeline of each of their followers".

Best suited for:

Regular users (followers < 10,000)
Cases where real-time delivery matters

Fan-out on Read (Pull model)

The feed is assembled by querying at the moment it is requested.

Advantages:

Fast writes (only the post itself is stored)
Saves storage
Efficient for celebrities

Disadvantages:

Slow reads (query and sort at request time)
High database load

Design a News Feed System states: "Fan-Out-on-Read: Higher latency because data is fetched and computed upon request".

Best suited for:

Celebrities (followers > 100,000)
Cases with few reads

Hybrid Approach (Recommended)

Large-scale SNS platforms use a hybrid approach in practice. (Source: Twitter Architecture)

Implementation:

Regular users: Fan-out on Write
Celebrities: Fan-out on Read
Threshold: roughly 100,000 followers

Re-evaluating Fan-Out-on-Write vs. Fan-Out-on-Read states: "For users who has lots of followers, stop fanout write for their new posts. Instead, the followers fanout read the celebrities' updates".
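
The hybrid approach can be sketched with in-memory dictionaries standing in for the timeline cache and the post store. This is a minimal sketch under stated assumptions: `HybridFeedStore` and its methods are hypothetical names, and timestamps are plain numbers.

```python
from collections import defaultdict


class HybridFeedStore:
    """In-memory stand-in for the timeline cache and the post store."""

    def __init__(self, celebrity_threshold=100_000):
        self.celebrity_threshold = celebrity_threshold
        self.followers = defaultdict(set)      # author_id -> follower ids
        self.following = defaultdict(set)      # user_id -> followed author ids
        self.timelines = defaultdict(list)     # user_id -> pushed (ts, post_id)
        self.author_posts = defaultdict(list)  # author_id -> own (ts, post_id)

    def follow(self, user_id, author_id):
        self.followers[author_id].add(user_id)
        self.following[user_id].add(author_id)

    def is_celebrity(self, author_id):
        return len(self.followers[author_id]) >= self.celebrity_threshold

    def publish(self, author_id, post_id, ts):
        self.author_posts[author_id].append((ts, post_id))
        if not self.is_celebrity(author_id):
            # Fan-out on write: push into every follower's timeline
            for follower_id in self.followers[author_id]:
                self.timelines[follower_id].append((ts, post_id))

    def read_feed(self, user_id, limit=20):
        # A set removes duplicates if an author crossed the threshold
        # after some posts had already been pushed
        items = set(self.timelines[user_id])
        # Fan-out on read: pull celebrity posts at request time
        for author_id in self.following[user_id]:
            if self.is_celebrity(author_id):
                items.update(self.author_posts[author_id])
        return [post_id for _, post_id in sorted(items, reverse=True)[:limit]]


store = HybridFeedStore(celebrity_threshold=2)  # tiny threshold for the demo
store.follow("u1", "bob")
store.follow("u1", "celeb")
store.follow("u2", "celeb")
store.publish("bob", "p1", ts=1)    # pushed to u1's timeline
store.publish("celeb", "p2", ts=2)  # stored only; pulled at read time
print(store.read_feed("u1"))
```

The read path merges both sources, so followers of a celebrity pay a small extra cost per request while the celebrity's writes stay constant-time.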

3.2 Database and Caching Strategy

Database choices:

Cassandra: high write throughput, easy horizontal scaling (Source: Designing a Scalable News Feed System)
Redis: in-memory caching, timeline storage
MySQL/PostgreSQL: user relationships, post metadata

Designing Instagram - GeeksforGeeks states: "Redis is the most fitting choice for storing data in horizontally scalable systems that can accommodate heavy read traffic".

Caching strategy:

L1 cache (Redis): per-user feeds (TTL 5-15 minutes)
L2 cache (CDN): media content (images, videos)
Hot/cold data separation: recent data in memory, older data on disk

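3.3 Pagination

Cursor-based pagination (recommended):

GET /feed?cursor=eyJpZCI6MTIzNDU2fQ==&limit=20

Advantages:

Solves the offset problem (no duplicates or gaps even when new posts are added mid-scroll)
Consistent results
Well suited to infinite scroll

Implementation:

// Encode the last post's ID and timestamp as the cursor
cursor = base64encode({id: last_post_id, timestamp: last_timestamp})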
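
A Python version of the cursor encoding, with the matching decode and page lookup, might look like this. The function names and the dict-based post format are assumptions; the sketch also assumes posts are already sorted newest first by (timestamp, id).

```python
import base64
import json


def encode_cursor(post_id, timestamp):
    """Encode the last post's id and timestamp as an opaque cursor string."""
    payload = json.dumps({"id": post_id, "timestamp": timestamp}, separators=(",", ":"))
    return base64.urlsafe_b64encode(payload.encode()).decode()


def decode_cursor(cursor):
    """Decode a cursor string back into its dict payload."""
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))


def page_after(posts, cursor=None, limit=20):
    """Return up to `limit` posts strictly older than the cursor position.

    `posts` must be sorted by (timestamp, id) in descending order.
    """
    if cursor:
        c = decode_cursor(cursor)
        boundary = (c["timestamp"], c["id"])
        posts = [p for p in posts if (p["timestamp"], p["id"]) < boundary]
    return posts[:limit]


posts = [{"id": i, "timestamp": 1000 + i} for i in range(5, 0, -1)]
first = page_after(posts, limit=2)
cursor = encode_cursor(first[-1]["id"], first[-1]["timestamp"])
second = page_after(posts, cursor, limit=2)
print([p["id"] for p in first], [p["id"] for p in second])
```

Comparing on the (timestamp, id) pair rather than the timestamp alone keeps the page boundary unambiguous when two posts share a timestamp.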

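4. Real-Time Behavior and Freshness

4.1 Real-Time Update Mechanisms

WebSocket / Server-Sent Events (SSE):

Notify the client of new posts in real time
Show a "N new posts" badge
Refresh the feed when the user clicks it

Implementation example:

// Client
const eventSource = new EventSource('/feed/updates');
eventSource.onmessage = (event) => {
  // event.data is always a string, so parse the JSON payload first
  const data = JSON.parse(event.data);
  showNewPostsBadge(data.count);
};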

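4.2 Consistency vs. Availability Trade-off

In CAP theorem terms, SNS feeds typically choose availability and partition tolerance (AP).

Eventual consistency is acceptable:

A small delay is tolerated (seconds to minutes)
Not every user needs to see the same feed at the same instant
Overall system availability takes priority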

5. Security and Quality Management

5.1 Spam and Abuse Prevention

Rate limiting:

Post creation: 10 per hour
Likes: 60 per minute
API requests: 100 per second
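
Limits like these could be enforced with a sliding-window counter. The sketch below keeps state in process memory, which is an assumption made for brevity; a multi-server deployment would need a shared store. `SlidingWindowLimiter` is a hypothetical name.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_events per key within any window_seconds span."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps of allowed events

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        events = self._events[key]
        # Drop timestamps that have fallen out of the window
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_events:
            return False
        events.append(now)
        return True


# Limits taken from the list above
post_limiter = SlidingWindowLimiter(max_events=10, window_seconds=3600)
like_limiter = SlidingWindowLimiter(max_events=60, window_seconds=60)

results = [post_limiter.allow("user:1", now=t) for t in range(12)]
print(results.count(True), results.count(False))
```

Only allowed events are recorded, so a client that keeps hammering a blocked endpoint does not extend its own lockout.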

Detection mechanisms:

Duplicate content detection
Bot activity pattern detection
Monitoring of suspicious behavior

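5.2 Harmful Content Filtering

Automated filtering:

AI-based image/text analysis
Profanity and hate speech detection
Adult content classification

Manual review:

Reporting system
Priority queue (based on report count)
Dedicated review team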

6. System Metrics and Monitoring

6.1 Key Metrics

Performance metrics:

Feed Load Time: P95 < 500ms
Write Latency: P95 < 2000ms
Cache Hit Rate: > 80%

User engagement metrics:

Daily Active Users (DAU)
Engagement Rate: (likes + comments + shares) / impressions
Session Duration
Scroll Depth

System health:

Error Rate: < 0.1%
Database Connection Pool Usage
Queue Length (Message Broker)

6.2 A/B Testing

Improving the ranking algorithm requires continuous experimentation.

Example test:

Group A: chronological ordering
Group B: ML-based personalized ordering
Measure: time spent, engagement rate
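
Assigning users to groups A and B is commonly done with a deterministic hash, so a user always sees the same variant without any stored state. The function name and the experiment label below are assumptions for illustration.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("A", "B")):
    """Deterministically map a user to a variant for a given experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]


# "feed_ranking_v1" is a hypothetical experiment name
counts = {"A": 0, "B": 0}
for user_id in range(10_000):
    counts[assign_variant(user_id, "feed_ranking_v1")] += 1
print(counts)
```

Including the experiment name in the hash input means the same user can land in different groups across experiments, which avoids correlating one test's population with another's.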

7. Example Implementation Architecture

7.1 Overall System Structure

Client (Mobile/Web)
    ↓
Load Balancer
    ↓
API Gateway
    ↓
┌─────────────┬──────────────┬─────────────┐
│  Feed       │  Post        │  User       │
│  Service    │  Service     │  Service    │
└─────────────┴──────────────┴─────────────┘
    ↓              ↓               ↓
┌─────────────┬──────────────┬─────────────┐
│  Redis      │  Cassandra   │  PostgreSQL │
│  (Cache)    │  (Posts)     │  (Users)    │
└─────────────┴──────────────┴─────────────┘

7.2 Feed Generation Process (Three-Stage Pipeline)

The complete feed generation process combines caching with the three-stage pipeline.

def get_feed(user_id, cursor=None, limit=20):
    """
    Feed retrieval API for a user.

    Args:
        user_id: user ID
        cursor: pagination cursor
        limit: number of posts to return (default 20)

    Returns:
        List of feed posts plus the next cursor
    """

    # ========== Step 0: cache lookup ==========
    cache_key = f"feed:{user_id}:{cursor}"
    cached_feed = redis.get(cache_key)

    if cached_feed:
        print("Cache hit!")
        return json.loads(cached_feed)

    print("Cache miss - generating feed...")

    # ========== Stage 1: Candidate Generation ==========
    # 100 million → 1,000 posts (0.001 s)
    # NOTE: the cursor is only part of the cache key here; generation does not
    # yet skip posts returned on earlier pages. A real implementation must
    # apply the cursor, for example by paging through a cached ranked list.

    candidates = generate_candidates(user_id, limit=1000)
    print(f"Stage 1: Generated {len(candidates)} candidates")

    # ========== Stage 2: Ranking ==========
    # 1,000 → 50 posts with precise scoring (0.3 s)

    # Look up the user embedding (precomputed)
    user_embedding = embedding_cache.get(f"user_emb:{user_id}")

    scored_posts = []
    for post in candidates:
        # Feature extraction
        features = extract_features(user_id, post)

        # Embedding score (0.0 when the post has no embedding yet)
        post_embedding = embedding_cache.get(f"post_emb:{post.id}")
        if post_embedding is None:
            emb_score = 0.0
        else:
            emb_score = cosine_similarity(user_embedding, post_embedding)

        # Feature score (GBDT)
        feat_score = ranking_model.predict([features])[0]

        # Hybrid score
        final_score = 0.6 * emb_score + 0.4 * feat_score

        scored_posts.append({
            'post': post,
            'score': final_score,
            'emb_score': emb_score,
            'feat_score': feat_score,
        })

    # Sort by score
    scored_posts.sort(key=lambda x: x['score'], reverse=True)
    top_posts = scored_posts[:50]

    print(f"Stage 2: Ranked to top {len(top_posts)} posts")

    # ========== Stage 3: Re-ranking ==========
    # 50 → 20 posts plus business rules (0.01 s)

    final_feed = []
    seen_posts = get_seen_posts(user_id)

    for item in top_posts:
        post = item['post']
        score = item['score']

        # Rule 1: skip posts the user has already seen
        if post.id in seen_posts:
            continue

        # Rule 2: never show three posts from the same author in a row
        if len(final_feed) >= 2 and all(
            p.author_id == post.author_id for p in final_feed[-2:]
        ):
            continue

        # Rule 3: minimum quality score (sorted input, so stop here)
        if score < 0.3:
            break

        final_feed.append(post)

        if len(final_feed) >= limit:
            break

    # Build the next-page cursor before inserting ads, so the cursor always
    # points at an organic post. base64encode is the helper from section 3.3;
    # created_at is assumed to be a datetime.
    next_cursor = None
    if len(final_feed) >= limit:
        last_post = final_feed[-1]
        next_cursor = base64encode({
            'id': last_post.id,
            'timestamp': last_post.created_at.isoformat()
        })

    # Insert ads
    final_feed = insert_sponsored_posts(final_feed, positions=[4, 14])

    print(f"Stage 3: Re-ranked to final {len(final_feed)} posts")

    # ========== Caching and return ==========

    # serialize_post is an assumed helper that converts a post object into a
    # JSON-safe dict; json.dumps cannot serialize arbitrary objects.
    result = {
        'posts': [serialize_post(p) for p in final_feed],
        'next_cursor': next_cursor,
    }

    # Cache in Redis (TTL 10 minutes)
    redis.setex(cache_key, 600, json.dumps(result))

    return result

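Performance analysis:

End-to-end timeline:

0ms    - API request received
1ms    - Redis cache lookup (miss)
2ms    - Stage 1 starts: Candidate Generation
3ms    - 1,000 posts fetched from DB/Redis
303ms  - Stage 2 done: 1,000 posts ranked (0.3 ms each)
313ms  - Stage 3 done: re-ranking and rules applied
315ms  - Redis caching and response returned

Total time: ~315 ms (meets the P95 < 500 ms target ✓)

On a cache hit: ~1 ms (over 99% lower latency)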

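8. Strategies by Scaling Scenario

8.1 From 1 Million to 10 Million Users

Bottleneck:

Database read load

Solutions:

Add read replicas (primary-replica replication)
Introduce Redis Cluster
Use a CDN

8.2 From 10 Million to 100 Million Users

Bottleneck:

Limits of a single database

Solutions:

Database sharding (by user ID)
Split into microservices (Feed, Post, User Service)
Introduce a message queue (Kafka, RabbitMQ)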
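
Sharding by user ID can be sketched as a stable hash followed by a modulo. The function name and the shard count are assumptions; the sketch uses MD5 only as a stable, well-distributed hash, not for security.

```python
import hashlib


def shard_for_user(user_id, num_shards):
    """Map a user id to a shard index in [0, num_shards)."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 8  # hypothetical cluster size
for user_id in (1, 2, 3, 1001):
    print(user_id, "->", shard_for_user(user_id, NUM_SHARDS))
```

A plain modulo remaps most users whenever the shard count changes, so systems that expect to add shards often use consistent hashing or a lookup table instead.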

8.3 Beyond 100 Million Users

Bottlenecks:

Global latency
Celebrity fan-out problem

Solutions:

Multi-region deployment
Edge computing
A more refined hybrid fan-out

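9. References

System design

Ranking algorithms

Multi-stage recommendation systems

Embedding and Two-Tower models

Fan-out patterns

10. Checklist

Functional requirements

Relationship-based filtering (friends/follows/groups)
Location-based filtering (country/region)
Visibility settings (public/friends-only/private)
Duplicate exposure limit rules
Block/mute handling
Seen-content filtering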
