System Design for PMs
Not engineering depth: PM-level architecture awareness. Tested at technical companies and AI-native shops.
Technical PM roles increasingly include a system design round. At Stripe, Anthropic, Cursor, and similar companies, expect to be tested. Without prep, you'll flounder. With it, you can hold your own without claiming to be an engineer.
PM system design is about understanding tradeoffs at the architecture level: latency vs cost, consistency vs availability, monolith vs services, sync vs async, the right database for the use case. You don't implement; you reason about the choices and ask the right questions.
The format
You're given a prompt: 'design a system for X.' Examples:
- Design a URL shortener
- Design a notification system
- Design an AI customer support agent
- Design a real-time chat system
- Design a video streaming service
You have 45-60 min to walk through the design, with the interviewer probing.
The structure (45 min)
1. Clarify requirements (5 min). Functional (what does it do?), non-functional (scale, latency, consistency, availability).
2. High-level design (10 min). Sketch the components: client, API, services, databases, queues, caches. Whiteboard or shared doc.
3. Deep dive on 1-2 components (15 min). Pick the interesting ones. Discuss data model, API design, scaling.
4. Trade-offs (10 min). What did you optimize for, what did you sacrifice? Alternative approaches.
5. Wrap-up (5 min). Summary, what you'd do differently with more time.
What PMs are tested on
- Vocabulary. Do you know what an API, a database, a queue, a cache, and a load balancer do?
- Trade-off awareness. Do you understand latency vs consistency, monolith vs services?
- Scaling intuition. Can you reason about what happens as usage grows from 10K → 1M → 100M users?
- Right questions. Do you ask about scale, consistency requirements, traffic patterns?
What you're NOT tested on: implementation details, specific languages, low-level optimization.
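Scaling intuition mostly comes down to back-of-envelope arithmetic. A minimal sketch in Python that converts user counts into requests per second, assuming a hypothetical 20 requests per user per day and a 3x peak-to-average ratio (both illustrative, not benchmarks):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Illustrative assumptions, not benchmarks.
REQUESTS_PER_USER_PER_DAY = 20
PEAK_TO_AVERAGE = 3.0


def average_qps(requests_per_day: float) -> float:
    """Average requests per second if traffic were spread evenly over the day."""
    return requests_per_day / SECONDS_PER_DAY


def peak_qps(requests_per_day: float, peak_factor: float = PEAK_TO_AVERAGE) -> float:
    """Rough peak load: the average scaled by an assumed peak-to-average ratio."""
    return average_qps(requests_per_day) * peak_factor


for users in (10_000, 1_000_000, 100_000_000):
    rpd = users * REQUESTS_PER_USER_PER_DAY
    print(f"{users:>11,} users: avg {average_qps(rpd):>8,.0f} QPS, peak ~{peak_qps(rpd):>8,.0f} QPS")
```

Under these assumptions, 10K users is about 2 requests per second on average, while 100M users is about 23,000. That jump is what pulls caches, queues, and horizontal scaling into the design.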
Common patterns to know
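- CAP theorem. Consistency, Availability, Partition tolerance. Network partitions can't be ruled out in a distributed system, so the real choice is what to give up during a partition: consistency (keep serving, possibly stale data) or availability (refuse some requests until the partition heals).
- Sync vs async. Sync: the caller waits for the work to finish; simpler, but slow downstream steps block requests at scale. Async: work is handed off (often to a queue) and finished later; more complex, but absorbs spikes and scales better.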
- SQL vs NoSQL. SQL for structured/relational; NoSQL for high-volume key-value or document.
- Cache strategies. Read-through, write-through, cache-aside.
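- Message queues (Kafka, SQS, RabbitMQ). Use them to decouple producers from consumers, absorb traffic spikes, and retry failed work.
- CDN. Caches static content at edge locations so users worldwide get it with low latency.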
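Cache-aside is the easiest of the three strategies to picture: the application checks the cache, falls back to the database on a miss, and writes the result into the cache itself. (Read-through moves that miss handling into the cache layer; write-through updates cache and database together on every write.) A minimal in-memory sketch, where the `CacheAside` class and the dict standing in for the database are hypothetical:

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Cache-aside: the application reads the cache first and, on a miss,
    loads from the database and populates the cache itself."""

    def __init__(self, load_from_db: Callable[[str], Any], ttl_seconds: float = 60.0):
        self._load = load_from_db  # stand-in for a real database read
        self._ttl = ttl_seconds
        self._store: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Any:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = self._load(key)  # miss or expired: go to the source of truth
        self._store[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: str) -> None:
        """Drop the cached copy after a write so the next read reloads it."""
        self._store.pop(key, None)


# Usage: a dict plays the role of the database.
db = {"user:1": "Ada"}
cache = CacheAside(lambda k: db[k], ttl_seconds=30)
first = cache.get("user:1")   # miss: loaded from db
second = cache.get("user:1")  # hit: served from cache
db["user:1"] = "Ada L."       # the write goes to the database...
cache.invalidate("user:1")    # ...then the cached copy is dropped
third = cache.get("user:1")   # miss: reloads the new value
print(first, second, third, cache.hits, cache.misses)
```

The TTL bounds how stale a value can get if an invalidation is missed, so choosing it is a consistency vs. database-load trade-off.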
AI system design (2026 update)
For AI-native interviews, expect AI-specific prompts:
- Design a RAG-based customer support system
- Design a system to evaluate AI outputs at scale
- Design an agent that can take actions on a user's calendar
For these, focus on:
- The AI pipeline (retrieval, prompting, model call, output processing)
- Eval and observability layer
- Cost optimization (model routing, caching)
- Safety and guardrails
Practice
- Read Designing Data-Intensive Applications (Kleppmann), chapters 1-4
- Read the System Design Primer on GitHub (donnemartin/system-design-primer)
- Do 5-10 mock system designs
- Watch a few YouTube walkthroughs for vocabulary
You don't need to be able to build the system. You need to be able to reason about it.
Watch-outs
- Don't fake technical depth. "I don't know specifically but here's how I'd reason about it" beats wrong technical details.
- Don't skip the requirements clarification. Designing without knowing the requirements is theater.
- Don't get lost in implementation. Stay at the architecture level.
Real-world examples
Stripe and Anthropic both use system design rounds to filter for technical PMs. Candidates who can hold the conversation at the architecture level, without faking implementation depth, clear the round.
Interview questions
Q1. Design a RAG-based customer support AI agent. Walk me through the architecture. (technical, senior)
Clarify. Volume of queries? Synchronous chat or async? Strict latency target? Single-tenant or multi-tenant?
Assumed: 10K queries/day, sync chat, p95 < 3 seconds, multi-tenant.
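The assumed numbers translate into a small load and a tight latency budget. A quick sketch, where the per-stage millisecond budgets are hypothetical placeholders rather than measured figures:

```python
# From the clarify step: 10K queries/day, p95 < 3 s.
QUERIES_PER_DAY = 10_000
P95_TARGET_MS = 3_000

avg_qps = QUERIES_PER_DAY / 86_400  # about 0.12 queries per second

# Hypothetical per-stage latency budgets (ms); real values come from tracing.
LATENCY_BUDGET_MS = {
    "api_gateway": 50,
    "query_embedding": 100,
    "vector_retrieval": 150,
    "prompt_construction": 20,
    "llm_call": 2_400,
    "output_processing": 80,
}

total_ms = sum(LATENCY_BUDGET_MS.values())
headroom_ms = P95_TARGET_MS - total_ms
llm_share = LATENCY_BUDGET_MS["llm_call"] / total_ms

print(f"avg load {avg_qps:.2f} QPS")
print(f"budgeted {total_ms} ms of {P95_TARGET_MS} ms (headroom {headroom_ms} ms)")
print(f"LLM call is {llm_share:.0%} of the budget")
```

At roughly 0.12 queries per second, throughput is not the hard part; latency is, and the LLM call dominates it. Per-stage p95 values don't add up exactly to an end-to-end p95, so treat the sum as a planning check, not a guarantee.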
High-level design.
- User interface: chat widget that sends the query to the backend.
- API gateway: receives the query and routes it to the AI pipeline.
- AI pipeline:
  - Query embedding: embed the user query.
  - Vector DB retrieval: pull the top 5-10 relevant doc chunks.
  - Prompt construction: system prompt + retrieved chunks + user query.
  - LLM call: Claude Sonnet (cost/quality balance).
  - Output processing: parse, validate, attach citations.
- Knowledge base ingestion (offline): periodic job that chunks docs, embeds them, and stores them in the vector DB.
- Eval pipeline: sample 1% of production queries, score them with an LLM judge, alert on regressions.
- Observability: trace every call with cost, latency, and model used.
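The pipeline and its offline ingestion job can be sketched end to end. This is a toy, assuming a bag-of-words `Counter` in place of a real embedding model, a Python list in place of a vector DB, and a hypothetical `call_llm` stub in place of the model API; all function names are illustrative:

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Chunk:
    doc_id: str
    text: str
    vector: Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words term counts (a real system calls an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ingest(doc_id: str, text: str, chunk_words: int = 50, overlap: int = 10) -> List[Chunk]:
    """Offline job: split a document into overlapping word windows and embed each."""
    words = text.split()
    step = chunk_words - overlap  # must be positive
    chunks = []
    for start in range(0, len(words), step):
        piece = " ".join(words[start:start + chunk_words])
        chunks.append(Chunk(doc_id, piece, embed(piece)))
        if start + chunk_words >= len(words):
            break  # this window already reaches the end of the document
    return chunks


def retrieve(query: str, index: List[Chunk], k: int = 5) -> List[Chunk]:
    """Vector retrieval: top-k chunks by cosine similarity to the query."""
    q = embed(query)
    return sorted(index, key=lambda c: cosine(q, c.vector), reverse=True)[:k]


def build_prompt(system: str, chunks: List[Chunk], query: str) -> str:
    """Prompt construction: system prompt + numbered chunks + user query."""
    context = "\n".join(f"[{i}] {c.text}" for i, c in enumerate(chunks, start=1))
    return f"{system}\n\nContext:\n{context}\n\nQuestion: {query}"


def call_llm(prompt: str) -> str:
    """Hypothetical stub for the model call; returns a canned answer citing [1]."""
    return "Refunds are processed within 5 business days [1]."


def answer(query: str, index: List[Chunk]) -> Dict[str, object]:
    chunks = retrieve(query, index, k=2)
    prompt = build_prompt("Answer only from the context. Cite sources as [n].", chunks, query)
    text = call_llm(prompt)
    # Output processing: map [n] markers back to source docs, dropping invalid ones.
    cited = sorted({int(n) for n in re.findall(r"\[(\d+)\]", text)})
    citations = [chunks[n - 1].doc_id for n in cited if 1 <= n <= len(chunks)]
    return {"answer": text, "citations": citations}


index = ingest("refund-policy", "Refunds are processed within 5 business days of approval.")
index += ingest("shipping", "Orders ship within 2 business days from our warehouse.")
print(answer("How long do refunds take?", index))
```

Each function maps to one box in the design, which makes it easy to point at where an eval, a cache, or a guardrail would attach.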
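Deep dive: retrieval quality. Retrieval determines answer quality, since the model can only use what it is given. Use hybrid search (vector + keyword), re-rank the top results with a cross-encoder, and cache answers to common queries with a semantic cache (which matches on embedding similarity rather than exact text).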
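One common way to merge vector and keyword results is reciprocal rank fusion (RRF), which scores each document by summing 1 / (k + rank) across the ranked lists. A minimal sketch; the document IDs are hypothetical, k = 60 is a commonly used default, and the cross-encoder re-ranking that would follow is not shown:

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of doc IDs: each list adds 1 / (k + rank) to a document's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


# Hypothetical result lists for one query.
vector_hits = ["doc_refunds", "doc_returns", "doc_shipping"]    # semantic match
keyword_hits = ["doc_returns", "doc_error_E42", "doc_refunds"]  # exact-term match (e.g. BM25)

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['doc_returns', 'doc_refunds', 'doc_error_E42', 'doc_shipping']
```

RRF uses only ranks, so it sidesteps the fact that vector similarity and keyword scores live on different scales. Documents found by both searches rise to the top, and exact-term hits like an error code still make the candidate list.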
Deep dive: cost optimization. Per-call cost matters at scale. Route easy questions (via intent classification) to a small model first, and invoke Sonnet only for substantive answers. Use prompt caching for the static system prompt.
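Model routing can be sketched as a cheap classification step in front of the expensive call. Here the keyword-based `classify_intent`, the `small`/`large` tier names, and the per-call prices are all hypothetical stand-ins (a real router would use a small classifier model and token-based pricing); prompt caching happens on the provider side and is not shown:

```python
from typing import Dict

# Hypothetical per-call prices in dollars; real costs depend on provider and token counts.
COST_PER_CALL: Dict[str, float] = {"small": 0.001, "large": 0.01}

SIMPLE_INTENTS = {"greeting", "order_status", "password_reset"}


def classify_intent(query: str) -> str:
    """Toy keyword rules standing in for a small intent-classification model."""
    words = set(query.lower().replace("?", " ").replace("!", " ").split())
    if words & {"hi", "hello", "thanks"}:
        return "greeting"
    if "password" in words:
        return "password_reset"
    if {"where", "order"} <= words:
        return "order_status"
    return "substantive"


def route(query: str) -> str:
    """Easy intents go to the small tier; everything else to the large tier."""
    return "small" if classify_intent(query) in SIMPLE_INTENTS else "large"


def daily_cost(queries_per_day: int, simple_share: float) -> float:
    """Blended daily spend when `simple_share` of traffic is routed to the small tier."""
    small = queries_per_day * simple_share * COST_PER_CALL["small"]
    large = queries_per_day * (1 - simple_share) * COST_PER_CALL["large"]
    return small + large


print(route("Where is my order?"))                         # small
print(route("Why was I charged twice after cancelling?"))  # large
print(f"${daily_cost(10_000, 0.0):.2f} vs ${daily_cost(10_000, 0.4):.2f} per day")
```

Under these made-up prices, routing 40% of 10K daily queries to the small tier cuts spend from $100 to $64 per day. The real lever is how accurately the classifier spots easy questions without misrouting hard ones.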
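Trade-offs. Choosing Sonnet over Opus saves cost but gives up some quality on edge cases. Choosing retrieval (a vector DB) over fine-tuning keeps answers fresh, since updating the knowledge base needs no retraining, at the cost of retrieval latency and longer prompts on every call. Sync chat (vs async) gives a better conversational UX but leaves little tolerance for slow responses, which is why the p95 target matters.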
At scale (1M queries/day). Add: rate limiting per tenant, fallback model (Haiku) for cost-sensitive paths, async response option with email for complex queries, dedicated retrieval infrastructure per major tenant.
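Per-tenant rate limiting is often implemented as a token bucket per tenant. A minimal single-process sketch; the class names are hypothetical, and a multi-instance deployment would need the bucket state in a shared store rather than in memory:

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class TenantRateLimiter:
    """One bucket per tenant, so a noisy tenant cannot starve the others."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str) -> bool:
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, self._clock)
            self._buckets[tenant_id] = bucket
        return bucket.allow()


# Usage with a fake clock so the behavior is deterministic.
now = [0.0]
limiter = TenantRateLimiter(rate=1.0, capacity=2, clock=lambda: now[0])
burst_a = [limiter.allow("tenant_a") for _ in range(3)]  # [True, True, False]
first_b = limiter.allow("tenant_b")                      # True: tenant_b has its own bucket
now[0] = 1.0                                             # one second later
later_a = limiter.allow("tenant_a")                      # True: one token refilled
print(burst_a, first_b, later_a)
```

Injecting the clock keeps the example deterministic; `capacity` sets the allowed burst and `rate` the sustained throughput per tenant.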
Eval cadence. Daily 1% sampling for production drift. Weekly review of failures. Quarterly eval suite refresh.
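The 1% sampling and regression alert can be sketched as follows. Hashing the query ID makes sampling deterministic, so a given query is either always or never in the eval set. The judge scores are hypothetical inputs standing in for LLM-judge outputs, and the 0.05 tolerance is an illustrative threshold:

```python
import hashlib
from statistics import mean
from typing import List

SAMPLE_RATE = 0.01  # 1% of production queries


def in_eval_sample(query_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Deterministic sampling: the same query ID always gets the same decision."""
    digest = hashlib.sha256(query_id.encode("utf-8")).digest()
    position = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return position < rate


def regression_alert(judge_scores: List[float], baseline_mean: float, tolerance: float = 0.05) -> bool:
    """Alert when the sample's mean judge score drops more than `tolerance` below baseline."""
    if not judge_scores:
        return False
    return mean(judge_scores) < baseline_mean - tolerance


# Usage: about 1% of query IDs land in the eval sample.
sampled = [qid for qid in (f"q{i}" for i in range(100_000)) if in_eval_sample(qid)]
print(len(sampled))

# Hypothetical LLM-judge scores for today's sample vs. a 0.90 baseline.
print(regression_alert([0.82, 0.79, 0.80], baseline_mean=0.90))  # True
print(regression_alert([0.91, 0.88, 0.90], baseline_mean=0.90))  # False
```

Deterministic sampling also means a flagged query can be re-scored later against a new prompt or model and compared like for like.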
This architecture supports 10K queries/day reliably and scales to 1M+ with the additions noted.