AI Agents for PMs
Agents are the dominant AI UX of 2025-26. PMs who can design and ship agentic products have a defensible career skill.
Agents take AI from 'chat assistant' to 'does work on your behalf.' The product, ops, and UX design challenges are different and largely uncharted. PMs who master the craft now will shape the category.
An agent is an LLM with tool access that runs in a loop until it completes a goal. PMs design agents by specifying: the goal, the available tools, the guardrails, the success criteria, and the human-in-the-loop checkpoints. The hard PM work is the UX patterns that make agent behavior trustworthy, debuggable, and recoverable when things go wrong.
What's an agent
An agent is an LLM that:
- Gets a goal from the user
- Plans steps
- Calls tools (search, code execution, APIs)
- Observes results
- Continues or completes
- Reports back
The loop runs autonomously. The user gives the goal; the agent figures out the means.
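The loop above can be sketched in a few lines. This is a minimal illustration, not a production design: `scripted_model` is a hypothetical stand-in for a real LLM call, `TOOLS` is a made-up registry with one fake search tool, and the `max_steps` cap is an assumed safety limit so a confused agent cannot loop forever.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query!r}",
}

def scripted_model(goal: str, history: list[dict]) -> dict:
    """Stand-in for an LLM call (assumption): returns the next action as a dict."""
    if not history:
        return {"type": "tool", "tool": "search", "input": goal}
    return {"type": "finish", "answer": f"Done: {history[-1]['result']}"}

def run_agent(goal: str, model=scripted_model, max_steps: int = 5) -> dict:
    """Plan, call a tool, observe, and repeat until the model finishes or the cap is hit."""
    history: list[dict] = []
    for _ in range(max_steps):
        action = model(goal, history)                      # plan the next step
        if action["type"] == "finish":                     # complete and report back
            return {"status": "completed", "answer": action["answer"], "steps": history}
        result = TOOLS[action["tool"]](action["input"])    # call the tool
        history.append({"tool": action["tool"],            # observe the result
                        "input": action["input"],
                        "result": result})
    return {"status": "step_limit_reached", "answer": None, "steps": history}

report = run_agent("find a free slot on Tuesday")
print(report["status"], len(report["steps"]))
```

The `steps` list returned with the report is the raw material for the activity log and audit trail discussed later.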
Why agents matter now
- Frontier models (Claude Sonnet 4.6, GPT-5) reliably do multi-step reasoning and tool use
- MCP (Model Context Protocol) standardizes tool integration
- Cost per agent run is low enough for production
- Users want 'do this for me,' not 'tell me how'
The PM design surface
For each agent feature, the PM specifies:
Goal scope. Narrow ('schedule a meeting') or broad ('plan my product launch')? Narrow agents work today; broad agents are emerging.
Available tools. What can the agent do? Search, calendar, email, file system, code execution, payment, etc. Each tool is a capability AND a risk.
Guardrails. What the agent must NOT do. Don't delete files. Don't email more than X recipients. Don't spend more than $Y.
Permissions. What requires user approval? High-stakes actions (send email, charge card, delete data) should require explicit confirmation.
Observability. How does the user see what the agent did? Activity log, undo, audit trail.
Recovery. What happens when the agent fails? Retry? Escalate? Roll back?
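One way to make this design surface concrete is to write it down as data that the agent runtime checks before every action. The sketch below is an assumption about how that could look: `AgentSpec`, `check_action`, and the tool names are hypothetical, and the limits stand in for the X and $Y above.

```python
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    goal: str
    tools: set[str]                    # everything the agent is allowed to call
    max_recipients: int                # guardrail: the "X" in the text
    max_spend_usd: float               # guardrail: the "$Y" in the text
    needs_approval: set[str] = field(default_factory=set)  # high-stakes tools

def check_action(spec: AgentSpec, tool: str,
                 recipients: int = 0, spend_usd: float = 0.0) -> str:
    """Return 'block', 'ask_user', or 'allow' for a proposed action."""
    if tool not in spec.tools:
        return "block"                 # not in the tool inventory
    if recipients > spec.max_recipients or spend_usd > spec.max_spend_usd:
        return "block"                 # guardrail violated
    if tool in spec.needs_approval:
        return "ask_user"              # permission checkpoint
    return "allow"

spec = AgentSpec(
    goal="schedule a meeting",
    tools={"calendar_read", "send_email"},
    max_recipients=5,
    max_spend_usd=0.0,
    needs_approval={"send_email"},
)

print(check_action(spec, "calendar_read"))             # low-stakes read
print(check_action(spec, "send_email", recipients=2))  # needs confirmation
print(check_action(spec, "send_email", recipients=10)) # over the recipient cap
```

Keeping the spec as plain data means the PM, not only the engineer, can review and change it.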
UX patterns that work
- Show the plan first. Agent generates plan, user reviews, agent executes. Builds trust.
- Stream the work. Show the agent's actions in real time. Hides latency, builds trust.
- Pause and confirm on high-stakes actions. Don't just charge the card; ask first.
- Easy undo. When possible, every agent action should be reversible.
- Activity log. Persistent record of what the agent did, queryable later.
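The undo and activity-log patterns can share one structure: each logged action carries an optional undo callback. The sketch below is illustrative only. `ActivityLog`, `LogEntry`, and the in-memory `calendar` list are hypothetical, and a real product would persist the log rather than keep it in memory.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class LogEntry:
    description: str
    undo: Optional[Callable[[], None]] = None  # None means the action is irreversible
    undone: bool = False

class ActivityLog:
    def __init__(self) -> None:
        self.entries: list[LogEntry] = []

    def record(self, description: str,
               undo: Optional[Callable[[], None]] = None) -> int:
        """Append an entry and return its index for later lookup."""
        self.entries.append(LogEntry(description, undo))
        return len(self.entries) - 1

    def undo(self, index: int) -> bool:
        """Reverse an action if possible; return False if it cannot be undone."""
        entry = self.entries[index]
        if entry.undo is None or entry.undone:
            return False
        entry.undo()
        entry.undone = True
        return True

calendar: list[str] = []   # toy stand-in for a real calendar
log = ActivityLog()

calendar.append("Sync with Dana")
event_id = log.record("Created event 'Sync with Dana'",
                      undo=lambda: calendar.remove("Sync with Dana"))
email_id = log.record("Sent invite email")   # no undo: a sent email is irreversible
```

Entries without an undo callback are the ones that should sit behind a confirmation step.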
UX patterns that fail
- Black-box agents. Agent does something, user doesn't know what. Trust collapses on first error.
- No confirmation on irreversible actions. Agent deletes a file; user is angry.
- No way to interrupt. Agent is doing the wrong thing; user can't stop it.
- Confusing failure modes. Agent fails silently; user doesn't know.
The metrics
- Task completion rate. % of agent runs that complete the goal.
- Human intervention rate. % of agent runs that require user help.
- Time saved per task. Measured against doing the same task manually.
- Trust score. User surveys; correlates with retention.
- Cost per task. A multi-step run costs more than a single model call. Track this.
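Most of these metrics fall out of a simple per-run record. The sketch below assumes a hypothetical `AgentRun` log row and made-up sample data; trust score is left out because it comes from surveys, not run logs.

```python
from dataclasses import dataclass

@dataclass
class AgentRun:
    completed: bool        # did the run reach the goal?
    needed_human: bool     # did the user have to step in?
    agent_seconds: float   # wall-clock time with the agent
    manual_seconds: float  # estimated time to do the task by hand
    cost_usd: float        # model and tool spend for the run

def summarize(runs: list[AgentRun]) -> dict[str, float]:
    """Aggregate per-run records into the headline metrics."""
    n = len(runs)
    if n == 0:
        raise ValueError("no runs to summarize")
    return {
        "task_completion_rate": sum(r.completed for r in runs) / n,
        "human_intervention_rate": sum(r.needed_human for r in runs) / n,
        "avg_seconds_saved": sum(r.manual_seconds - r.agent_seconds for r in runs) / n,
        "avg_cost_usd": sum(r.cost_usd for r in runs) / n,
    }

# Made-up sample data for illustration.
runs = [
    AgentRun(True, False, 40, 300, 0.08),
    AgentRun(True, True, 90, 300, 0.12),
    AgentRun(False, True, 120, 300, 0.20),
    AgentRun(True, False, 50, 300, 0.08),
]
summary = summarize(runs)
print(summary)
```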
Common pitfalls
- Designing agents for too-broad goals on day 1. Start narrow (one job done well) and expand.
- Skipping observability. Users need to see what happened.
- Skipping evals. Agent eval is harder than single-call eval (multi-step, stochastic) but doubly important.
- Not modeling cost. Long agent runs can cost $1-10 per task. At scale, this matters.
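A rough cost model helps here. Long runs get expensive partly because each step typically resends the growing context, so input tokens accumulate faster than the step count. The sketch below is a back-of-envelope model under that assumption; the token counts and per-million-token prices are hypothetical placeholders, not any vendor's price list.

```python
def run_cost_usd(steps: int,
                 base_input_tokens: int,
                 tokens_added_per_step: int,
                 output_tokens_per_step: int,
                 usd_per_m_input: float,
                 usd_per_m_output: float) -> float:
    """Estimate the cost of one agent run where every step resends the full context."""
    total_input = 0
    context = base_input_tokens
    for _ in range(steps):
        total_input += context             # the whole context is billed again each step
        context += tokens_added_per_step   # tool results and replies grow the context
    total_output = steps * output_tokens_per_step
    return (total_input * usd_per_m_input + total_output * usd_per_m_output) / 1_000_000

# Hypothetical numbers: a 20-step run with placeholder prices.
cost = run_cost_usd(steps=20,
                    base_input_tokens=4_000,
                    tokens_added_per_step=1_500,
                    output_tokens_per_step=500,
                    usd_per_m_input=3.0,
                    usd_per_m_output=15.0)
monthly = cost * 100_000   # assumed volume of 100,000 runs per month
print(f"per run: ${cost:.3f}, per month: ${monthly:,.0f}")
```

Under these assumed inputs a single run lands a little above $1, which is why step limits and context trimming are product decisions as much as engineering ones.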
MCP and the tool ecosystem
Anthropic's Model Context Protocol (MCP) is becoming the standard way agents connect to tools. As an AI PM:
- Know what MCP servers exist for your domain
- Decide build vs. integrate (use existing MCP servers vs. build your own)
- Plan for the MCP ecosystem to be your distribution model: your tools should be exposable via MCP for other agents to use
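To make "exposable via MCP" less abstract: MCP messages are JSON-RPC 2.0, and a tool is advertised with a name, a description, and a JSON Schema for its input. The sketch below mimics the `tools/list` and `tools/call` message shapes with plain dictionaries. It is not a working MCP server (no transport, no initialization handshake), the `find_free_slot` tool is hypothetical, and the exact field names should be checked against the current specification before building on them.

```python
import json

# Hypothetical tool definition in the shape an MCP server advertises.
TOOL_DEFS = [{
    "name": "find_free_slot",
    "description": "Find the next free calendar slot of a given length.",
    "inputSchema": {
        "type": "object",
        "properties": {"minutes": {"type": "integer"}},
        "required": ["minutes"],
    },
}]

def find_free_slot(minutes: int) -> str:
    return f"Next free {minutes}-minute slot: Tuesday 10:00"  # canned answer

def error(request_id, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": request_id,
            "error": {"code": code, "message": message}}

def handle(request: dict) -> dict:
    """Dispatch one JSON-RPC request to the matching handler."""
    method = request["method"]
    if method == "tools/list":
        result = {"tools": TOOL_DEFS}
    elif method == "tools/call":
        params = request["params"]
        if params["name"] != "find_free_slot":
            return error(request["id"], -32602, f"Unknown tool: {params['name']}")
        text = find_free_slot(**params["arguments"])
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return error(request["id"], -32601, "Method not found")
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}

call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
        "params": {"name": "find_free_slot", "arguments": {"minutes": 30}}}
response = handle(call)
print(json.dumps(response, indent=2))
```

For a PM, the takeaway is that the tool's name, description, and input schema are the product surface other agents see, so they deserve the same care as UI copy.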
Real-world examples
Decagon's AI customer support agents handle full ticket lifecycles: read context, query knowledge base, take action, escalate when needed. The PM work: tight goal scope (resolve this ticket), strong eval suite, clear escalation thresholds. The pattern is the template for vertical agent products.
Sierra builds AI agents for businesses. Their product design emphasizes guardrails, observability, and human-in-the-loop. The PM craft they've codified is foundational for any team building production agents.
Interview questions (1)
Q1. How would you design an AI agent feature for [your product]? (Tags: ai-pm, senior)
Six-step design.
- Narrow the goal. Pick one specific job an agent can do well. 'Schedule a meeting with this person at the next mutually available time' beats 'manage my calendar.'
- Tool inventory. What does the agent need access to? Calendar API, email, maybe contact search. Each is a capability and a risk.
- Guardrails. What must NOT happen? Don't double-book. Don't email anyone not on the original list. Don't override existing meetings.
- Human-in-the-loop checkpoints. Show the proposed meeting time + invite to user before sending. Easy 'edit / send / cancel.'
- Eval suite. 100+ scenarios: normal cases (find a 30-min slot next week), edge cases (no availability for 2 weeks), failure cases (calendar API down). Score each.
- Observability + recovery. Activity log shows what the agent did. Failed agent runs are surfaced and recoverable.
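The eval-suite step can be sketched as a table of scenarios plus a scorer that reports a pass rate per category. Everything here is a simplified assumption: `toy_scheduler` is a deterministic stand-in for the real agent, the three scenarios stand in for the 100+, and the expected outcomes are illustrative.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    name: str
    kind: str               # "normal", "edge", or "failure"
    free_slots: list[str]   # what the calendar would return
    api_up: bool            # False simulates a calendar outage
    expected: str           # the outcome a correct agent should produce

def toy_scheduler(free_slots: list[str], api_up: bool) -> str:
    """Deterministic stand-in for the scheduling agent (assumption)."""
    if not api_up:
        return "escalate"
    return f"propose:{free_slots[0]}" if free_slots else "report_no_availability"

def run_evals(agent: Callable[[list[str], bool], str],
              scenarios: list[Scenario]) -> dict[str, float]:
    """Return the pass rate for each scenario kind."""
    passed: dict[str, int] = {}
    total: dict[str, int] = {}
    for s in scenarios:
        total[s.kind] = total.get(s.kind, 0) + 1
        try:
            ok = agent(s.free_slots, s.api_up) == s.expected
        except Exception:
            ok = False       # a crash counts as a failed scenario
        passed[s.kind] = passed.get(s.kind, 0) + int(ok)
    return {kind: passed[kind] / total[kind] for kind in total}

scenarios = [
    Scenario("30-min slot next week", "normal", ["Tue 10:00", "Wed 14:00"], True, "propose:Tue 10:00"),
    Scenario("no availability for 2 weeks", "edge", [], True, "report_no_availability"),
    Scenario("calendar API down", "failure", [], False, "escalate"),
]
scores = run_evals(toy_scheduler, scenarios)
print(scores)
```

Because real agents are stochastic, a production suite would likely run each scenario several times and track the pass rate rather than a single pass or fail.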
For metrics, I'd track task completion rate, human intervention rate, time saved per scheduled meeting, and cost per task. Targets: 80% of scheduling jobs complete without intervention; cost under $0.10 per task; user trust score (an NPS variant) above competitors'.
Critical: ship narrow first. Once we nail meeting scheduling, expand to follow-up emails, prep doc generation, etc. Each expansion follows the same playbook.