🤖 AI Agents for PMs

Agents are the dominant AI UX of 2025-26. PMs who can design and ship agentic products have a defensible career skill.

Why it matters

Agents take AI from 'chat assistant' to 'does work on your behalf.' The product, ops, and UX design challenges are different and largely uncharted. PMs who master the craft now will shape the category.

The core idea

An agent is an LLM with tool access that runs in a loop until it completes a goal. PMs design agents by specifying: the goal, the available tools, the guardrails, the success criteria, and the human-in-the-loop checkpoints. The hard PM work is the UX patterns that make agent behavior trustworthy, debuggable, and recoverable when things go wrong.

What's an agent

An agent is an LLM that:

Gets a goal from the user
Plans steps
Calls tools (search, code execution, APIs)
Observes results
Continues or completes
Reports back

The loop runs autonomously. The user gives the goal; the agent figures out the means.
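
The loop above can be sketched in a few lines. This is a minimal illustration, not a production design: `scripted_model` is a hypothetical stand-in for a real LLM call, the `search` tool is a stub, and `max_steps` is an assumed step budget that keeps a confused agent from running forever.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    tool: str
    argument: str
    result: str


def scripted_model(goal: str, history: list[Step]) -> tuple[str, str]:
    """Hypothetical stand-in for an LLM call.

    Returns (tool_name, argument). Returns ("finish", summary) once it
    decides the goal is met.
    """
    if not history:
        return ("search", goal)
    return ("finish", f"Found: {history[-1].result}")


# Stub tool registry; a real agent would wrap search, calendar, email, etc.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"3 results for '{query}'",
}


def run_agent(goal, model=scripted_model, tools=TOOLS, max_steps=5):
    """Plan, call a tool, observe, repeat; stop on 'finish' or when the budget runs out."""
    history: list[Step] = []
    for _ in range(max_steps):
        tool, argument = model(goal, history)      # plan the next step
        if tool == "finish":
            return argument, history               # report back
        if tool not in tools:
            result = f"error: unknown tool '{tool}'"
        else:
            result = tools[tool](argument)         # call the tool
        history.append(Step(tool, argument, result))  # observe the result
    return "stopped: step budget exhausted", history


report, history = run_agent("flights to Lisbon")
```

The returned `history` doubles as raw material for an activity log, and the step budget is the simplest possible guardrail.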

Why agents matter now

Frontier models (Claude Sonnet 4.6, GPT-5) reliably do multi-step reasoning and tool use
MCP (Model Context Protocol) standardizes tool integration
Cost per agent run is low enough for production
Users want 'do this for me,' not 'tell me how'

The PM design surface

For each agent feature, the PM specifies:

Goal scope. Narrow ('schedule a meeting') or broad ('plan my product launch')? Narrow agents work today; broad agents are emerging.

Available tools. What can the agent do? Search, calendar, email, file system, code execution, payment, etc. Each tool is a capability AND a risk.

Guardrails. What the agent must NOT do. Don't delete files. Don't email more than X recipients. Don't spend more than $Y.

Permissions. What requires user approval? High-stakes actions (send email, charge card, delete data) should require explicit confirmation.

Observability. How does the user see what the agent did? Activity log, undo, audit trail.

Recovery. What happens when the agent fails? Retry? Escalate? Roll back?
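
One way to make this design surface concrete is to write it down as a spec the agent runtime checks before every action. The sketch below is illustrative only: `AgentSpec`, `Action`, and `decide` are hypothetical names, and the limits stand in for the "X recipients" and "$Y" placeholders above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    goal: str
    allowed_tools: frozenset[str]    # available tools
    needs_approval: frozenset[str]   # permissions: tools that pause for the user
    max_recipients: int              # guardrail (assumed limit)
    max_spend_usd: float             # guardrail (assumed limit)


@dataclass(frozen=True)
class Action:
    tool: str
    recipients: int = 0
    spend_usd: float = 0.0


def decide(spec: AgentSpec, action: Action) -> str:
    """Return 'block', 'ask_user', or 'allow' for a proposed action."""
    if action.tool not in spec.allowed_tools:
        return "block"
    if action.recipients > spec.max_recipients:
        return "block"
    if action.spend_usd > spec.max_spend_usd:
        return "block"
    if action.tool in spec.needs_approval:
        return "ask_user"
    return "allow"


scheduler = AgentSpec(
    goal="schedule a meeting",
    allowed_tools=frozenset({"calendar_read", "send_email"}),
    needs_approval=frozenset({"send_email"}),
    max_recipients=5,
    max_spend_usd=0.0,
)
```

Guardrails are checked before permissions here, so an action that breaks a hard limit is blocked outright and the user is never asked to approve it.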

UX patterns that work

Show the plan first. Agent generates plan, user reviews, agent executes. Builds trust.
Stream the work. Show the agent's actions in real time. Masks perceived latency, builds trust.
Pause and confirm on high-stakes actions. Don't just charge the card; ask first.
Easy undo. When possible, every agent action should be reversible.
Activity log. Persistent record of what the agent did, queryable later.
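
Three of these patterns (pause and confirm, easy undo, activity log) fit together naturally. A rough sketch, assuming each reversible action can supply its own undo callback; `ActivityLog` and `confirm_then_run` are hypothetical names, and `approve` stands in for whatever confirmation UI the product uses.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LogEntry:
    description: str
    undo: Optional[Callable[[], None]] = None  # None marks an irreversible action
    undone: bool = False


class ActivityLog:
    """Persistent record of what the agent did, with undo where possible."""

    def __init__(self) -> None:
        self.entries: list[LogEntry] = []

    def record(self, description: str, undo: Optional[Callable[[], None]] = None) -> None:
        self.entries.append(LogEntry(description, undo))

    def undo_last(self) -> Optional[str]:
        """Reverse the most recent reversible action that has not been undone yet."""
        for entry in reversed(self.entries):
            if entry.undo is not None and not entry.undone:
                entry.undo()
                entry.undone = True
                return entry.description
        return None


def confirm_then_run(description, action, approve, log, undo=None) -> bool:
    """Pause on a high-stakes action; run it only if approve(description) is True."""
    if not approve(description):
        log.record(f"declined: {description}")
        return False
    action()
    log.record(description, undo)
    return True
```

Declined actions are logged too, so the record shows what the agent tried as well as what it did.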

UX patterns that fail

Black-box agents. Agent does something, user doesn't know what. Trust collapses on first error.
No confirmation on irreversible actions. Agent deletes a file; user is angry.
No way to interrupt. Agent is doing the wrong thing; user can't stop it.
Confusing failure modes. Agent fails silently; user doesn't know.

The metrics

Task completion rate. % of agent runs that complete the goal.
Human intervention rate. % of agent runs that require user help.
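Time saved per task. Measured against the time the same task takes manually.
Trust score. Measured with user surveys; tends to correlate with retention.
Cost per task. A run chains many model calls, so per-task cost can be high. Track this.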
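
Most of these metrics fall out of a simple per-run record. The sketch below assumes each run is logged with the hypothetical fields shown; trust score is left out because it comes from surveys rather than run logs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Run:
    completed: bool        # did the run reach the goal?
    needed_human: bool     # did the user have to step in?
    cost_usd: float        # model and tool cost for the run
    agent_minutes: float   # user time spent when the agent did the task
    manual_minutes: float  # assumed baseline for doing the task by hand


def summarize(runs: list[Run]) -> dict[str, float]:
    """Aggregate run records into the headline agent metrics."""
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs)
    return {
        "task_completion_rate": sum(r.completed for r in runs) / n,
        "human_intervention_rate": sum(r.needed_human for r in runs) / n,
        "avg_minutes_saved": sum(r.manual_minutes - r.agent_minutes for r in runs) / n,
        "cost_per_task_usd": sum(r.cost_usd for r in runs) / n,
    }
```

Cost here is averaged over all runs, including failed ones. Dividing by completed runs instead gives a stricter "cost per successful task" figure.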

Common pitfalls

Designing agents for too-broad goals on day 1. Start narrow (one job done well) and expand.
Skipping observability. Users need to see what happened.
Skipping evals. Agent eval is harder than single-call eval because runs are multi-step and stochastic, which makes it more important, not less.
Not modeling cost. Long agent runs can cost $1-10 per task. At scale, this matters.
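
A back-of-the-envelope cost model is enough to start. The sketch below assumes cost is dominated by model tokens and that every step sends a similar amount of context; the prices and token counts are placeholders, not any vendor's actual rates.

```python
def cost_per_task_usd(
    steps: int,
    input_tokens_per_step: int,
    output_tokens_per_step: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the model cost of one agent run from per-step token counts."""
    per_step = (
        input_tokens_per_step * input_price_per_million
        + output_tokens_per_step * output_price_per_million
    ) / 1_000_000
    return steps * per_step


def monthly_cost_usd(cost_per_task: float, tasks_per_month: int) -> float:
    return cost_per_task * tasks_per_month


# Placeholder numbers: a 20-step run with 8,000 input and 500 output tokens per step,
# priced at an assumed $3 / $15 per million input / output tokens.
example = cost_per_task_usd(20, 8_000, 500, 3.0, 15.0)
```

With these assumed inputs, one run costs about $0.63, so 100,000 tasks a month comes to roughly $63,000. In practice the context usually grows with each step, so real runs tend to cost more than a flat per-step estimate suggests.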

MCP and the tool ecosystem

Anthropic's Model Context Protocol (MCP) is becoming the standard way agents connect to tools. As an AI PM:

Know what MCP servers exist for your domain
Decide build vs. integrate (use existing MCP servers vs. build your own)
Plan for the MCP ecosystem as a distribution channel: expose your tools via MCP so other agents can use them
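
To get a feel for what "exposable via MCP" means, it helps to see the shape of the messages. MCP is built on JSON-RPC 2.0; a server advertises tools with a name, a description, and a JSON Schema for inputs, and a client invokes one with a `tools/call` request. The sketch below builds those payloads as plain dictionaries. The `find_meeting_slot` tool and its fields are hypothetical, and the current MCP specification should be treated as the source of truth for the exact format.

```python
import json

# Hypothetical tool descriptor, in the shape a server returns when listing its tools.
tool_descriptor = {
    "name": "find_meeting_slot",
    "description": "Find the next mutually available slot with an attendee.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "attendee_email": {"type": "string"},
            "duration_minutes": {"type": "integer"},
        },
        "required": ["attendee_email", "duration_minutes"],
    },
}


def build_tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request asking a server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def missing_arguments(descriptor: dict, arguments: dict) -> list:
    """List required inputs the caller left out (a minimal check, not full schema validation)."""
    required = descriptor["inputSchema"].get("required", [])
    return [key for key in required if key not in arguments]


request = build_tool_call(
    1,
    tool_descriptor["name"],
    {"attendee_email": "sam@example.com", "duration_minutes": 30},
)
wire_message = json.dumps(request)
```

For a PM, the descriptor is the part worth reviewing: the name, description, and input schema are what another team's agent reads when deciding whether and how to call your tool.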

Real-world examples

Decagon

Customer support agents

Decagon's AI customer support agents handle full ticket lifecycles: read context, query the knowledge base, take action, and escalate when needed. The PM work: tight goal scope (resolve this ticket), a strong eval suite, and clear escalation thresholds. The pattern is a template for vertical agent products.

Sierra

Bret Taylor's agent platform

Sierra builds AI agents for businesses. Their product design emphasizes guardrails, observability, and human-in-the-loop. The PM craft they've codified is foundational for any team building production agents.

Go deeper — recommended reading

AI Agents for PMs: Practical Guide to Build & Use in 2025

Aakash Gupta · Product Growth

The Complete AI Agent Metrics Playbook: 20 Critical KPIs

Product Growth

We Built an AI Agent to Automate PM in 73 mins

Aakash Gupta · Product Growth

Interview questions (1)

How would you design an AI agent feature for [your product]?

Six-step design.

Narrow the goal. Pick one specific job an agent can do well. 'Schedule a meeting with this person at the next mutually available time' beats 'manage my calendar.'

Tool inventory. What does the agent need access to? Calendar API, email, maybe contact search. Each is a capability and a risk.

Guardrails. What must NOT happen? Don't double-book. Don't email anyone not on the original list. Don't override existing meetings.

Human-in-the-loop checkpoints. Show the proposed meeting time + invite to user before sending. Easy 'edit / send / cancel.'

Eval suite. 100+ scenarios: normal cases (find a 30-min slot next week), edge cases (no availability for 2 weeks), and failure cases (calendar API down). Score each.

Observability + recovery. Activity log shows what the agent did. Failed agent runs are surfaced and recoverable.

For metrics, I'd track task completion rate, human intervention rate, time saved per scheduled meeting, cost per task. Goal: 80% of scheduling jobs complete without intervention; cost <$0.10 per task; user trust score (NPS variant) above competitors.

Critical: ship narrow first. Once we nail meeting scheduling, expand to follow-up emails, prep doc generation, etc. Each expansion follows the same playbook.
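
The eval suite from the design steps above can start as a small harness that runs every scenario and reports a pass rate per category. This is a toy sketch: `toy_scheduler` is a hypothetical stand-in for the real agent, and the three scenarios stand in for the 100+ a real suite would hold.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Scenario:
    name: str
    kind: str                         # "normal", "edge", or "failure"
    free_slots: Optional[List[str]]   # None simulates the calendar API being down
    expected: str


def toy_scheduler(free_slots: Optional[List[str]]) -> str:
    """Hypothetical agent under test."""
    if free_slots is None:
        return "escalate"             # tool outage: hand off to a human
    if not free_slots:
        return "no_availability"
    return f"propose:{free_slots[0]}"


def run_evals(agent, scenarios: List[Scenario]) -> dict:
    """Return the pass rate for each scenario kind."""
    tallies: dict = {}
    for scenario in scenarios:
        passed = agent(scenario.free_slots) == scenario.expected
        tally = tallies.setdefault(scenario.kind, [0, 0])
        tally[0] += int(passed)
        tally[1] += 1
    return {kind: passed / total for kind, (passed, total) in tallies.items()}


SCENARIOS = [
    Scenario("30-min slot next week", "normal", ["tue-14:00", "wed-09:00"], "propose:tue-14:00"),
    Scenario("no availability for 2 weeks", "edge", [], "no_availability"),
    Scenario("calendar API down", "failure", None, "escalate"),
]

scores = run_evals(toy_scheduler, SCENARIOS)
```

Because real agent runs are stochastic, a production harness would run each scenario several times and report a pass rate per scenario instead of a single pass or fail.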

Related concepts

📡 AI Observability

How you know whether your AI feature is working in production. The single most underbuilt layer in AI products in 2026.