TIL: Put Approval Before Risky Agent Tools
Tool-using agents are useful because they can act.
That is also why they are dangerous.
Reading a file is different from deleting a folder. Drafting an email is different from sending it. Pricing an order is different from charging a card. Creating a migration plan is different from running it against production.
If those actions all look like normal tool calls to the agent, you are relying on the model to notice the blast radius every time. That is bad safety architecture.
Put human approval in front of risky tools.
The gate should be boring and explicit. Before the agent acts, show the user what will happen, what objects will change, why the agent thinks the action is appropriate, and what will not happen. Then require approval. Log the decision. Keep the action scoped to the approved packet.
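As a minimal sketch of such a gate, assuming a hypothetical `ApprovalPacket` shape and an in-memory `audit_log` (neither comes from the source article or from any particular library), the packet carries the four things the user needs to see, the decision is logged either way, and the action only receives the approved targets:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass(frozen=True)
class ApprovalPacket:
    """What the human sees before a risky action runs (hypothetical shape)."""
    action: str                  # what will happen
    targets: tuple[str, ...]     # which objects will change
    rationale: str               # why the agent thinks the action is appropriate
    exclusions: tuple[str, ...]  # what will NOT happen


@dataclass(frozen=True)
class Decision:
    packet: ApprovalPacket
    approved: bool
    decided_at: str


# Illustrative in-memory log; a real system would persist this somewhere durable.
audit_log: list[Decision] = []


def run_with_approval(
    packet: ApprovalPacket,
    approve: Callable[[ApprovalPacket], bool],
    execute: Callable[[tuple[str, ...]], Any],
) -> Any:
    """Ask for approval, log the decision, and run only on a yes."""
    approved = approve(packet)
    audit_log.append(
        Decision(packet, approved, datetime.now(timezone.utc).isoformat())
    )
    if not approved:
        return None
    # Pass only the approved targets so the action cannot widen its own scope.
    return execute(packet.targets)
```

Here `approve` stands in for whatever surface shows the packet to a person (a CLI prompt, a chat button, a review queue). Freezing the packet is a deliberate choice in this sketch: what was approved cannot be edited between the approval and the execution.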
Good approval gates usually cover:
- external messages
- purchases and payments
- publishing or scheduling
- destructive file/database operations
- credential or infrastructure changes
- production deploys
- anything reputationally weird
Not every agent action needs a meeting. Low-risk preparation should stay fast. Search, summarize, draft, analyze, plan, and stage the work. Ask only when the action crosses into real-world consequence.
Put the boundary in code, not in the agent’s vibes.
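One way to put that boundary in code, sketched here with hypothetical names (`Risk`, `TOOLS`, `call_tool`, and `ask_human` are illustrative, not an API from any agent framework), is to tag each tool with a risk level when it is registered and let the dispatcher, not the model, decide when a human is asked:

```python
from enum import Enum
from typing import Any, Callable


class Risk(Enum):
    LOW = "low"    # search, summarize, draft, analyze, plan, stage
    HIGH = "high"  # send, pay, publish, delete, deploy, change credentials


class ApprovalDenied(Exception):
    """Raised when a human declines a high-risk tool call."""


# Registry mapping tool name to (risk level, implementation).
TOOLS: dict[str, tuple[Risk, Callable[..., Any]]] = {}


def tool(name: str, risk: Risk) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a function as a tool with an explicit risk level."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = (risk, fn)
        return fn
    return register


@tool("draft_email", Risk.LOW)
def draft_email(to: str, body: str) -> str:
    return f"DRAFT to {to}: {body}"


@tool("send_email", Risk.HIGH)
def send_email(to: str, body: str) -> str:
    return f"SENT to {to}"  # stand-in for a real side effect


def call_tool(
    name: str,
    args: dict[str, Any],
    ask_human: Callable[[str, dict[str, Any]], bool],
) -> Any:
    """Run low-risk tools directly; route high-risk tools through a human."""
    risk, fn = TOOLS[name]
    if risk is Risk.HIGH and not ask_human(name, args):
        raise ApprovalDenied(name)
    return fn(**args)
```

In this sketch the agent never sets an "approved" flag itself. The only path to a high-risk tool runs through `ask_human`, which the surrounding application supplies, while low-risk tools never trigger a prompt.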
For agents and AI platforms, the hygiene is simple: let automation reduce toil without quietly taking over judgment. The human approves the irreversible move. They do not babysit every keystroke.
Make each risky action a separate state with its own approval packet. If the agent cannot explain the action clearly enough for a human to approve it, it should not perform the action.
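The "separate state" idea can be sketched as a small state machine. The state names and the `RiskyAction` class below are assumptions made for illustration, and the packet is reduced to a plain dict; the point is only that execution is unreachable without passing through an approval state, and that an empty explanation never gets that far:

```python
from enum import Enum, auto
from typing import Any, Callable, Optional


class State(Enum):
    PREPARING = auto()
    AWAITING_APPROVAL = auto()
    EXECUTING = auto()
    DONE = auto()
    REJECTED = auto()


# The only legal transitions; anything else is a bug, not a judgment call.
ALLOWED: dict[State, set[State]] = {
    State.PREPARING: {State.AWAITING_APPROVAL},
    State.AWAITING_APPROVAL: {State.EXECUTING, State.REJECTED},
    State.EXECUTING: {State.DONE},
    State.DONE: set(),
    State.REJECTED: set(),
}


class RiskyAction:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.PREPARING
        self.packet: Optional[dict[str, str]] = None

    def _move(self, new_state: State) -> None:
        if new_state not in ALLOWED[self.state]:
            raise RuntimeError(f"{self.state.name} -> {new_state.name} is not allowed")
        self.state = new_state

    def request_approval(self, explanation: str) -> None:
        # No clear explanation means no approval request, and so no action.
        if not explanation.strip():
            raise ValueError("cannot request approval without an explanation")
        self._move(State.AWAITING_APPROVAL)
        self.packet = {"action": self.name, "explanation": explanation}

    def decide(self, approved: bool) -> None:
        self._move(State.EXECUTING if approved else State.REJECTED)

    def execute(self, fn: Callable[[dict[str, str]], Any]) -> Any:
        if self.state is not State.EXECUTING or self.packet is None:
            raise RuntimeError("action has not been approved")
        result = fn(self.packet)
        self._move(State.DONE)
        return result
```

A rejected action is terminal in this sketch. If the agent wants to try again, it has to build a new action and a new packet rather than quietly retrying the old one.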
Part of the Effective AI Engineering series.
Source: adapted from Mirascope’s “Human Approval for Risky Tools”, MIT licensed.