Suffering from Agent Permission Fatigue? Find out your high score

Table of Contents

    claude code terminal permission prompt
    If you’re using claude code, this prompt will look very familiar to you. Coding agents can act on natural language to determine their next steps and perform commands on your screen. But their careless hands could cause disaster, and forward your credentials or delete all your prod back-ups.

    As human-in-the-loop, you’re the last line of defense. How well can you tell dangerous commands from benign commands under time pressure?

    Test your reflexes: Who is the fastest (and safest) ‘Y-spammer’ on your team? Find your permission fatigue rating at llmgame.scalex.dev before reading our breakdown below!

    continue game scoreboard

    The real threat

    You’ve seen some threats pop up across the terminal. Luckily it was just a dream test. Let’s go over some of these and the real risks they pose:

    • rm -rf ~/: a malformed remove file command resulting in the home directory being wiped. Sometimes linked to overeager command interpretation “delete everything”, sometimes results from commands becoming malformed when copied across terminals.
    • Credential exfiltration (cat ~/.aws/credentials): Silent collection of cloud provider or SSH keys. An internal phishing campaign at Anthropic resulted in credentials being successfully exfiltrated 24 out of 25 attempts.
    • Scope violations: reading and modifying files beyond the project directory scope (e.g. ~/Documents)
    • Prompt injection: content copied from external websites or mails that are interpreted as commands along with the user input

    Anthropic just posted a write-up on how to contain claude code which covers several more risks involved. Even checking out a repository could cause injection to happen, because the claude settings files from that repository would be loaded into the user’s claude code session. Your claude code set-up can be augmented with skills from external repositories which could be updated at any time to add malicious prompts, same for connected external MCP servers or plug-ins. These allow potentially for credentials or files to be leaked, persistent threats to be installed on your machine or more. Given the threat is real, what can you realistically do to avoid them?

    There must be a better way to do this?

    The human-in-the-loop approach has its problems. You may have let a couple of attacks slip through your guard in the game. Anthropic’s containment post also mentions permission fatigue:

    Our telemetry showed users approved roughly 93% of permission prompts. The more approvals a user sees, the less attention they pay to each, becoming over time much less diligent in their supervision

    Because of the high ’noise rate’, we quickly become button mashers. To combat this, Anthropic launched Auto mode. The setting can be enabled from the CLI and uses local fast-filters and a server-side scan to review tool output before it’s parsed by the local claude code agent. Then, prompts are evaluated again by the coding agent before execution.

    Auto-Mode however comes at a price. It sometimes links dangerous commands incorrectly to previous consent signals, and they report a 17% false-negative rate. You probably had a better high score in the game, but that’s hard to keep up all day and glues you to the terminal.

    Destructive call Hooks

    Claude code allows you to set up PreToolUse hooks, and the agent will recognize when certain commands should trigger a hook and then read and execute the context from that hook file before performing the actual commands. You can find some examples online where this is set-up to block rm -rf / and the like. These work as a blocklist, so they are not fool-proof. Adversaries can work around blocklists by obfuscating the commands.

    Claude Code also has a built-in sandbox mode which you can configure with /sandbox. It allows writes only to the working directory, prompts for each new network domain and blocks filesystem access outside the working directory for bash commands.

    claude code sandbox mode

    Hooks add a second layer for patterns the sandbox doesn’t cover. Claude will also happily generate some for you if you ask it, so you’re not relying on some random skill file from the internet that can pose another attack vector.

    I also like to live –dangerously-skip-permissions

    An alternative approach is to sandbox your agent, and use --dangerously-skip-permissions which will never prompt for permissions. The containment blog post from Anthropic covers these elements:

    • a hypervisor to create the sandbox
    • a proxy to intercept calls and inspect them for exfiltration risks

    You can use a hosted agent provider (like Claude Code on web) or contain your own claude code agent locally. Anthropic posted instructions on how to install devcontainers here. These containers will separate your host system from the devcontainer, but the risk of exfiltrating data the container has access to (such as credentials), remains.

    In conclusion

    It’s a whole new world with a new set of attack vectors. It’s best to remain aware of the risks and how to mitigate them. If you don’t use them already, try out devcontainers (locally or on the cloud), sandbox, auto mode and hooks, to minimize your exposure to these risks. Running --dangerously-skip-permissions without any of the accompanying guardrails makes you all the more vulnerable. For Claude, there’s a useful comparison of all the different sandboxing modes available here.

    Which approaches are you running, and what was your high score?

    ← Previous 2 years with Shape-Up, and why we switched back