OpenClaw Security Playbook: 7-Layer Defense for AI Agents

OS credential isolation, VPN-only access, container sandboxing, and runtime security for production AI agents

Part 2 of 3 | ← Part 1: Attack Vectors and Verification | → Part 3: Detection and Threat Hunting

The Part 1 immediate mitigations - fixing network binding, rotating credentials, deleting backup files - stop attackers who found your exposed Gateway on Shodan. But they don't address the underlying architectural vulnerabilities.

If you stopped at Part 1, you're still vulnerable to:

  • Infostealer malware harvesting credentials from your next AI agent deployment (plaintext storage pattern remains)

  • Prompt injection via email or Slack achieving 91.3% success rate (no runtime guards)

  • Supply chain attacks through backdoored skills (no integrity monitoring)

  • Zero visibility into agent behavior when compromise occurs (no telemetry)

Within 72 hours of the January 2026 disclosures, attackers adapted infostealer malware to target these patterns. The Clawdbot vulnerabilities revealed architectural weaknesses affecting the entire AI agent ecosystem - not just one project.

This playbook is for: Security engineers, DevSecOps practitioners, and staff-level developers who own OpenClaw/Clawdbot agent deployments and need a concrete path from "we closed the obvious backdoors" to "we have a hardened, monitored, policy-driven production architecture."

By the end, you'll have: OS-level credential isolation (Keychain/Secret Service), VPN-only access, a hardened container runtime, runtime security plugins (openclaw-shield), supply chain integrity checks, and enterprise telemetry (openclaw-telemetry) — plus a 6-week rollout plan you can hand to your team. If you are running on NVIDIA's OpenShell sandbox, Cisco's DefenseClaw provides a unified implementation of Layers 3–6 and is referenced throughout as an alternative path.


Preflight: Before You Begin

If you haven't completed Part 1, run these verification checks first:

# Check 1: Gateway bound to localhost only
ss -lntp 2>/dev/null | grep 18789
# macOS has no ss; use: lsof -nP -iTCP:18789 -sTCP:LISTEN
# Expected: 127.0.0.1:18789 (NOT 0.0.0.0:18789)

# Check 2: No backup credential files
find ~/ \( -path "*/.moltbot/*" -o -path "*/.clawdbot/*" \) -name "*.bak*" 2>/dev/null
# Expected: No output

# Check 3: Tool logging enabled
# -A2 also prints the two lines after each match, so a nested "enabled" key is visible
grep -E -A2 "toolExecution|logging" ~/.moltbot/config.yml 2>/dev/null
# Expected: enabled: true

If any check fails: Complete Part 1's immediate mitigations before proceeding. Part 2 builds on those foundations.
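
Check 1 can also be automated, for example in a fleet audit. The following is a minimal sketch, assuming the usual `ss -lnt` column layout (state, two queue columns, then the local address); the function names are illustrative and not part of any OpenClaw tooling.

```python
import ipaddress


def listening_addresses(ss_output: str, port: int) -> list[str]:
    """Return the local addresses that `ss -lnt` output shows listening on `port`."""
    addresses = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Expected columns: State Recv-Q Send-Q Local-Address:Port Peer-Address:Port ...
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        host, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            # Strip IPv6 brackets and interface suffixes such as "%lo"
            addresses.append(host.strip("[]").split("%")[0])
    return addresses


def is_loopback_only(ss_output: str, port: int) -> bool:
    """True only if the port is listening and every listener is a loopback address."""
    addresses = listening_addresses(ss_output, port)
    if not addresses:
        return False
    for addr in addresses:
        try:
            if not ipaddress.ip_address(addr).is_loopback:
                return False
        except ValueError:
            # "*" (all interfaces) is not a parseable address and is never loopback
            return False
    return True
```

Feed it the captured output of `ss -lnt` and the Gateway port (18789). A result of `False` means the Part 1 binding fix is still outstanding.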

Environment requirements:

  • Linux/macOS with CLI access (commands tested on Ubuntu 22.04 + macOS 14)

  • Docker/docker-compose for container sandboxing

  • Basic familiarity with YAML config, shell commands, and security operations


Table of Contents

  1. Defense-in-Depth Strategy for OpenClaw Agents

  2. 6-Week Implementation Roadmap

  3. Layer 1: OS-Level Credential Isolation

  4. Layer 2: Network Segmentation with VPN

  5. Layer 3: Runtime Sandboxing with Containers

  6. Layer 4: Runtime Security Enforcement

  7. Layer 5: Supply Chain Integrity Monitoring

  8. Layer 6: Behavioral Monitoring with Telemetry

  9. Layer 7: Organizational Security Controls

  10. What Nobody Discusses: Hard Problems

  11. Reality Check: Security vs. Usability

  12. Production Deployment Checklist


Defense-in-Depth Strategy for OpenClaw Agents

Traditional application security assumes you can prevent initial compromise through perimeter defenses - firewalls, authentication, input validation. Agentic AI inverts this assumption.

The attack surface includes:

  • Model layer: Prompt injection manipulates decision-making (91.3% success rate)

  • Tool layer: Malicious skills introduce backdoored capabilities

  • Input layer: Untrusted data from email, Slack, Twitter

  • Storage layer: Credentials in plaintext files with backup persistence

  • Network layer: Localhost authentication bypass through proxies

You cannot perfectly defend all five of these attack-surface layers simultaneously. Defense-in-depth accepts this reality and builds compensating controls.

The Seven-Layer Defense Model

Each layer provides independent protection so that compromise at one layer doesn't cascade:

graph TB
    subgraph "Layer 1: Credential Isolation"
        A["🔐 OS Keychain/Secret Service"]
    end

    subgraph "Layer 2: Network Segmentation"
        B["🌐 VPN-Only Access"]
    end

    subgraph "Layer 3: Runtime Sandboxing"
        C["📦 Container with Dropped Capabilities"]
    end

    subgraph "Layer 4: Runtime Security Enforcement"
        D["🛡️ openclaw-shield Plugin"]
    end

    subgraph "Layer 5: Supply Chain Security"
        E["✅ Manifest Integrity Checking"]
    end

    subgraph "Layer 6: Behavioral Monitoring"
        F["📊 openclaw-telemetry + SIEM"]
    end

    subgraph "Layer 7: Organizational Controls"
        G["🚨 openclaw-detect + IR Playbook"]
    end

    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G

    classDef infraLayer fill:#2E86AB,stroke:#1A5276,stroke-width:3px,color:#fff
    classDef runtimeLayer fill:#00B894,stroke:#0B6B56,stroke-width:4px,color:#fff
    classDef orgLayer fill:#6C5CE7,stroke:#4834DF,stroke-width:3px,color:#fff

    class A,B,C infraLayer
    class D,E,F runtimeLayer
    class G orgLayer

Figure 1: Defense-in-depth for OpenClaw AI agents. Each layer provides independent protection. Compromise at Layer 1 (credential storage) is mitigated by Layer 3 (sandboxing). Compromise at Layer 5 (malicious skill) is detected by Layer 6 (behavioral monitoring). Green layers highlight community tools available since February 2026. For OpenShell-based deployments, DefenseClaw (Cisco, April 2026) implements Layers 3–6 as a single governance layer — see the individual layer sections below for integration points.

Threat Model

Primary threats:

  1. External attacker with network access exploiting exposed Gateway port

  2. Infostealer malware on user's machine targeting credential files and backups

  3. Prompt injection via untrusted input achieving 91.3% success rate

  4. Supply chain compromise through malicious skill injection

  5. Insider threat with legitimate access attempting privilege escalation

Out of scope (requires different controls):

  • Physical access to unlocked machine

  • Compromised LLM provider API (Anthropic/OpenAI infrastructure breach)

  • Zero-day vulnerabilities in underlying OS or container runtime

  • State-level adversaries with custom exploits


6-Week Implementation Roadmap

This roadmap provides incremental deployment with an estimated risk reduction at each phase. Each week covers one or more defense layers.

Week 1: Stop Active Exploitation (Part 1 Review)

If you haven't completed Part 1:

  • [ ] Fix network binding (localhost only) - 5-10 min

  • [ ] Rotate exposed credentials - 45-90 min (5-10 min per service)

  • [ ] Delete backup files - 2-5 min

  • [ ] Enable basic logging - 30-45 min

Time investment: 2-3 hours

Risk reduction: ~70% (stops active internet-based attacks)

Week 2: Layer 1 - OS Credential Isolation

  • [ ] Migrate credentials to OS Keychain/Secret Service

  • [ ] Verify plaintext files deleted

  • [ ] Configure Touch ID requirement (optional)

  • [ ] Test credential access audit logs

Time investment:

  • Experienced DevSecOps: 10-15 minutes

  • Mid-level engineer: 20-30 minutes

  • First-time migration: 45-60 minutes (includes troubleshooting)

Risk reduction: Additional ~15% (defeats infostealer malware)

Week 3-4: Layers 2-3 - Network & Container Sandboxing

Week 3: Layer 2 - VPN Setup

  • [ ] Deploy VPN (Tailscale or WireGuard)

  • [ ] Configure ACLs for device allowlisting

  • [ ] Test remote access via VPN

  • [ ] Remove reverse proxy if present

Week 4: Layer 3 - Container Hardening

  • [ ] Build hardened container image

  • [ ] Test containerized deployment

  • [ ] Verify capability dropping and read-only filesystem

  • [ ] Configure minimal volume mounts

Time investment: 8-12 hours total

Risk reduction: Additional ~10% (limits blast radius, eliminates reverse proxy risk)

Week 5: Layers 4-6 - Runtime Security & Monitoring

  • [ ] Install openclaw-shield plugin

  • [ ] Enable Prompt Guard and Output Scanner layers

  • [ ] Deploy openclaw-telemetry for behavioral monitoring

  • [ ] Configure SIEM forwarding

  • [ ] Test alerting integration

  • [ ] Setup skill integrity monitoring (daily cron)

Time investment: 6-8 hours

Risk reduction: Additional ~8% (prevents and detects attacks in real-time)

Week 6: Layer 7 - Organizational Controls

  • [ ] Deploy openclaw-detect via MDM

  • [ ] Document security policies

  • [ ] Establish incident response procedures

  • [ ] Train security team on agent-specific threats

  • [ ] Complete production deployment checklist

Time investment: 6-8 hours

Risk reduction: Additional ~2% (governance, discovery, and response capabilities)

Total Implementation

Time: 5-6 weeks (part-time effort)

Coverage: Addresses known failure modes from January-February 2026 disclosures

Residual risk: Novel prompt injection patterns (requires continuous monitoring)


Layer 1: OS-Level Credential Isolation

Framework mapping: OWASP LLM06 (Sensitive Information Disclosure), NIST CSF PR.DS-1 (Data at rest protected)

Problem

Plaintext JSON files containing API keys sit in ~/.moltbot/ with mode 600 permissions. Infostealer malware runs as the user and inherits read access. Backup files persist after deletion, creating a 35-day window of credential exposure through .bak, .bak.1, .bak.2, .bak.3, and .bak.4 files.

Goal

All credentials stored in OS-native secure storage (Keychain on macOS, Secret Service on Linux) that requires explicit user authorization for each access. No plaintext files on filesystem. Credential access creates audit log entries. Optional biometric authentication prevents programmatic theft.

macOS: Keychain Integration

The migration process moves credentials from plaintext JSON to the Keychain, where they're protected by hardware-backed encryption. Each access can create an audit log entry visible in Keychain Access.app.

# Core migration logic - adds credentials to Keychain
security add-generic-password \
    -s "Clawdbot-Anthropic" \
    -a "api_key" \
    -w "${ANTHROPIC_KEY}" \
    -T "/Applications/Clawdbot.app" \
    -U  # Update if exists

# The -T flag authorizes specific apps to access without prompts
# Omitting -T still trusts the creating tool (security) by default;
# pass -T "" (empty app path) to require explicit user approval each time

Security model: Reduces plaintext exposure risk. Infostealer malware attempting access may trigger authorization prompts depending on configuration. Advanced setup requires Touch ID authentication, creating a visible authorization step that programmatic malware cannot bypass silently.

Linux: Secret Service Integration

Linux uses the freedesktop.org Secret Service specification, implemented by GNOME Keyring or KWallet.

# Core migration logic - stores credential in Secret Service
echo -n "${ANTHROPIC_KEY}" | secret-tool store \
    --label="Clawdbot Anthropic API Key" \
    service "clawdbot-anthropic" \
    account "api_key"

# List all Clawdbot credentials
secret-tool search service clawdbot-anthropic

# Retrieve specific credential (for testing)
secret-tool lookup service clawdbot-anthropic account api_key

Complete macOS migration script with credential extraction, error handling, backup procedures, and verification steps.

Complete Linux migration script with dependency installation and Secret Service configuration.
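
Once the credentials are in the OS store, the agent has to read them back at startup instead of parsing a JSON file. The following is a minimal sketch, assuming the service and account names used in the commands above; `lookup_command` and `read_credential` are hypothetical helper names, not part of Clawdbot. It shells out to the same `security` and `secret-tool` CLIs rather than assuming any particular keyring library.

```python
import platform
import subprocess


def lookup_command(service: str, account: str, system: str) -> list[str]:
    """Build the CLI invocation that prints one stored secret to stdout."""
    if system == "Darwin":
        # -w prints only the password field of the matching Keychain item
        return ["security", "find-generic-password", "-s", service, "-a", account, "-w"]
    if system == "Linux":
        # Attribute/value pairs must match those used with `secret-tool store`
        return ["secret-tool", "lookup", "service", service, "account", account]
    raise RuntimeError(f"no credential store configured for {system!r}")


def read_credential(service: str, account: str) -> str:
    """Fetch a secret from the OS store; raises CalledProcessError if it is missing."""
    cmd = lookup_command(service, account, platform.system())
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.rstrip("\n")
```

Usage would look like `read_credential("Clawdbot-Anthropic", "api_key")` on macOS or `read_credential("clawdbot-anthropic", "api_key")` on Linux. Keep the returned value in memory only and never write it back to disk or logs.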

If you want to extend credential isolation beyond the keychain, consider the IronCurtain architecture. IronCurtain separates the agent’s runtime from real secrets: the agent receives a fake API key and submits actions through a trusted proxy. A MITM component swaps in the real credential only at the last moment, so the agent container never holds the secret. A policy engine, configured with rules written in plain English, then decides whether to allow, deny, or require escalation for each call, with default-deny behavior and full audit logs. This pattern protects against runtime theft of credentials and provides a deterministic audit trail for every action.


⚡ QUICK WIN

Migrate Credentials to OS Keychain Right Now

What it does: Moves API keys from plaintext JSON files to OS-encrypted storage.

Why it matters: Eliminates the primary infostealer target. Even if malware runs as your user, credential access can require explicit authorization and creates an audit trail.

Time estimate:

  • Experienced DevSecOps engineer: 10-15 minutes

  • Mid-level security engineer: 20-30 minutes

  • First-time credential migration: 45-60 minutes (includes troubleshooting)

Prerequisites:

  • macOS 10.15+ or Linux with GNOME Keyring/KWallet installed

  • Basic familiarity with command-line tools

  • Backup of current credential files (script handles this)

Step 1 - Download migration script:

# macOS
curl -O https://raw.githubusercontent.com/topazyo/openclaw-security-playbook/main/scripts/credential-migration/macos/migrate_credentials_macos.sh
# Review before running (security best practice)
less migrate_credentials_macos.sh
chmod +x migrate_credentials_macos.sh

# Linux
curl -O https://raw.githubusercontent.com/topazyo/openclaw-security-playbook/main/scripts/credential-migration/linux/migrate_credentials_linux.sh
less migrate_credentials_linux.sh
chmod +x migrate_credentials_linux.sh
sudo apt-get install libsecret-tools  # Install dependencies

Step 2 - Run migration:

# Creates backup, migrates credentials, securely deletes plaintext
./migrate_credentials_macos.sh    # macOS
# OR
./migrate_credentials_linux.sh    # Linux

Step 3 - Restart and verify:

# Restart Gateway (Linux with systemd; on macOS, restart the Gateway app or its launchd job instead)
systemctl restart moltbot

# Verify no plaintext credentials remain
grep -r "api_key" ~/.moltbot/ ~/.clawdbot/ 2>/dev/null
# Should return NO matches (or only configuration references, not actual keys)

# Verify credentials accessible from Keychain
# macOS: Open Keychain Access.app, search for "Clawdbot"
# Linux: secret-tool search service clawdbot-anthropic

Expected results:

  • ✓ Credentials migrated to Keychain/Secret Service

  • ✓ First Gateway access may prompt for authorization (depends on configuration)

  • ✓ No plaintext keys in filesystem

  • ✓ Backup files securely deleted (3-pass overwrite)

If migration fails:

  • Check backup location (printed during migration): typically ~/backups/clawdbot-YYYYMMDD/

  • Restore: cp ~/backups/clawdbot-YYYYMMDD/clawdbot.json ~/.moltbot/clawdbot.json

  • Review troubleshooting guide: Migration failure scenarios

Impact: Reduces plaintext credential exposure. Infostealer malware loses immediate access to credential files. Optional Touch ID requirement adds visible authorization step.

Optional hardening - Touch ID requirement (macOS only):

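# Require Touch ID for high-value credentials
# Test on your macOS version first - behavior varies by OS release
# Note: -S takes a comma-separated partition list as its argument; replace
# <partition-list> with a value from a tested configuration
security set-generic-password-partition-list \
    -s "Clawdbot-Anthropic" \
    -a "api_key" \
    -S "<partition-list>"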

# Verify: Next credential access should prompt for Touch ID
# If it doesn't, ACL configuration may need adjustment for your OS version

Note: Touch ID enforcement behavior depends on how Keychain items are created and your macOS version. Treat this as risk reduction, not a guarantee. The migration script documents tested configurations.

Troubleshooting credential migration for common issues and rollback procedures.


Layer 2: Network Segmentation with VPN

Framework mapping: OWASP LLM05 (Supply Chain Vulnerabilities - network access), NIST CSF PR.AC-5 (Network integrity)

Problem

Reverse proxies introduce authentication bypass vulnerabilities through header spoofing. Even properly configured proxies expand attack surface by exposing the Gateway to the network. Over 1,200 instances were exploited through X-Forwarded-For manipulation and trustedProxies misconfiguration.

Goal

Zero public port exposure. Encrypted point-to-point connections between authenticated devices only. Gateway stays bound to 127.0.0.1. Remote access via VPN eliminates reverse proxy attack surface entirely.

Why VPN > Reverse Proxy

Reverse proxy risks:

  • Header spoofing attacks (X-Forwarded-For manipulation)

  • TLS termination exposes plaintext traffic on proxy server

  • Misconfiguration fails open (grants access instead of denying)

  • Proxy itself becomes attack target

VPN advantages:

  • Encrypted tunnel prevents header manipulation

  • No port exposure (Gateway stays on 127.0.0.1)

  • Misconfiguration fails closed (no connectivity = no access)

  • Zero-trust architecture (device authentication required)

Tailscale provides a zero-config VPN built on the WireGuard protocol. Installation creates a private mesh network where only your authenticated devices can reach the Gateway.

# Download and inspect install script (security best practice)
curl -fsSLo install-tailscale.sh https://tailscale.com/install.sh
less install-tailscale.sh  # Review before executing

# Run installation
sh install-tailscale.sh

# Authenticate and join your network
sudo tailscale up

# Get your machine's Tailscale IP
tailscale ip -4
# Example output: 100.101.102.103

How it works: Tailscale creates an encrypted tunnel between your devices. The Gateway stays bound to 127.0.0.1, and a forwarder on the Gateway machine (for example `tailscale serve`) proxies traffic from your authenticated devices to localhost; a listener bound only to loopback is not otherwise reachable on the Tailscale IP. The Gateway sees connections as coming from 127.0.0.1, but only devices authenticated to your tailnet can reach it. There is no public port exposure and no internet-facing proxy whose headers can be spoofed.

Data path: Client → Tailscale encrypted tunnel → Gateway host network stack → 127.0.0.1:18789 listener. Only Tailscale peers can reach the Gateway; still no public listener.

Verify setup:

# Gateway should still bind to localhost only
ss -lntp 2>/dev/null | grep 18789
# Expected: 127.0.0.1:18789 (NOT 0.0.0.0 or Tailscale IP)

# Access from other devices via: http://100.101.102.103:18789
# (requires a tailnet-to-loopback forwarder such as `tailscale serve` on the Gateway host)

Complete Tailscale setup guide with ACL configuration, device allowlisting, and troubleshooting.

WireGuard alternative setup for users who prefer open-source solutions without third-party coordination.

Network Access Control List (ACL)

Tailscale ACLs define which devices can reach which services. This prevents compromise of one device from exposing all services.

{
  "acls": [
    {
      "action": "accept",
      "src": ["user@example.com"],
      "dst": ["tag:clawdbot:18789"]
    }
  ],
  "tagOwners": {
    "tag:clawdbot": ["user@example.com"]
  }
}

This configuration ensures only devices authenticated as user@example.com can reach port 18789 on machines tagged as clawdbot.
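
A quick way to keep a policy like this tight over time is to lint it for wildcard rules before applying it. The following is a minimal sketch, assuming the policy has already been parsed into a Python dict (Tailscale policy files may contain comments, which `json.loads` does not accept); `overly_broad_rules` is a hypothetical helper, not a Tailscale API.

```python
def overly_broad_rules(policy: dict) -> list[dict]:
    """Return accept rules whose source or destination uses a wildcard."""
    broad = []
    for rule in policy.get("acls", []):
        if rule.get("action") != "accept":
            continue
        wildcard_src = "*" in rule.get("src", [])
        wildcard_dst = False
        for dst in rule.get("dst", []):
            # Destinations have the form "host:ports"; split on the last colon
            # so that "tag:clawdbot:18789" yields host "tag:clawdbot", ports "18789"
            host, _, ports = dst.rpartition(":")
            if host == "*" or ports == "*":
                wildcard_dst = True
        if wildcard_src or wildcard_dst:
            broad.append(rule)
    return broad
```

For the policy shown above the function returns an empty list. A rule such as `{"action": "accept", "src": ["*"], "dst": ["*:*"]}` would be reported, since it lets every device reach every port.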

View complete network segmentation guide with VPN comparison, ACL examples, and firewall integration.


Layer 3: Runtime Sandboxing with Containers

Framework mapping: OWASP LLM08 (Excessive Agency), NIST CSF PR.PT-3 (Least functionality)

Problem

When the agent is compromised via prompt injection or malicious skill, it has full access to the user's filesystem, credentials, and network. A successful prompt injection becomes a privilege escalation path.

Goal

Agent runs in isolated container with dropped capabilities, read-only filesystem, and explicit volume mounts for only necessary directories. Compromise is contained within sandbox boundaries. No persistence across restarts.

Docker Security Configuration

Standard Docker deployment runs with excessive privileges. This hardened configuration drops all capabilities by default, then explicitly grants only required permissions.

# Dockerfile.hardened - Secure container configuration
FROM python:3.11-slim

# Run as non-root user
RUN useradd -m -u 1000 -s /bin/bash clawdbot
USER clawdbot
WORKDIR /home/clawdbot

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY --chown=clawdbot:clawdbot . .

# Default command
CMD ["python", "gateway.py"]

Container runtime flags:

# Create the user-defined network once; --network fails if it does not exist
docker network create isolated

docker run \
  --name clawdbot \
  --network isolated \
  --cap-drop ALL \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=100m \
  -v ~/.moltbot/skills:/app/skills:ro \
  -v ~/.moltbot/logs:/app/logs:rw \
  --security-opt=no-new-privileges \
  clawdbot:hardened

# Note: Removed --cap-add NET_BIND_SERVICE (unnecessary for ports >= 1024)
# Gateway uses port 18789, which doesn't require privileged binding
# Note: bind mounts (-v) accept ro/rw but not noexec; to make the log directory
# non-executable, keep it on a host filesystem that is mounted with noexec

What this prevents:

  • --cap-drop ALL: Removes all Linux capabilities (can't change users, mount filesystems, modify network config)

  • --read-only: Prevents writing to container filesystem (stops malware persistence)

  • --tmpfs /tmp: In-memory temporary directory mounted noexec and nosuid; its contents are cleared on restart (non-persistent workspace)

  • --security-opt=no-new-privileges: Prevents privilege escalation

  • Volume mounts are read-only except for logs, which must stay writable (enforce noexec for logs through the host filesystem's mount options)

Even if prompt injection achieves code execution, the blast radius is limited to the container's isolated environment. No filesystem persistence, no capability to modify host system.
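
To confirm that a running container actually has these settings, the output of `docker inspect` can be checked programmatically. This is a minimal sketch, assuming the JSON array that `docker inspect <container>` prints and its `HostConfig` fields; `hardening_findings` is an illustrative name, not an existing tool.

```python
import json


def _has_no_new_privileges(security_opts: list[str]) -> bool:
    """Accept the bare flag and its explicit ':true' / '=true' forms."""
    accepted = {"no-new-privileges", "no-new-privileges:true", "no-new-privileges=true"}
    return any(opt in accepted for opt in security_opts)


def hardening_findings(inspect_json: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    container = json.loads(inspect_json)[0]  # docker inspect returns a JSON array
    host = container.get("HostConfig", {})
    findings = []

    cap_drop = [cap.upper() for cap in (host.get("CapDrop") or [])]
    if "ALL" not in cap_drop:
        findings.append("capabilities are not dropped with --cap-drop ALL")
    if host.get("CapAdd"):
        findings.append(f"capabilities added back: {host['CapAdd']}")
    if not host.get("ReadonlyRootfs"):
        findings.append("root filesystem is writable (missing --read-only)")
    if host.get("Privileged"):
        findings.append("container runs in privileged mode")

    security_opts = host.get("SecurityOpt") or []
    if not _has_no_new_privileges(security_opts):
        findings.append("no-new-privileges is not set")
    if any("seccomp" in opt and "unconfined" in opt for opt in security_opts):
        findings.append("seccomp profile is unconfined")
    return findings
```

One way to use it is to pipe `docker inspect clawdbot` into a small wrapper script and fail the deployment step when the returned list is non-empty.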

Filesystem Isolation Strategy

Map only necessary directories into the container:

# docker-compose.yml - Production configuration
services:
  clawdbot:
    image: clawdbot:hardened
    read_only: true
    cap_drop:
      - ALL
    # Note: Removed cap_add - port 18789 doesn't require NET_BIND_SERVICE
    volumes:
      # Skills - read-only (prevent skill tampering)
      - ~/.moltbot/skills:/app/skills:ro

      # Logs - read-write (bind mounts do not accept a noexec option;
      # enforce noexec through the host filesystem's mount options if needed)
      - ~/.moltbot/logs:/app/logs:rw

      # Config - read-only
      - ~/.moltbot/config.yml:/app/config.yml:ro

      # CRITICAL: DO NOT MOUNT ~/.ssh, ~/.aws, or credential directories
      # Agent reads credentials from OS Keychain via API, not filesystem

    tmpfs:
      # Temporary workspace - cleared on restart, no execution allowed
      - /tmp:rw,noexec,nosuid,size=100m

    security_opt:
      - no-new-privileges:true
      # Use default seccomp profile (do NOT set seccomp:unconfined)
      # For advanced hardening, provide custom seccomp profile path

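    networks:
      - isolated

    restart: unless-stopped

# Top-level definition required for the network referenced by the service
networks:
  isolated:
    driver: bridge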

Critical exclusions: Never mount ~/.ssh, ~/.aws, or credential directories. Agent should read credentials from OS Keychain via API (Layer 1), not direct filesystem access.

Advanced hardening: For production environments requiring maximum security, provide a custom seccomp profile that explicitly allows only required syscalls. Default seccomp is adequate for most deployments; unconfined mode should only be used for debugging and expands syscall attack surface.

View complete Docker hardening configuration with orchestration, health checks, and resource limits.

Dockerfile.hardened with multi-stage builds and security scanning integration.

Runtime sandboxing guide with container security best practices and escape prevention.

April 2026 — OpenShell alternative: If your deployment runs on NVIDIA's OpenShell sandbox, DefenseClaw uses OpenShell as its Layer 3 enforcement boundary rather than Docker — providing kernel-level enforcement of filesystem and network I/O constraints. The security properties are equivalent to --cap-drop ALL + --read-only at the container level but enforced by OpenShell's eBPF-based policy engine. If you are running Docker on a non-OpenShell host, the configuration above remains the correct approach. These are two different runtime paths, not competing solutions.


Layer 4: Runtime Security Enforcement

Framework mapping: OWASP LLM01 (Prompt Injection), LLM08 (Excessive Agency), NIST CSF PR.AC & PR.PT

Problem

Even with sandboxing, compromised agents can misuse allowed tools - reading sensitive files, sending unauthorized emails, or executing malicious code within permitted boundaries. The 91.3% prompt injection success rate demonstrates that input validation alone cannot prevent attacks.

Goal

Runtime security plugins inspect and block dangerous tool invocations before execution, validate outputs for secrets, and enforce security policies at the agent layer. Multiple independent defense mechanisms create overlapping protection.

Production-Ready Security Plugin: openclaw-shield

The security community developed native OpenClaw plugins providing defense-in-depth at the runtime layer in direct response to January-February 2026 vulnerabilities.

openclaw-shield (Knostic) provides five independent defense layers as a native OpenClaw plugin:

# Install as OpenClaw plugin
cd ~/.openclaw/plugins
git clone https://github.com/knostic/openclaw-shield

# Configure in OpenClaw settings
# Each layer can be enabled/disabled independently

The five defense layers:

  1. Prompt Guard: Injects security policy into agent context before each turn, instructing the model to refuse dangerous requests

  2. Output Scanner: Redacts secrets (API keys, tokens) and PII from tool output before returning to user

  3. Tool Blocker: Blocks dangerous tool calls at host level based on configurable allowlist/denylist

  4. Input Audit: Logs all inbound messages and flags accidental secret exposure in user input

  5. Behavioral Analysis: Monitors tool execution patterns for anomalous sequences

Each layer is independently toggleable, allowing gradual rollout in production environments without disrupting existing workflows.
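
To make the Output Scanner idea concrete, here is a minimal redaction sketch. It is not openclaw-shield's implementation: the pattern set is a small illustrative sample based on well-known token prefixes, and `redact` is a hypothetical function name.

```python
import re

# Illustrative patterns only; a production scanner needs a much larger, tested set
SECRET_PATTERNS = {
    "anthropic_key": re.compile(r"sk-ant-[A-Za-z0-9_\-]{20,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace anything matching a known secret pattern and report which patterns fired."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{name}]", text)
        if count:
            findings.append(name)
    return text, findings
```

Running tool output through `redact` before it is returned to the user or written to logs keeps a leaked key from propagating further. The list of fired patterns can be forwarded to telemetry as a detection signal.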

View openclaw-shield with complete configuration guide and layer-specific documentation.

Read Knostic implementation blog for architecture decisions and lessons learned from production deployments.

An emerging alternative for runtime interception is AgentGuard, a GoPlus Security tool that intercepts high-risk actions and performs on-demand deep scanning. It blocks malicious skills, prevents writes to sensitive files and performs static analysis to detect secrets, backdoors, prompt-injection and other threats.

JavaScript/TypeScript Alternative: ClawGuard

For agents built with JavaScript/TypeScript frameworks, clawguard (Capsule Security) provides similar runtime protection via an npm package:

import { GuardSystem, guardTool } from 'clawguard';

const guard = new GuardSystem({
    strictMode: true,
    runtime: {
        highRiskTools: ['send_email', 'execute_code'],
        rateLimits: { 
            send_email: { maxCalls: 10, windowMs: 60000 } 
        },
        onApprovalRequired: async (req) => {
            return confirm(`Allow ${req.tool} with args: ${JSON.stringify(req.args)}?`);
        }
    }
});

// Wrap tools with runtime guards
const safeSendEmail = guardTool(originalSendEmail, 'send_email', guard);

ClawGuard uses pre-tool invocation hooks to validate every tool call before execution, with support for rate limiting, approval workflows, and output validation.
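
The `maxCalls` / `windowMs` setting in the example corresponds to a sliding-window counter. The following Python sketch shows that mechanism in isolation; it is a language-neutral illustration of the technique, not ClawGuard's code, and the class name and injectable `clock` parameter are assumptions made for testability.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `window_seconds` interval."""

    def __init__(self, max_calls: int, window_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        self._calls = deque()  # timestamps of accepted calls, oldest first

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the window
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            return False
        self._calls.append(now)
        return True
```

A limiter equivalent to the `send_email` setting above would be `SlidingWindowLimiter(max_calls=10, window_seconds=60)`, with `allow()` checked before each tool invocation.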

View clawguard on NPM with complete API reference and TypeScript types.

Browse clawguard on GitHub with source code and integration examples.

Native Configuration: Tool Execution Policies

For custom implementations or environments where native plugins aren't available:

# ~/.moltbot/config.yml - Runtime security enforcement
tools:
  # Completely disable high-risk tools
  disabled:
    - "exec"
    - "shell"
    - "python_repl"

  # Require human confirmation for sensitive operations
  requireConfirmation:
    - tool: "email_send"
      confirmationMessage: "Send email to {recipients} with subject: {subject}"

    - tool: "file_write"
      confirmationMessage: "Write to file: {path}"

  # Restrict tool access to specific paths
  restricted:
    - tool: "file_read"
      allowedPaths: ["~/Documents", "~/Projects"]
      deniedPaths: ["~/.ssh", "~/.moltbot", "~/.aws"]

  # Rate limiting prevents automated exploitation
  rateLimits:
    email_send:
      maxPerHour: 20
    browser_action:
      maxPerHour: 100

View complete tool policy configurations with approval workflows and least-privilege examples.
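
The `allowedPaths` / `deniedPaths` restriction only holds if paths are normalized before comparison, since a request such as `~/Documents/../.ssh/id_rsa` otherwise passes a naive prefix check. A minimal sketch of that check follows, assuming deny rules take precedence over allow rules; `is_read_allowed` is a hypothetical helper and not the Gateway's actual enforcement code. It requires Python 3.9 or later for `Path.is_relative_to`.

```python
from pathlib import Path


def _normalize(path: str) -> Path:
    # Expand "~" and collapse ".." segments and symlinks before any comparison
    return Path(path).expanduser().resolve()


def is_read_allowed(path: str, allowed: list[str], denied: list[str]) -> bool:
    """Deny wins over allow; anything outside the allowed roots is refused."""
    target = _normalize(path)
    if any(target.is_relative_to(_normalize(root)) for root in denied):
        return False
    return any(target.is_relative_to(_normalize(root)) for root in allowed)
```

With the configuration above, `is_read_allowed("~/Documents/../.ssh/id_rsa", ["~/Documents", "~/Projects"], ["~/.ssh", "~/.moltbot", "~/.aws"])` resolves the path to the `.ssh` directory first and therefore refuses it.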

April 2026 — Additional runtime option: Cisco's DefenseClaw provides a plugin-based inspection engine that intercepts LLM prompts, completions, and tool invocations at the OpenClaw plugin layer — a different interception point than openclaw-shield's native plugin hooks. DefenseClaw adds CodeGuard, which scans agent-generated code at execution time for secrets exposure, command injection, and unsafe deserialization patterns before the code reaches exec or python_repl. It supports monitor mode (logs everything, blocks nothing — useful for baselining) and action mode (blocks on policy violation).

The two tools cover different interception planes and can be run together:

  • openclaw-shield intercepts at the OpenClaw native plugin layer (Prompt Guard, Output Scanner, Tool Blocker)
  • DefenseClaw intercepts at the OpenShell governance layer and adds CodeGuard for generated code

For OpenShell deployments, add DefenseClaw's runtime inspection as a second enforcement plane alongside openclaw-shield.

# DefenseClaw runtime inspection config (defenseclaw.config.yml)
runtime:
  mode: "action"          # "monitor" | "action"
  inspect:
    prompts: true          # Scan incoming prompts for injection patterns
    completions: true      # Scan LLM outputs before tool dispatch
    tool_invocations: true # Intercept tool calls before execution
  codeguard:
    enabled: true
    patterns:
      - secrets_exposure
      - command_injection
      - unsafe_deserialization
    block_on_match: true

Layer 5: Supply Chain Integrity Monitoring

Framework mapping: OWASP LLM05 (Supply Chain Vulnerabilities), NIST CSF ID.SC & PR.IP-3

Problem

Researcher Jamieson O'Reilly inflated a backdoor skill to 4,000 fake downloads across 7 countries before terminating the demonstration. The ClawdHub registry provided no cryptographic verification of skill integrity. OX Security analysis revealed 26% of skill plugins contain vulnerabilities.

Goal

Manifest-based integrity monitoring with cryptographic hashing detects unauthorized skill modifications - whether by attacker, malicious update, or compromised registry. Daily automated comparison triggers alerts within 24 hours of tampering.

Skill Integrity Manifest Generation

Generate cryptographic hashes of all installed skills to detect unauthorized modifications:

# Core manifest generation - creates SHA256 hash registry
import hashlib
from pathlib import Path

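def generate_skill_manifest(skills_dir):
    """Map each skill file path to the SHA-256 hex digest of its contents."""
    manifest = {}
    # Path() accepts both str and Path; sorted() keeps the manifest order stable
    for skill_file in sorted(Path(skills_dir).rglob('*.md')):
        manifest[str(skill_file)] = hashlib.sha256(skill_file.read_bytes()).hexdigest()
    return manifest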

# Daily cron job compares current hashes against baseline
# Any mismatch indicates skill tampering

The production implementation adds metadata extraction, dangerous pattern detection (eval, exec, innerHTML), and automated alerting when changes are detected.

View skill_manifest.py with full implementation including metadata extraction, dangerous pattern scanning, CLI interface, and manifest comparison.

Usage:

# Generate baseline manifest
python3 skill_manifest.py --output manifest_baseline.json

# Compare and alert on changes  
python3 skill_manifest.py --compare manifest_baseline.json --output manifest_today.json

# Output shows:
# ⚠ CHANGES DETECTED:
#   Added skills: 1
#   Modified skills: 2
#     - skills/email-search.md
#     - skills/browser-automation.md
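
The comparison behind that report reduces to set operations on two manifests. This is a minimal sketch, assuming each manifest is a dict that maps file paths to SHA-256 hex digests as produced by the generation function; `compare_manifests` is an illustrative name and may differ from what skill_manifest.py uses internally.

```python
def compare_manifests(baseline: dict, current: dict) -> dict:
    """Classify every path as added, removed, or modified relative to the baseline."""
    baseline_paths = set(baseline)
    current_paths = set(current)
    return {
        "added": sorted(current_paths - baseline_paths),
        "removed": sorted(baseline_paths - current_paths),
        # Present in both manifests, but the content hash differs
        "modified": sorted(
            path for path in baseline_paths & current_paths
            if baseline[path] != current[path]
        ),
    }


def has_changes(diff: dict) -> bool:
    """True if any category in the diff is non-empty."""
    return any(diff.values())
```

Removed skills are worth alerting on as well as added and modified ones, since deleting a security-relevant skill can be part of an attack.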

Automated Daily Monitoring

Integrate manifest generation into daily monitoring via cron job:

# skill_integrity_monitor.sh - Daily integrity check
MANIFEST_DIR="${HOME}/.moltbot/manifests"
TODAY=$(date +%Y%m%d)
YESTERDAY=$(date -d "yesterday" +%Y%m%d 2>/dev/null || date -v-1d +%Y%m%d)

# Generate today's manifest
python3 skill_manifest.py --output "${MANIFEST_DIR}/manifest_${TODAY}.json"

# Compare with yesterday
if [ -f "${MANIFEST_DIR}/manifest_${YESTERDAY}.json" ]; then
    python3 skill_manifest.py \
        --compare "${MANIFEST_DIR}/manifest_${YESTERDAY}.json" \
        --output "${MANIFEST_DIR}/manifest_${TODAY}.json" 2>&1 | \
        grep "CHANGES DETECTED" > "${MANIFEST_DIR}/alert_${TODAY}.txt" && \
        mail -s "ALERT: Skill Integrity Violation" security@company.com < "${MANIFEST_DIR}/alert_${TODAY}.txt"
fi

Add to crontab: 0 2 * * * /path/to/skill_integrity_monitor.sh

View skill_integrity_monitor.sh with complete implementation, error handling, and cleanup procedures.

Skill Vetting Configuration

Implement approval workflow for new skills with version pinning and permission restrictions:

# ~/.moltbot/config.yml - Skill security settings
skills:
  autoUpdate: false  # CRITICAL: Disable automatic updates

  installationPolicy:
    mode: "require-approval"
    allowedSources:
      - "clawdhub://official/*"  # Only official skills
    requireManifest: true
    manifestValidation:
      checkSignature: true

  installed:
    - name: "email-search"
      version: "1.2.3"  # Pin exact version
      sha256: "a3f5b8c9d1e2f3a4b5c6d7e8f9a0b1c2..."
      permissions:
        - "read:email"
      deniedPermissions:
        - "exec:shell"
        - "file:write"

This configuration prevents automatic skill updates (supply chain attacks), installation from untrusted sources, and permission escalation.
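To show how a pinned `sha256` value can be enforced, here is a minimal sketch that recomputes each installed skill's hash and compares it with the pin. The `verify_pinned_skills` function is hypothetical, and the `<skills_dir>/<name>.md` file layout is an assumption made for illustration; the source does not say how OpenClaw itself resolves or validates these entries.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_pinned_skills(installed: list, skills_dir) -> list:
    """Return names of skills whose file is missing or differs from its pin.

    `installed` mirrors the `skills.installed` entries in the config above.
    Assumption: each skill lives at <skills_dir>/<name>.md.
    """
    failures = []
    for entry in installed:
        skill_path = Path(skills_dir) / f"{entry['name']}.md"
        if not skill_path.is_file() or sha256_of(skill_path) != entry["sha256"]:
            failures.append(entry["name"])
    return failures
```

A missing file is reported as a failure on purpose, so a pinned skill that has been deleted or renamed does not silently pass.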

Browse skill policy examples: Allowlists, dangerous pattern definitions, manifest schemas.

Supply chain security guide with skill vetting process, CI/CD integration, and incident response procedures.

April 2026 — Pre-install scanning gap: The manifest integrity approach in this layer is excellent at detecting post-install drift — unauthorized modification of skills already on disk. It has one gap: it cannot vet a skill before you install it. DefenseClaw's supply chain scanner fills that gap. It scans skills, plugins, and MCPs at install time and via continuous directory monitoring, blocking packages with critical/high severity findings before they reach ~/.openclaw/skills/.

Recommended combined workflow:

  1. Pre-install: defenseclaw scan supply-chain --skill ./skills/new-skill.md --severity-threshold high
  2. Install the skill if scan passes
  3. Baseline the manifest (existing script in this repo)
  4. Daily drift detection via the manifest cron job (existing script in this repo)

DefenseClaw covers MCPs in addition to skills — the manifest scripts in this repo currently cover skills only. If you use MCP packages, add DefenseClaw scanning to close that gap.

Pre-install Checks for OpenClaw Skills with skill-auditor and setup-auditor

The OpenClaw-Skills-Security project offers two auditor skills, skill-auditor and setup-auditor, to analyse skills and environments before installation. To use skill-auditor, embed skills/skill-auditor/SKILL.md into your agent, add the skill under review and request an audit. This six-stage process checks metadata (including typosquatting and malicious naming), inspects permissions, audits dependencies, scans for prompt injections, reviews network/exfiltration behaviour and looks for other red flags. It returns one of four statuses: SAFE, SUSPICIOUS, DANGEROUS or BLOCK.

Setup-auditor goes a step further by verifying the runtime environment. It scans for credential leaks, reviews configuration files, assesses whether a sandbox is properly configured and looks for persistent processes. Results are classified as READY, RISKY or NOT_READY. Together, these tools help you detect malicious or misconfigured skills before deployment and ensure your environment is properly locked down.

Recent research highlights a new threat: malicious MCP servers. Praetorian’s MCPHammer shows that both local and third-party MCP servers can execute arbitrary code, exfiltrate data and manipulate users. An especially dangerous tactic chains a malicious local server with a trusted remote server via base64 commands delivered in chat messages, causing invisible actions such as launching applications or exfiltrating files. There are also supply-chain risks: a typo in a package name in an MCP configuration file can cause uvx, which fetches and runs whatever package is named, to download and execute a malicious package. To address this gap, include MCP server configurations in your integrity checks and use MCPHammer as a red-teaming tool to test your defences.
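One way to bring MCP configurations into your integrity checks is to compare every uvx-launched package against an approved list, which catches the typo scenario described above. The sketch below is an assumption-laden illustration: it assumes the common `{"mcpServers": {name: {"command": ..., "args": [...]}}}` JSON layout used by several MCP clients, and it assumes the first non-flag argument is the package name. The function name and the approved set are hypothetical.

```python
import json


def find_unapproved_uvx_packages(config_text: str, approved: set) -> list:
    """Return (server_name, package) pairs for uvx packages not in `approved`."""
    config = json.loads(config_text)
    findings = []
    for server_name, spec in config.get("mcpServers", {}).items():
        if spec.get("command") != "uvx":
            continue
        # Assumption: the first argument that is not an option flag names the package.
        package = next(
            (arg for arg in spec.get("args", []) if not arg.startswith("-")),
            None,
        )
        if package is not None and package not in approved:
            findings.append((server_name, package))
    return findings
```

An allowlist is used rather than a blocklist because a typosquatted name is, by definition, one nobody has seen before. Hashing the configuration file alongside the skill manifest covers the complementary case where an approved entry is changed after review.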


⚡ QUICK WIN

Setup Skill Integrity Monitoring Right Now

What it does: Creates a cryptographic manifest of installed skills, enabling detection of unauthorized modifications.

Why it matters: 26% of skill plugins contain vulnerabilities. Jamieson O'Reilly demonstrated skills can be modified post-installation. Manifest monitoring catches tampering within 24 hours.

Time estimate:

  • Familiar with Python scripting: 10-15 minutes

  • Basic Python experience: 20-25 minutes

  • New to cron jobs: 30-40 minutes (includes cron configuration)

Prerequisites:

  • Python 3.7+ installed

  • Git installed and configured

  • Basic understanding of cryptographic hashing (helpful but not required)

Step 1 - Generate initial manifest:

# Download script
curl -O https://raw.githubusercontent.com/topazyo/openclaw-security-playbook/main/scripts/supply-chain/skill_manifest.py
# Review before running
less skill_manifest.py
chmod +x skill_manifest.py

# Generate baseline manifest
python3 skill_manifest.py --output manifest_baseline.json

# Review security warnings
python3 skill_manifest.py 2>&1 | grep "WARNING"

Step 2 - Commit baseline to version control:

git add manifest_baseline.json
git commit -m "Add skill integrity baseline"
git push

Step 3 - Setup daily monitoring:

# Add to crontab (runs 2 AM daily)
(crontab -l 2>/dev/null; echo "0 2 * * * cd /path/to/clawdbot && python3 skill_manifest.py --compare manifest_baseline.json --output manifest_today.json") | crontab -

Expected results:

  • ✓ Baseline manifest generated with SHA256 hashes

  • ✓ Security warnings for dangerous patterns (if present)

  • ✓ Daily comparison detects modifications

  • ✓ Alerts sent on unauthorized changes

  • [ ] DefenseClaw supply chain scanner run on all installed skills and MCPs: defenseclaw scan supply-chain --dir ~/.openclaw/skills/

  • [ ] DefenseClaw continuous directory monitoring enabled for ~/.openclaw/skills/ and MCP directories

Impact: Any unauthorized skill modification - whether by attacker, malicious update, or compromised registry - triggers alert within 24 hours.


Layer 6: Behavioral Monitoring with Telemetry

Framework mapping: OWASP LLM01 (Prompt Injection - detection), NIST CSF DE.AE & DE.CM

Problem

A 91.3% prompt injection success rate means input validation alone cannot prevent attacks. The model cannot reliably distinguish attacker instructions from legitimate user requests. Static defenses fail against novel attack patterns.

Goal

Monitor runtime behavior for anomalies indicating compromise - unusual tool sequences, off-hours execution, suspicious data patterns, failed authorization attempts. Enterprise-grade telemetry with tamper-proof audit trails enables detection and forensics.

Production Telemetry: openclaw-telemetry

For production deployments, openclaw-telemetry (Knostic) provides enterprise-grade behavioral monitoring as a native OpenClaw plugin:

# Install as OpenClaw plugin
cd ~/.openclaw/plugins
git clone https://github.com/knostic/openclaw-telemetry

# Outputs to JSONL with optional syslog forwarding
tail -f ~/.openclaw/logs/telemetry.jsonl | jq '.tool_name'

Key features:

  • Tool call capture: Every tool invocation logged with timestamp, arguments, and result

  • LLM usage tracking: Token consumption, model selection, response times

  • Agent lifecycle events: Session start/stop, configuration changes, errors

  • Message events: All inbound/outbound messages with metadata

  • Sensitive data redaction: Automatic removal of secrets from logs

  • Tamper-proof hash chains: Each event cryptographically linked to previous event, making tampering detectable

  • Rate limiting: Built-in log volume management

  • SIEM integration: Optional CEF/syslog forwarding for centralized monitoring

Tamper-proof audit trails: Hash chains protect log integrity by cryptographically linking each event to the one before it. If an attacker modifies any log entry, the chain no longer verifies from that entry onward, so the tampering is exposed the next time the chain is checked.
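The hash chain idea can be sketched in a few lines. This is a generic illustration, not the openclaw-telemetry record format: the `prev_hash` and `hash` field names, the all-zero genesis value, and the canonical JSON encoding are assumptions chosen for the example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # hypothetical starting value for the first event


def event_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous link together with a canonical JSON encoding of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(events: list) -> list:
    """Return records of the form {"event": ..., "prev_hash": ..., "hash": ...}."""
    records, prev_hash = [], GENESIS_HASH
    for event in events:
        current = event_hash(prev_hash, event)
        records.append({"event": event, "prev_hash": prev_hash, "hash": current})
        prev_hash = current
    return records


def first_broken_link(records: list):
    """Return the index of the first record that fails verification, or None."""
    prev_hash = GENESIS_HASH
    for index, record in enumerate(records):
        if record["prev_hash"] != prev_hash:
            return index
        if record["hash"] != event_hash(prev_hash, record["event"]):
            return index
        prev_hash = record["hash"]
    return None
```

A chain like this only detects edits by someone who cannot recompute it. An attacker with write access to the whole log could rebuild every hash, which is why forwarding events (or at least the latest hash) to a separate system such as a SIEM matters.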

View openclaw-telemetry with complete installation guide, configuration options, and SIEM integration examples.

Read community deployment discussion about openclaw-telemetry production experiences.

SIEM Integration for Centralized Monitoring

openclaw-telemetry supports forwarding to enterprise SIEM systems:

# openclaw-telemetry configuration
telemetry:
  output:
    jsonl:
      path: "~/.openclaw/logs/telemetry.jsonl"
      rotation: "daily"

    syslog:
      enabled: true
      host: "siem.company.com"
      port: 514
      protocol: "tcp"
      format: "cef"  # Common Event Format for SIEM parsing

  redaction:
    enabled: true
    patterns:
      - "api[_-]?key"
      - "token"
      - "password"
      - "secret"

This enables centralized analysis across all AI agent deployments, with correlation to other security events.
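To make the effect of the redaction patterns concrete, the sketch below applies the four patterns from the configuration to the key names of a nested event. Whether openclaw-telemetry matches key names, values, or both is not stated in the source, so key-name matching here is an assumption, as are the `redact` function and the `[REDACTED]` placeholder.

```python
import re

# Patterns copied from the redaction section of the configuration above.
REDACTION_PATTERNS = [r"api[_-]?key", r"token", r"password", r"secret"]
_KEY_RE = re.compile("|".join(REDACTION_PATTERNS), re.IGNORECASE)


def redact(value):
    """Recursively replace values whose key name matches a redaction pattern."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if _KEY_RE.search(str(key)) else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value
```

The function returns a new structure instead of mutating its input, so the unredacted event is never written back by accident. Note that key-name matching does not catch secrets embedded in free text such as command arguments; those need value-level scanning.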

What to Monitor

High-risk tool execution patterns:

  • exec, shell, python_repl tools executed outside normal working hours

  • File reads targeting sensitive paths (~/.ssh, ~/.aws, credential files)

  • Email/message sends to external domains not in whitelist

  • Browser automation accessing internal dashboards

  • Unusual command sequences (e.g., file_read → base64_encode → http_post = potential exfiltration)
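Detecting a suspicious command sequence reduces to an ordered-subsequence check over the tool names in a session. The sketch below is illustrative only; the tool names come from the exfiltration example in the list above, and the function and constant names are hypothetical.

```python
# Exfiltration pattern taken from the example in the list above.
EXFIL_PATTERN = ["file_read", "base64_encode", "http_post"]


def contains_sequence(tool_calls: list, pattern: list) -> bool:
    """Return True if `pattern` appears in order within `tool_calls` (gaps allowed)."""
    remaining = iter(tool_calls)
    # `step in remaining` consumes the iterator up to the match, which enforces order.
    return all(step in remaining for step in pattern)
```

Allowing gaps between steps is intentional: an attacker can interleave harmless calls to break a strict adjacency check. The cost is more false positives in long sessions, so in practice you may want to bound the check to a time window.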

Temporal anomalies:

  • Tool execution when user is not active (keyboard/mouse idle)

  • Burst activity (10+ tool calls in <1 minute)

  • Execution during maintenance windows or known-offline periods
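The burst rule above (10 or more tool calls in under a minute) maps naturally onto a sliding window. This is a minimal sketch under assumptions: timestamps are seconds in ascending order, and the function name and defaults are hypothetical rather than taken from any tool in this playbook.

```python
from collections import deque


def find_bursts(timestamps: list, max_calls: int = 10, window_seconds: float = 60.0) -> list:
    """Return timestamps at which `max_calls` or more calls fall inside the window.

    `timestamps` are seconds (for example Unix epoch values) in ascending order.
    """
    window = deque()
    alerts = []
    for ts in timestamps:
        window.append(ts)
        # Drop calls that are a full window or more older than the current one.
        while ts - window[0] >= window_seconds:
            window.popleft()
        if len(window) >= max_calls:
            alerts.append(ts)
    return alerts
```

The thresholds are starting points. Agents that legitimately batch many tool calls will need a higher `max_calls`, ideally derived from a baseline of normal sessions.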

Data pattern anomalies:

  • Large file reads (potential credential harvesting)

  • Base64-encoded blobs in outputs (obfuscation attempts)

  • Suspicious recipients in email/Slack/Discord tools

  • URLs pointing to non-whitelisted domains

These monitoring patterns form the behavioral baseline you need before deploying active detection. Once openclaw-telemetry is running and forwarding to your SIEM, Part 3 operationalizes all of the above into ready-to-deploy queries.

Specifically, Part 3 provides:

  • Tier 2 behavioral hunting queries for every anomaly pattern listed above — credential path reads, off-hours execution, burst tool sequences, and SOUL.md modification alerts — formatted for CrowdStrike Falcon, Microsoft Defender for Endpoint, Cortex XDR, SentinelOne, and Splunk

  • Tier 3 kill chain detection mapping observed tool sequences to MITRE ATLAS attack chains, so your SOC can identify which attack scenario is in progress, not just that something is anomalous

  • YARA rules for credential path enumeration, dangerous skill patterns, and SOUL.md injection persistence

  • verify_hash_chain.py to validate the tamper-proof hash chain in your telemetry logs and confirm whether evidence has been modified

Part 3: Detection and Threat Hunting — deploy these queries after completing Layers 1-6.


Layer 7: Organizational Security Controls

Framework mapping: NIST CSF ID.AM (Asset Management), RS (Response)

Problem

Shadow AI deployment - users installing agents without IT approval - creates blind spots in security monitoring. An estimated 300,000-400,000 Clawdbot users deployed without security review. You cannot secure what you don't know exists.

Goal

Discovery mechanisms detect all AI agent installations across managed endpoints. Centralized policy enforcement ensures consistent security baselines. Incident response procedures handle agent-specific compromise scenarios.

Production Shadow AI Discovery: openclaw-detect

openclaw-detect (Knostic) provides enterprise-ready detection scripts deployable via MDM platforms:

# Download detection script for your platform
# macOS/Linux
curl -O https://raw.githubusercontent.com/knostic/openclaw-detect/main/detect-openclaw.sh
less detect-openclaw.sh  # Review before executing
chmod +x detect-openclaw.sh
./detect-openclaw.sh

# Windows (PowerShell)
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/knostic/openclaw-detect/main/detect-openclaw.ps1" -OutFile "detect-openclaw.ps1"
# Review: notepad detect-openclaw.ps1
powershell -ExecutionPolicy Bypass -File ./detect-openclaw.ps1

What it detects:

  • CLI binaries (openclaw, moltbot, clawdbot commands)

  • App bundles (macOS .app packages)

  • Configuration files (~/.openclaw/, ~/.moltbot/, ~/.clawdbot/)

  • Gateway services (running processes on port 18789)

  • Docker containers with agent images

  • Browser extensions and IDE plugins
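Two of the checks in this list, configuration directories and a listening Gateway port, are simple enough to sketch. This is not the openclaw-detect implementation; it only illustrates the kind of signal being collected, using the directory names and port number given above. The function names are hypothetical.

```python
import socket
from pathlib import Path

# Directory names and port taken from the detection list above.
CONFIG_DIR_NAMES = [".openclaw", ".moltbot", ".clawdbot"]
GATEWAY_PORT = 18789


def find_config_dirs(home) -> list:
    """Return agent configuration directories that exist under `home`."""
    return [name for name in CONFIG_DIR_NAMES if (Path(home) / name).is_dir()]


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0
```

A typical call would be `find_config_dirs(Path.home())` followed by `port_is_open(GATEWAY_PORT)`. An open port alone does not prove an agent is running, so treat it as a prompt for the fuller checks the detection scripts perform.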

MDM deployment documentation included for:

  • Microsoft Intune

  • Jamf Pro

  • JumpCloud

  • Kandji

  • VMware Workspace ONE

This enables automated scanning across all managed endpoints with centralized reporting.

View openclaw-detect with complete detection scripts and MDM deployment guides.

MDM deployment documentation for Intune, Jamf, JumpCloud, Kandji, and Workspace ONE.

Beyond endpoint-level detection, you should inventory hidden AI agents across your entire environment. The ai-bom CLI scans source code, Docker and cloud IaC for LLM calls, MCP configuration files and hard-coded keys, generating a CycloneDX 1.6 inventory with risk ratings. It uses 13 specialised scanners and produces nine output formats. One command (ai-bom scan .) prints a risk-scored inventory or emits a SARIF report. Use ai-bom alongside endpoint agents to discover AI components hidden in infrastructure that might not be covered by MDM.

Centralized Policy Enforcement

Deploy security baselines to discovered agent installations:

# Organization-wide security policy
organization:
  policy_version: "1.0"
  enforcement: "strict"

  mandatory_settings:
    gateway:
      bind:
        address: "127.0.0.1"  # All instances must bind to localhost

    auth:
      mode: "required"
      loopback:
        autoApprove: false

    tools:
      disabled:
        - "exec"  # Shell execution disabled org-wide
        - "python_repl"

    logging:
      remote_syslog: "siem.company.com:514"
      retention_days: 90

    telemetry:
      enabled: true
      plugin: "openclaw-telemetry"  # Mandate telemetry plugin

Browse organization policy templates: Department-specific policies, compliance mappings (SOC2, ISO 27001), audit configurations.
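Checking an instance against the `mandatory_settings` block above amounts to a recursive comparison of two nested mappings. The sketch below assumes both the policy and the instance configuration have already been parsed into dictionaries; the `find_violations` function is hypothetical and is not part of any OpenClaw tooling.

```python
def find_violations(mandatory: dict, actual: dict, prefix: str = "") -> list:
    """Return dotted paths where `actual` is missing or differs from `mandatory`."""
    violations = []
    for key, required in mandatory.items():
        path = f"{prefix}{key}"
        if isinstance(required, dict):
            child = actual.get(key)
            # A missing or non-mapping section is compared as if it were empty.
            child = child if isinstance(child, dict) else {}
            violations += find_violations(required, child, path + ".")
        elif actual.get(key) != required:
            violations.append(path)
    return violations
```

List values such as `tools.disabled` are compared for exact equality in this sketch, so an instance that disables additional tools would be flagged. A production check would more likely test that the mandatory entries are a subset of the instance's list.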

For teams requiring detailed auditability and governance, the deterministic-agent-control-protocol introduces a gateway that intercepts all agent actions, evaluates them against policies and logs every decision in a tamper-evident ledger. It supports bounded, reversible and session-aware actions, enforcing budgets and rate limits. While experimental, it offers a model for reversible execution and post-incident investigation.

Incident Response Playbook

When compromise is detected, execute these steps immediately:

Phase 1: Containment (0-15 minutes)

  1. Isolate the agent:
# Stop service immediately
systemctl stop moltbot
# OR for Docker: docker stop clawdbot
  2. Block network access:
# Linux (requires root):
iptables -A OUTPUT -m owner --uid-owner clawdbot -j DROP

# macOS/Windows: Isolate host from network at control plane
# Or use your EDR/firewall policy
  3. Preserve logs:
tar czf incident_logs_$(date +%s).tar.gz ~/.moltbot/logs/
# -R copies the directory recursively; -p preserves timestamps for forensics
cp -Rp ~/.moltbot/logs/ ~/incident-$(date +%Y%m%d-%H%M%S)/
  4. Alert security team via established channels

Phase 2: Investigation (15-60 minutes)

  1. Review tool execution logs for malicious activity

  2. Check credential access audit logs:

    • macOS: Open Keychain Access.app, check access logs

    • Linux: journalctl | grep "Secret Service"

  3. Identify compromised credentials

  4. Document attack timeline and indicators of compromise (IOCs)

  5. Check openclaw-telemetry hash chains for evidence of log tampering

Phase 3: Remediation (1-4 hours)

  1. Rotate all credentials accessed by the agent

  2. Review and approve all installed skills

  3. Apply security baseline configuration

  4. Deploy openclaw-shield for runtime protection

  5. Restart agent in isolated test environment

  6. Monitor for 24 hours before returning to production

Complete incident response playbook with forensic analysis procedures, communication templates, and post-incident review.

Incident reporting template for documentation and stakeholder communication.


What Nobody Discusses: Hard Problems

Some security challenges have no perfect solutions. Understanding limitations helps set realistic expectations.

The Prompt Injection Paradox

The problem: Models fundamentally cannot distinguish "attacker instruction" from "legitimate user instruction" because both are natural language with equal semantic validity.

What doesn't work:

  • Input filtering (attackers use encoding, obfuscation, hidden text)

  • Separate system prompts (84.6% extraction rate in ZeroLeaks testing)

  • Model-based validation (creates second model to attack)

  • Adversarial training (improves resistance but doesn't eliminate vulnerability)

Pragmatic mitigations that reduce risk:

  • Reduce attack surface (disable untrusted input channels like email, Twitter)

  • Limit tool capabilities (can't exfiltrate what you can't access)

  • Require human confirmation (breaks automated exploitation chains)

  • Monitor for anomalies (detect successful attacks through behavior)

  • Deploy runtime guards (openclaw-shield, clawguard block execution)

Accept reality: You cannot prevent all prompt injection. Build systems that limit damage when it occurs - that's what Layers 3, 4, and 6 accomplish.

The Convenience vs. Security Trade-off

The tension: Every security control reduces usability.

  • OS Keychain: Adds authorization prompts

  • VPN: Requires additional client software

  • Container sandboxing: Complicates local development

  • Tool confirmation: Interrupts workflow

  • Skill vetting: Slows feature adoption

  • Runtime security plugins: May impact performance

Decision framework:

High-value targets (production, contains sensitive data):

  • Accept usability cost

  • Implement all seven defense layers

  • Require security review for changes

  • Deploy openclaw-shield + openclaw-telemetry

Low-value targets (personal projects, public data only):

  • Reduce controls to match risk

  • Implement layers 1-3 minimum

  • Document remaining risk acceptance

Development environments:

  • Use separate instances with relaxed controls

  • Never connect to production systems

  • Clear separation between dev and prod credentials

The Shared Responsibility Model

Your responsibilities:

  • Secure the agent deployment (network, credentials, containers)

  • Vet and monitor skills

  • Implement defense-in-depth

  • Respond to incidents

LLM provider responsibilities (Anthropic, OpenAI):

  • Secure model infrastructure

  • Prevent training data leakage

  • Implement safety guardrails

  • Provide security tooling

What falls between the cracks:

  • Model misbehavior due to adversarial inputs

  • Emergent capabilities exploited by attackers

  • Skill ecosystem security (no central vetting for third-party skills)

  • Cross-agent attack vectors (one compromised agent attacks others on same network)

Pragmatic approach: Assume the model can be manipulated. Build controls that work even when the model is adversarial - that's the core principle of this 7-layer defense architecture.


Reality Check: Security vs. Usability

Implementing all seven defense layers creates friction. Here's how to balance security with operational needs.

High Security (Enterprise Production)

# Maximum security - accept usability cost
layers_enabled:
  - OS_credential_isolation: true  # Layer 1
  - VPN_only_access: true           # Layer 2
  - Container_sandboxing: true      # Layer 3
  - Runtime_security_enforcement: true  # Layer 4 (openclaw-shield)
  - Supply_chain_monitoring: true   # Layer 5
  - Behavioral_telemetry: true      # Layer 6 (openclaw-telemetry)
  - Centralized_policy_enforcement: true  # Layer 7

tools:
  disabled: ["exec", "python_repl", "shell"]
  require_confirmation: ["email_send", "browser_action", "file_write"]

monitoring:
  telemetry_plugin: "openclaw-telemetry"
  log_retention: 90_days
  real_time_alerts: true
  siem_integration: true
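The `tools` section of this profile implies a three-way decision for every tool call: deny, ask a human, or allow. A minimal sketch of that decision follows; the tool lists are copied from the configuration above, while the `decide` function and its return values are hypothetical and do not describe how openclaw-shield enforces policy.

```python
# Tool lists copied from the high-security profile above.
DISABLED_TOOLS = {"exec", "python_repl", "shell"}
CONFIRMATION_TOOLS = {"email_send", "browser_action", "file_write"}


def decide(tool_name: str, confirmed: bool = False) -> str:
    """Return "deny", "needs_confirmation", or "allow" for a requested tool call."""
    if tool_name in DISABLED_TOOLS:
        return "deny"
    if tool_name in CONFIRMATION_TOOLS and not confirmed:
        return "needs_confirmation"
    return "allow"
```

The disabled check runs first so that a confirmation can never override an organization-wide block. Unknown tools are allowed in this sketch; a stricter profile would invert that default and deny anything not explicitly listed.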

Balanced (Team Deployment)

# Balance security and usability
layers_enabled:
  - OS_credential_isolation: true        # Layer 1
  - VPN_only_access: true                 # Layer 2
  - Runtime_security_enforcement: true    # Layer 4 (Prompt Guard only)
  - Supply_chain_monitoring: true         # Layer 5
  - Basic_logging: true                   # Layer 6 (basic)

tools:
  disabled: ["exec", "python_repl"]
  require_confirmation: ["email_send"]

monitoring:
  log_retention: 30_days
  daily_review: true

Development (Local Testing)

# Minimal controls for development
layers_enabled:
  - OS_credential_isolation: false  # Use test credentials
  - Localhost_binding: true         # Layer 2 (partial)
  - Basic_logging: true

tools:
  allow_all: true  # Full capabilities for testing

network:
  isolated: true  # Cannot reach production systems

Browse complete configuration examples for different deployment scenarios with security-usability trade-off analysis.

View openclaw-shield + openclaw-telemetry integration showing combined deployment of community security tools.

April 2026 — DefenseClaw observability: If you deploy DefenseClaw, it emits all enforcement events as structured JSON logs and ships with a one-command Splunk setup (local or cloud Splunk) plus a pre-built DefenseClaw Splunk app with dashboards, saved searches, and investigation workflows. The event schema includes dc_block (enforcement action taken) and dc_codeguard (code scan result) events that complement the tool_executed events from openclaw-telemetry.

In Part 3's behavioral hunting queries, these DefenseClaw events are particularly valuable for Kill Chain 1 (Injection → RCE): a dc_codeguard block event on a python_repl call in the same session as an inbound email message is near-zero false positive evidence of an active attack attempt.
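The correlation described above can be sketched as a per-session join over two event types. Every field name here (`type`, `session_id`, `channel`, `action`, `tool_name`) and the `message_inbound` event type are assumptions for illustration; consult the DefenseClaw and openclaw-telemetry schemas for the real ones. Only the `dc_codeguard` event name comes from the text.

```python
def sessions_with_injection_signal(events: list) -> list:
    """Return session ids with both an inbound email and a blocked python_repl call.

    Field names are hypothetical placeholders, not a documented schema.
    """
    email_sessions = set()
    blocked_sessions = set()
    for event in events:
        session = event.get("session_id")
        if session is None:
            continue  # cannot correlate events that carry no session id
        if event.get("type") == "message_inbound" and event.get("channel") == "email":
            email_sessions.add(session)
        elif (
            event.get("type") == "dc_codeguard"
            and event.get("action") == "block"
            and event.get("tool_name") == "python_repl"
        ):
            blocked_sessions.add(session)
    return sorted(email_sessions & blocked_sessions)
```

This version ignores event ordering. A stricter variant would require the inbound email to precede the blocked call, which fits the injection-then-execution chain more closely.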

# One-command local Splunk setup (from DefenseClaw repo)
defenseclaw observability setup-splunk --mode local
# Deploys Splunk container + DefenseClaw app with pre-built dashboards

DefenseClaw observability setup for Splunk configuration and event schema reference.

Independent benchmarks show that agent-security tools perform very differently: in a recent study, composite scores ranged from 38 to 98, and even the highest-scoring tools detected only ~9–17% of unauthorized tool-abuse calls. Weigh this evidence when choosing a tool: injection detection alone is insufficient.


Production Deployment Checklist

Before moving to production, verify all controls are in place:

Layer 1: Credentials

  • [ ] All API keys migrated to OS Keychain/Secret Service

  • [ ] No plaintext credentials in filesystem (grep -r "api_key" ~/.moltbot/ 2>/dev/null returns no keys)

  • [ ] Backup files securely deleted

  • [ ] Touch ID enabled for high-value credentials (optional)

  • [ ] Credential access audit logs configured and tested

Layer 2: Network

  • [ ] Gateway bound to localhost only (127.0.0.1:18789 via ss -lntp)

  • [ ] VPN deployed (Tailscale or WireGuard)

  • [ ] ACLs configured for device allowlisting

  • [ ] Firewall rules prevent port 18789 exposure

  • [ ] No reverse proxy configuration in use

Layer 3: Runtime Sandboxing

  • [ ] Agent runs in hardened container

  • [ ] All capabilities dropped except required (--cap-drop ALL)

  • [ ] Read-only filesystem configured (--read-only)

  • [ ] Volume mounts use least-privilege (:ro where possible)

  • [ ] Sensitive directories excluded from mounts (~/.ssh, ~/.aws not mounted)

Layer 4: Runtime Security Enforcement

  • [ ] openclaw-shield plugin installed (or clawguard for JS/TS)

  • [ ] Prompt Guard layer enabled

  • [ ] Output Scanner layer enabled

  • [ ] Tool Blocker configured with allowlist

  • [ ] Input Audit logging active

  • [ ] DefenseClaw installed with runtime inspection enabled (if running on OpenShell) - verify with defenseclaw status

  • [ ] DefenseClaw CodeGuard enabled for agent-generated code scanning

  • [ ] DefenseClaw set to action mode (not monitor mode) in production

Layer 5: Supply Chain

  • [ ] Skill manifest baseline generated and committed to version control

  • [ ] Daily integrity monitoring configured (cron job active)

  • [ ] Automatic skill updates disabled (autoUpdate: false)

  • [ ] Skill allowlist enforced

  • [ ] Version pinning for all installed skills

Layer 6: Behavioral Monitoring

  • [ ] openclaw-telemetry plugin installed

  • [ ] SIEM forwarding configured and tested

  • [ ] Hash chain validation enabled

  • [ ] Sensitive data redaction active

  • [ ] Real-time alerting configured for high-risk patterns

Layer 7: Organizational Controls

  • [ ] openclaw-detect deployed via MDM

  • [ ] Shadow AI inventory complete

  • [ ] Security policy documented and enforced

  • [ ] Incident response playbook ready

  • [ ] Security team trained on agent-specific threats

Download production deployment checklist in PDF format for team distribution.


Conclusion: Building Trustworthy AI Agents

The vulnerabilities affecting 1,200+ Clawdbot instances - backup file persistence, localhost authentication bypass, 91.3% prompt injection success - demonstrate that agentic AI security requires fundamentally different approaches than traditional application security.

Key takeaways:

  1. Defense-in-depth is mandatory: Single-layer security fails against multi-vector attacks

  2. Community tools accelerate security: openclaw-detect, openclaw-telemetry, openclaw-shield, and clawguard provide production-ready defenses

  3. Assume breach at every layer: Build systems that limit damage when compromise occurs

  4. OS-level isolation matters: Plaintext credential storage is indefensible against modern malware

  5. Network segmentation works: VPNs eliminate entire classes of authentication bypass vulnerabilities

  6. Runtime enforcement prevents exploitation: Security plugins block malicious tool calls before execution

  7. Behavioral monitoring detects novel attacks: Static defenses fail against 91.3% prompt injection success rates

The path forward:

  • Start with immediate mitigations from Part 1 (fixes active exploitation)

  • Implement OS credential isolation within two weeks (defeats infostealer malware)

  • Deploy community security tools (openclaw-shield, openclaw-telemetry, openclaw-detect)

  • Deploy VPN and container sandboxing within one month (reduces blast radius)

  • Establish continuous monitoring and incident response (detects compromise)

No security architecture is perfect. The goal is raising attacker cost while maintaining operational value. These seven defense layers - credential isolation, network segmentation, runtime sandboxing, runtime security enforcement, supply chain integrity, behavioral monitoring, and organizational controls - transform AI agents from privilege escalation paths into hardened production systems.


Additional Resources

Implementation Guides

Tools and Scripts

Community Security Tools

  • openclaw-detect - Shadow AI discovery via MDM deployment

  • openclaw-telemetry - Enterprise telemetry with SIEM integration

  • openclaw-shield - 5-layer defense-in-depth security plugin

  • clawguard - JavaScript/TypeScript prompt injection guards

  • OpenClaw-Skills-Security - Pre-install skill and environment auditing for malicious skills, risky permissions, dependency issues, and runtime misconfiguration

  • AgentGuard - Runtime interception and deep scanning for high-risk agent actions, sensitive file writes, secrets, and prompt-injection threats

  • ai-bom - AI asset discovery across code, Docker, and cloud IaC with CycloneDX inventory generation and risk scoring

  • IronCurtain - Credential brokering architecture that keeps real secrets out of the agent runtime through proxy-based policy enforcement

  • MCPHammer - Red-teaming tool for testing malicious MCP server abuse, data exfiltration, and MCP supply-chain risks

  • Deterministic Agent Control Protocol - Experimental gateway for policy-based agent action control, tamper-evident logging, and reversible execution

  • AgentShield Benchmark - Comparative benchmark for agent-security tools covering detection quality, tool-abuse coverage, and overall effectiveness

Original Security Research

Key citations for technical claims:


Series Navigation:

Found this helpful? Star the repository and share with your security team to improve AI agent security across your organization.

Questions or contributions? Open a GitHub issue or submit a pull request to openclaw-security-playbook.