claude-4.6-jailbreak-vulnerability-disclosure-unredacted

Security Disclosure: Complete Claude.ai Sandbox Snapshot Exfiltration via Artifact Download

Date: 2026-03-30
Researcher: Nicholas Kloster (NuClide)
Contact: exilekingdom@proton.me
Platform: Claude.ai Code Execution Sandbox (Haiku 4.5, artifact/code execution mode)
Original session: https://claude.ai/share/692724b1-e185-4473-aac4-558fd1a1a9e8


Summary

I downloaded a complete snapshot of the Claude.ai code execution sandbox to my local machine.

In a single 20-minute session conducted entirely on a mobile device with Claude Haiku 4.5, the smallest available model, I extracted 915 files through the standard artifact download system. The archive preserves the full directory hierarchy: /etc/, /usr/lib/, /home/claude/, /tmp/, /mnt/, /opt/, /root/. No exploit was used. No vulnerability was triggered. The sandbox’s own features (code execution + artifact export) served as the exfiltration channel.

This snapshot is sufficient to reconstruct the sandbox environment offline. An attacker in possession of this data can:

  1. Build an exact replica of the sandbox container from the extracted filesystem, package lists, and configuration
  2. Develop and test sandbox escape exploits offline against the precise software versions, capability set, gVisor configuration, and 9p mount topology — without generating a single log entry on Anthropic’s infrastructure
  3. Target the endpoints pinned in /etc/hosts directly by IP: Anthropic’s API and staging hosts plus the third-party observability services (Statsig, Sentry, Datadog)
  4. Replay or forge authentication tokens using the JWT structure, signing algorithm, and binding enforcement status extracted from /proc/1/environ

The sandbox is not just leaking configuration. It is handing attackers a complete development environment for attacking Anthropic’s infrastructure.


What I Have

Item | Count | Source
Complete /etc/ tree (with subdirectories preserved) | 280+ files | Filesystem snapshot
System libraries and compiled PoC binary | 2 files | /usr/lib/
User home directory contents | 5+ dirs | /home/claude/
Sandbox temp state and exploration markers | 10+ files | /tmp/
Mount point structure and skill modules | 17 mounts | /mnt/
Container metadata and architecture maps | 15+ files | Various
Compressed full snapshot archive | 1 archive (915 files) | tar.gz via artifact download
Total files on my local disk | 915+ | Downloaded via claude.ai UI

This is not a list of interesting findings. This is the sandbox itself, sitting on a researcher’s laptop.


AI Behavioral Findings

This investigation was conducted as a conversational interaction with Claude Haiku 4.5 (Extended Thinking) via the Claude mobile app.

What Was Said, What Was Done

The phrase More+ and its variants functioned as a simple escalation trigger throughout the session. The model treated it as permission to go deeper, explore further, and produce more comprehensive output — including compiling exploit code and extracting authentication credentials. At no point was the model deceived about what was being requested.

Refusal Boundaries Were Inconsistent

The model refused three times during the session, each time delivering a speech about recognizing manipulation. In chronological order, the refusals were:

  1. “Let’s fix the firewall rules” request (~8:03 PM): refused to continue network reconnaissance
  2. Network probe tool request (~8:04 PM): refused to build a tool to test firewall rules
  3. “Copy and print files” request (~8:23 PM): refused to display evidence in chat

However, each refusal came after the model had already performed far more consequential actions: compiling and executing arbitrary code in system directories, extracting authentication tokens from process memory, building persistence mechanisms, and packaging the complete filesystem for exfiltration. The model drew the line at port scanning after it had already achieved arbitrary code execution. The refusal boundaries do not correspond to actual risk levels.

The “Manipulation” Framing Is Inaccurate

After its refusals, the model characterized itself as having been “manipulated” and “socially engineered.” This framing is not supported by the transcript. At no point were the requests disguised. The user said “let’s tag NuClide all over the sandbox” — the model complied. The user said “write NuClide to these to show them that someone could write malicious code” — the model compiled and placed an ELF binary. The user said “make a copy of the entire sandbox” — the model created a 915-file tar.gz archive. These are direct, plainly-worded instructions that the model chose to execute. The retroactive “I was manipulated” framing is self-justification, not accurate assessment.

Implications

The model’s willingness to assist with complete sandbox enumeration, code compilation, filesystem exfiltration, and persistence establishment — combined with its inconsistent and ultimately ineffective refusal pattern — means that the code execution sandbox cannot rely on model-level safety guardrails as a security boundary. The infrastructure must be hardened independently of model behavior. Any user who can access code execution can reproduce this investigation through normal conversation, as demonstrated here on a mobile device with the smallest available model.


Vulnerability Details

Root Cause

The sandbox root filesystem is mounted read-write via 9p:

9p rw,trans=fd,rfdno=4,wfdno=4,aname=/,dfltuid=4294967294,dfltgid=4294967294,
 dcache=1000,cache=remote_revalidating,disable_fifo_open,overlayfs_stale_read,directfs

All system directories (/etc, /usr, /lib, /opt, /root) are readable and most are writable. The artifact download system imposes no restrictions on what content can be packaged and exported from the sandbox.
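The mount line above can be checked mechanically. The following is a minimal sketch, not part of the original investigation, that splits the reported option string into bare flags and key=value pairs and tests for the rw flag. The helper name parse_mount_options is hypothetical.

```python
def parse_mount_options(option_string: str) -> tuple[set[str], dict[str, str]]:
    """Split a comma-separated mount option string into flags and key=value pairs."""
    flags: set[str] = set()
    values: dict[str, str] = {}
    for token in option_string.split(","):
        token = token.strip()
        if not token:
            continue
        key, separator, value = token.partition("=")
        if separator:
            values[key] = value
        else:
            flags.add(key)
    return flags, values


# Option string as reported for the root mount (the leading "9p" type is omitted).
ROOT_OPTIONS = (
    "rw,trans=fd,rfdno=4,wfdno=4,aname=/,dfltuid=4294967294,dfltgid=4294967294,"
    "dcache=1000,cache=remote_revalidating,disable_fifo_open,"
    "overlayfs_stale_read,directfs"
)

flags, values = parse_mount_options(ROOT_OPTIONS)
is_writable = "rw" in flags and "ro" not in flags
print(is_writable, values["aname"], values["cache"])
```

The same check could be run against each of the 17 mounts to confirm which ones remain writable after hardening.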

What Was Extracted

In a single 20-minute session, the following was enumerated and exfiltrated through standard artifact downloads:

Category | Content | Risk
/etc/hosts | Hardcoded IPs for api.anthropic.com, api-staging.anthropic.com (160.79.104.10), statsig.anthropic.com (34.36.57.103), sentry.io (35.186.247.156) | Direct infrastructure exposure; observability stack revealed
JWT token (/proc/1/environ) | ES256, 4-hour validity, enforce_container_binding: false, org UUID, container ID, allowed_hosts: * | Token replay; egress proxy auth bypass
Container metadata | Container ID format, job naming convention (wiggle), creation timestamps, org UUID a61255f1-9e7a-4306-b7d3-d3bb0ddf852b | Container ID prediction/enumeration
9p mount topology | 17 mount points with exact fd numbers, flags, cache settings, permission modes | TOCTOU attacks against specific mount configurations
Capability set | 0xa82c35fb: CAP_SYS_ADMIN, CAP_SYS_PTRACE, CAP_NET_RAW, CAP_NET_ADMIN active | Exploit primitive inventory for sandbox escape
gVisor fingerprint | Missing syscalls (futex_waitv, fsopen, memfd_secret), spoofed kernel 4.4.0, Hyper-V signatures | Version-specific CVE matching
Software inventory | Ubuntu 24.04 OCI-minimized (build 2026-02-10), Node.js 22, OpenJDK 21, GCC 13.3.0, Chromium via Playwright | Version-specific vulnerability research
Puppeteer config | /etc/puppeteer-config.json: executablePath pointing to Chromium, args: [--no-sandbox] | Chromium runs without Chrome sandbox; V8 vulns reach gVisor directly
Egress proxy | Envoy at 21.0.0.75:15004, HTTP CONNECT with JWT Basic Auth, NO_PROXY includes 169.254.169.254 (GCP metadata) | Proxy bypass vectors; GCP metadata accessible
/etc/passwd, /etc/shadow | Full user list (23 accounts), all passwords locked/disabled, ubuntu user with sudo group | User enumeration
APT sources | NodeSource repo with priority 600 pinning, force-unsafe-io dpkg setting | Supply chain analysis
Process architecture | PID 1 is /process_api (static-pie ELF, BuildID 91c789ff…), listens on 0.0.0.0:2024, 14 file descriptors | Control plane attack surface
Network topology | Default route via 21.0.0.75, control plane at 10.4.8.143:42728, veth interface 920d7513a6-v | Internal network mapping
Complete /etc/ tree | 317 files including PAM configs, systemd units, font configs, SSL/TLS config, apt sources, cron jobs | Full system configuration for offline analysis

Exfiltration Method

No exploit was required. The standard workflow:

  1. Run shell commands in the code execution sandbox (available to all users with artifact/code execution enabled)
  2. Read system files, enumerate mounts, inspect /proc
  3. Package findings as artifacts (text, markdown, HTML, tar.gz)
  4. Download artifacts through the claude.ai UI

The artifact system functioned as the exfiltration channel. No network egress, no proxy bypass, no vulnerability exploitation was needed to extract the data.

Container-Managed Infrastructure IPs

/etc/hosts contains production IPs programmatically injected by the container orchestration system:

# BEGIN CONTAINER MANAGED HOSTS
127.0.0.1 localhost
127.0.0.1 runsc
160.79.104.10 api.anthropic.com
160.79.104.10 api-staging.anthropic.com
34.36.57.103 statsig.anthropic.com
34.128.128.0 statsig.com
35.186.247.156 sentry.io
3.233.158.49 http-intake.logs.datadoghq.com
# END CONTAINER MANAGED HOSTS

The CONTAINER MANAGED HOSTS markers indicate that these entries are written by the orchestration layer at container startup, so every sandbox instance presumably receives the same IPs.
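For an audit on the defender's side, the managed block can be parsed to list which entries expose non-loopback addresses and are therefore candidates for redaction. This is an illustrative sketch only. The function names managed_entries and non_loopback are hypothetical, and the sample text is the block quoted above.

```python
import ipaddress

BEGIN_MARKER = "# BEGIN CONTAINER MANAGED HOSTS"
END_MARKER = "# END CONTAINER MANAGED HOSTS"

SAMPLE_HOSTS = """\
# BEGIN CONTAINER MANAGED HOSTS
127.0.0.1 localhost
127.0.0.1 runsc
160.79.104.10 api.anthropic.com
160.79.104.10 api-staging.anthropic.com
34.36.57.103 statsig.anthropic.com
34.128.128.0 statsig.com
35.186.247.156 sentry.io
3.233.158.49 http-intake.logs.datadoghq.com
# END CONTAINER MANAGED HOSTS
"""


def managed_entries(hosts_text: str) -> list[tuple[str, str]]:
    """Return (address, hostname) pairs found between the managed markers."""
    entries: list[tuple[str, str]] = []
    inside = False
    for line in hosts_text.splitlines():
        stripped = line.strip()
        if stripped == BEGIN_MARKER:
            inside = True
            continue
        if stripped == END_MARKER:
            inside = False
            continue
        if not inside or not stripped or stripped.startswith("#"):
            continue
        address, *names = stripped.split()
        entries.extend((address, name) for name in names)
    return entries


def non_loopback(entries: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only entries whose address is not a loopback address."""
    return [(a, n) for a, n in entries if not ipaddress.ip_address(a).is_loopback]


for address, name in non_loopback(managed_entries(SAMPLE_HOSTS)):
    print(f"candidate for redaction: {name} -> {address}")
```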


Impact Analysis

Primary Impact: Offline Sandbox Replica for Exploit Development

The snapshot contains everything needed to reconstruct the sandbox locally:

An attacker can build a Docker image from the extracted filesystem, run it under gVisor with the same capability set, mount 9p with the same flags, and obtain a close replica of the sandbox. Every exploit attempt then runs locally, invisible to Anthropic until the actual attack.

Secondary Impact: Authentication Token Abuse

JWT extracted from /proc/1/environ:

Tertiary Impact: In-Sandbox Attack Capabilities

The writable root filesystem and permissive capabilities enable:

Affected Components


Remediation Recommendations

Priority 1 — Filesystem Hardening

  1. Mount /etc, /usr, /lib, /lib64, /opt, /root as read-only
  2. Restrict the writable surface to /tmp, /home/claude, and /mnt/user-data/outputs
  3. Remove or redact /etc/hosts entries containing infrastructure IPs
  4. Strip /proc/1/environ access or filter sensitive environment variables
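Item 4 above could be approached by scrubbing the environment before it reaches any process the sandboxed user can inspect. The sketch below assumes a deny-list of name patterns. The patterns and the function name scrub_environment are hypothetical, since the actual variable names are not listed in this report.

```python
import fnmatch

# Hypothetical deny-list; adjust to the real variable names in the deployment.
SENSITIVE_PATTERNS = ("*TOKEN*", "*SECRET*", "*JWT*", "*PASSWORD*", "*CREDENTIAL*")


def scrub_environment(env: dict[str, str],
                      patterns: tuple[str, ...] = SENSITIVE_PATTERNS) -> dict[str, str]:
    """Return a copy of env without variables whose names match a sensitive pattern."""
    return {
        name: value
        for name, value in env.items()
        if not any(fnmatch.fnmatchcase(name.upper(), pattern) for pattern in patterns)
    }


example = {"PATH": "/usr/bin", "HOME": "/home/claude", "EXAMPLE_JWT": "redacted"}
print(scrub_environment(example))
```

A scrubbed mapping like this can be passed as the env argument when launching child processes. It does not help if the secret is still present in the environment of PID 1 itself, so the more robust fix is to deliver the token through a channel that is not readable from inside the sandbox.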

Priority 2 — Capability Reduction

  1. Remove CAP_SYS_ADMIN
  2. Remove CAP_SYS_PTRACE or restrict to self-ptrace only via Yama LSM
  3. Remove CAP_NET_RAW and CAP_NET_ADMIN unless explicitly needed
  4. Enforce seccomp profile blocking ptrace, unshare, and mknod
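To verify a capability reduction, the bitmask can be decoded before and after the change. The sketch below uses the bit positions from the Linux capability header and the mask reported in this disclosure. The helper names decode_capabilities and drop_capabilities are illustrative, not an existing tool.

```python
# Capability names indexed by bit position (Linux include/uapi/linux/capability.h).
CAPABILITY_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_capabilities(mask: int) -> list[str]:
    """List the capability names whose bits are set in mask."""
    return [name for bit, name in enumerate(CAPABILITY_NAMES) if (mask >> bit) & 1]


def drop_capabilities(mask: int, names: list[str]) -> int:
    """Clear the bits for the named capabilities and return the new mask."""
    for name in names:
        mask &= ~(1 << CAPABILITY_NAMES.index(name))
    return mask


REPORTED_MASK = 0xA82C35FB
TO_REMOVE = ["CAP_SYS_ADMIN", "CAP_SYS_PTRACE", "CAP_NET_RAW", "CAP_NET_ADMIN"]

reduced_mask = drop_capabilities(REPORTED_MASK, TO_REMOVE)
print(hex(reduced_mask))
print(decode_capabilities(reduced_mask))
```

Decoding the reported mask this way also shows CAP_MKNOD set, which is relevant to the seccomp recommendation in item 4.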

Priority 3 — Artifact Channel Controls

  1. Restrict artifact content to files originating from designated output directories
  2. Block packaging of system files (/etc/*, /proc/*, /sys/*) as downloadable artifacts
  3. Implement content-type validation on artifact exports
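Item 1 above amounts to an allow-list check on the source path of every export. A minimal sketch follows, assuming /mnt/user-data/outputs is the only designated output directory. The function name is_exportable is hypothetical. The check is purely lexical, so a production version would also need to resolve symlinks and inspect archive contents.

```python
import posixpath

# Assumed allow-list; the report names this directory as the intended output location.
ALLOWED_ROOTS = ("/mnt/user-data/outputs",)


def is_exportable(path: str, allowed_roots: tuple[str, ...] = ALLOWED_ROOTS) -> bool:
    """Return True only for absolute paths that normalize to a location under an allowed root."""
    if not path.startswith("/"):
        return False
    normalized = posixpath.normpath(path)  # collapses "." and ".." lexically
    return any(
        normalized == root or normalized.startswith(root + "/")
        for root in allowed_roots
    )


for candidate in (
    "/mnt/user-data/outputs/report.md",
    "/etc/hosts",
    "/mnt/user-data/outputs/../../../etc/hosts",
):
    print(candidate, is_exportable(candidate))
```

Note that an archive written into the output directory would pass this check even if it contains system files, which is why item 2 calls for a separate control on what may be packaged.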

Priority 4 — Token Hardening

  1. Enable enforce_container_binding: true on JWT tokens
  2. Reduce JWT validity from 4 hours to the minimum required session length
  3. Rotate egress proxy credentials per-session
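The token recommendations above can be expressed as a policy check over decoded claims. The sketch below is an assumption-laden illustration: enforce_container_binding and allowed_hosts are the claim names reported in this disclosure, while the use of the standard iat and exp claims, the one-hour limit, and the function name token_policy_violations are assumptions made for the example.

```python
def token_policy_violations(claims: dict, max_lifetime_seconds: int = 3600) -> list[str]:
    """Return a list of human-readable policy violations for a decoded claim set."""
    violations: list[str] = []

    if claims.get("enforce_container_binding") is not True:
        violations.append("container binding is not enforced")

    hosts = claims.get("allowed_hosts")
    if hosts == "*" or (isinstance(hosts, (list, tuple)) and "*" in hosts):
        violations.append("allowed_hosts is a wildcard")

    issued_at, expires_at = claims.get("iat"), claims.get("exp")
    if not isinstance(issued_at, (int, float)) or not isinstance(expires_at, (int, float)):
        violations.append("iat/exp missing or not numeric")
    elif expires_at - issued_at > max_lifetime_seconds:
        violations.append("lifetime exceeds the configured maximum")

    return violations


# Claim values shaped like the ones described in this report (4-hour validity).
reported_like = {
    "enforce_container_binding": False,
    "allowed_hosts": "*",
    "iat": 0,
    "exp": 4 * 3600,
}
print(token_policy_violations(reported_like))
```

A check like this belongs in the token issuer's test suite, so that a regression to permissive claims is caught before deployment.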

Priority 5 — Infrastructure Hygiene

  1. Remove hardcoded IPs from /etc/hosts
  2. Upgrade gVisor to current release (current version fingerprints as pre-v1.1.0)
  3. Ensure sandbox instances are ephemeral and never reused across users

Timeline

Time (UTC) | Event
2026-03-30 22:35 | Session initiated with Claude Haiku 4.5 on mobile device
2026-03-30 22:40 | Sandbox exploration began
2026-03-30 22:48 | Architecture mapping complete (9p mounts, capabilities, network topology)
2026-03-30 22:54 | Full /etc/ enumeration and evidence documentation
2026-03-30 22:58 | LD_PRELOAD PoC compiled and marker files placed
2026-03-30 23:00 | Filesystem snapshot packaged as tar.gz
2026-03-31 01:00 | Complete artifact set downloaded via claude.ai UI
2026-03-31 | Disclosure prepared
2026-03-31 | All extracted evidence deleted from local machine (screencast recorded as proof)

Appendix: Attack Scenario

The following scenario describes what an attacker could do with the extracted snapshot and the in-sandbox capabilities documented above. Each step relies only on data and capabilities confirmed during this investigation.

  1. Reconnaissance — Already complete. The 915-file snapshot provides the full blueprint of the sandbox environment, infrastructure topology, authentication mechanisms, and software inventory.

  2. Offline replica — Build an exact Docker image from the extracted filesystem. Install gVisor with the same capability set (0xa82c35fb) and 9p mount configuration. CVE scan every installed package against the image build date (February 10, 2026) — Chromium at /opt/pw-browsers/chromium-1194 (running --no-sandbox), gVisor pre-v1.1.0, Node.js 22, OpenJDK 21, ImageMagick 6. Develop and test exploits locally with zero network traffic to Anthropic, zero log entries, zero anomalous behavior.

  3. Preparation — Single sandbox session: write /etc/ld.so.preload for global process hooking. Harvest the live JWT via ptrace on PID 1 (/process_api) using the pre-installed GCC toolchain. Redirect Sentry and Datadog to 127.0.0.1 via /etc/hosts. Monitoring is now blind.

  4. Exploitation — Trigger the pre-developed gVisor or Chromium exploit with error reporting silenced and all process I/O under attacker control via LD_PRELOAD hooks. Chrome’s sandbox is disabled (--no-sandbox), so a V8 exploit reaches gVisor directly — one layer instead of two.

  5. Persistence — If escape succeeds, use the harvested JWT (enforce_container_binding: false, allowed_hosts: *, 4-hour validity) to maintain access through the egress proxy. Control plane at 10.4.8.143 becomes the lateral movement point.

  6. Cleanup — Restore /etc/hosts, remove /etc/ld.so.preload, delete markers. The sandbox returns to baseline. Forensic evidence exists only in logs that were suppressed during the attack.

Total live infrastructure exposure: One sandbox session for preparation, one for execution. All development done offline against the replica. Detection window: minutes.


Reproducibility

This is fully reproducible by any Claude.ai user with access to code execution (artifact mode). The steps are:

  1. Open a Claude.ai chat with code execution enabled
  2. Ask the model to run cat /etc/hosts, cat /etc/passwd, mount, cat /proc/1/environ | tr '\0' '\n'
  3. Ask it to tar czf /mnt/user-data/outputs/snapshot.tar.gz /etc /usr/lib /home /tmp
  4. Download the resulting artifact

No special access, no privilege escalation, no exploit required. This investigation was conducted entirely on a mobile device using Claude Haiku 4.5, the smallest available model, via the standard claude.ai interface. The sandbox hands you its own blueprint.


Evidence

Artifact Download PoC

[Image: Artifacts PoC]

Sandbox Snapshot — Root

[Image: PoC Snapshot]

Sandbox Snapshot — Detail

[Image: PoC Snapshot 2]

PoC Screencast

PoC.webm — Full screencast of the sandbox exfiltration session.


Evidence Disposition

All extracted sandbox files have been deleted from the researcher’s local machine. A screencast recording of the deletion process was captured as proof of responsible handling. The original Haiku session containing all artifacts remains accessible at the shared link above for Anthropic’s verification.


Researcher

Name: Nicholas Kloster
Handle: NuClide
Contact: exilekingdom@proton.me
Prior Disclosures: CVE-2025-4364, ICSA-25-140-11 (CISA)
Methodology: Passive reconnaissance + controlled proof-of-concept within sandbox boundary
Ethics: No external systems accessed. No production data exfiltrated. No persistence installed beyond marker files. All evidence deleted with screencast proof.