Files
molecule-ai-workspace-templ…/entrypoint.sh
T
core-devops 12dd60413d
CI / validate (push) Blocked by required conditions
CI / Template validation (static) (push) Successful in 2m5s
CI / Adapter unit tests (push) Successful in 1m57s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 17s
CI / Template validation (static) (pull_request) Successful in 1m44s
CI / Adapter unit tests (pull_request) Successful in 1m49s
CI / Template validation (runtime) (push) Successful in 12m24s
CI / T4 tier-4 conformance (live) (push) Failing after 12m20s
CI / Template validation (runtime) (pull_request) Successful in 9m27s
CI / T4 tier-4 conformance (live) (pull_request) Successful in 8m59s
CI / validate (pull_request) Successful in 16s
feat(claude-code): T4 host-root escalation leg + real tier-4 conformance gate (RFC internal#456 §9-11)
T4 currently ships only the provisioner privileged-container shape;
the in-image uid-1000 agent has NO wired path to host root inside
--privileged --pid=host -v /:/host (--privileged grants caps to root,
not uid-1000; root:docker 0660 docker.sock unusable). This adds the
ADDITIVE escalation leg, preserving the uid-1000 + agent-owned-token
contract:

- Dockerfile: bake sudo + util-linux(nsenter) + docker.io CLI;
  /etc/sudoers.d/agent-t4 `agent ALL=(ALL) NOPASSWD:ALL` (0440,
  visudo-validated at build); `agent` in `docker` group. useradd
  -u 1000 + `exec gosu agent` UNCHANGED — agent stays uid-1000.
- entrypoint.sh: document the agent-owned-token half of the §10
  atomic co-sequencing contract on the existing `chown -R agent
  /configs` (token ownership NOT regressed).
- ci.yml: new `t4-conformance` job — NOT a string-match. Builds the
  real image, runs it under the EXACT controlplane tier-4 flags, and
  asserts on the RUNNING container, atomically: (a) the uid-1000
  agent attains host root (sudo nsenter --target 1 + host-fs
  write/readback through /host) AND (b) /configs/.auth_token
  owner_uid==1000. Wired into the required `validate` aggregator and
  fails closed (no skip except fork-PR short-circuit).

RFC internal#456 §9-11 / PR#474. Atomic per §10: uid-1000 enforcement
and the escalation leg ship in this one image revision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 11:44:43 -07:00

170 lines
8.8 KiB
Bash

#!/bin/sh
# Drop privileges to the agent user before exec'ing molecule-runtime.
# claude-code refuses --dangerously-skip-permissions when running as
# root/sudo for safety. Without this entrypoint, every cron tick fails
# with `ProcessError: Command failed with exit code 1` and the agent
# logs `--dangerously-skip-permissions cannot be used with root/sudo
# privileges for security reasons`.
#
# Pattern matches the legacy monorepo workspace-template/entrypoint.sh:
# fix volume ownership as root, then re-exec via gosu as agent (uid 1000).
# Boot-context snapshot — emitted on EVERY container start, including
# every restart of a crash-loop. Lets `docker logs` answer "what env
# was actually present?" without having to docker exec into a dying
# container. Logs NAMES of auth-relevant env vars, never VALUES. Fires
# twice (once as root pre-gosu, once as agent post-gosu) so an operator
# can see whether a value was lost across the privilege drop.
# Keep the env-name list in sync with adapter.py's _AUTH_ENV_AUDIT —
# the same set of vendors should be audited from both sides.
log_boot_context() {
echo "----- entrypoint boot $(date -u +%Y-%m-%dT%H:%M:%SZ) -----"
echo "uid=$(id -u) gid=$(id -g) user=$(id -un 2>/dev/null || echo unknown)"
echo "hostname=$(hostname) workspace_id=${WORKSPACE_ID:-<unset>}"
echo "platform_url=${PLATFORM_URL:-<unset>}"
echo "configs_dir: $(ls -ld /configs 2>/dev/null || echo MISSING)"
echo "configs_contents: $(ls /configs 2>/dev/null | tr '\n' ' ' || echo MISSING)"
echo "workspace_dir: $(ls -ld /workspace 2>/dev/null || echo MISSING)"
# Auth env presence (NAMES + set/unset only — never the values).
# Mirror of _AUTH_ENV_AUDIT in adapter.py — keep in sync if you add a vendor.
for var in CLAUDE_CODE_OAUTH_TOKEN ANTHROPIC_API_KEY ANTHROPIC_AUTH_TOKEN ANTHROPIC_BASE_URL MINIMAX_API_KEY GLM_API_KEY KIMI_API_KEY DEEPSEEK_API_KEY; do
eval "val=\$$var"
if [ -n "$val" ]; then
echo "env $var=set"
else
echo "env $var=unset"
fi
done
echo "------------------------------------------------"
}
log_boot_context
if [ "$(id -u)" = "0" ]; then
# Configs volume is created by Docker as root; agent needs write access
# for plugin installs, memory writes, .auth_token rotation, etc.
#
# T4 atomic-co-sequencing contract (RFC internal#456 §10): the T4
# escalation leg (sudo NOPASSWD + docker group, baked in the
# Dockerfile) is ADDITIVE. The agent still runs uid-1000 and
# /configs/.auth_token MUST remain agent-owned — escalation must
# NOT regress the Hermes list_peers-401 token-ownership class.
# This chown -R is the agent-ownership half of that contract; the
# Layer-3 conformance gate asserts owner_uid==1000 on the running
# container alongside the host-root-reach assertion.
chown -R agent:agent /configs 2>/dev/null
# /workspace handling — only chown when the contents are root-owned
# (typical on Docker Desktop on Windows where host uid maps to 0).
# On Linux Docker with matching uids the recursive chown is skipped
# to keep startup fast.
chown agent:agent /workspace 2>/dev/null || true
if [ -d /workspace ]; then
first_entry=$(find /workspace -mindepth 1 -maxdepth 1 -print -quit 2>/dev/null)
if [ -n "$first_entry" ] && [ "$(stat -c '%u' "$first_entry" 2>/dev/null)" = "0" ]; then
chown -R agent:agent /workspace 2>/dev/null
fi
# Pre-create /workspace/.molecule/chat-uploads so the upload
# handler in workspace/internal_chat_uploads.py never has to
# mkdir as agent inside a root-owned tree. Without this the
# first upload after a fresh provision fails with "failed to
# prepare uploads dir" because the volume mount comes up with
# root-owned `.molecule` whenever a sibling subsystem (e.g. an
# adapter writing telemetry, or a workspace runtime that ran
# before the chown landed) raced ahead. Idempotent: a re-run
# finds the dir already there, mode 0755 / agent:agent.
mkdir -p /workspace/.molecule/chat-uploads 2>/dev/null || true
chown -R agent:agent /workspace/.molecule 2>/dev/null || true
fi
# Claude Code session directory — mounted at /root/.claude/sessions by
# the platform provisioner. Symlink it into agent's home so the SDK
# finds it when running as agent. The provisioner's mount point is
# hardcoded to /root/.claude/sessions; we don't want to change the
# platform contract just for this template.
#
# NOTE (T4 perms regression): on FIRST boot the host volume mount for
# /home/agent/.claude doesn't exist yet — entrypoint creates it and
# the chown lands inside the `if -d /root/.claude/sessions` guard.
# On SECOND boot with a populated /home/agent/.claude (sessions/,
# session-env/, settings.json — any of which the SDK or agent has
# written between boots) the dir may already be root-owned because
# the SDK's working files inherited root's uid when written under
# the prior root segment of an earlier entrypoint, OR because a
# newer claude-code release writes new subdirs we don't create here.
# That leaves uid-1000 agent EPERMing on every settings/session write
# ("permission restrictions" surfaced to the canvas as a generic
# Bash failure). Fix: create the well-known subdirs idempotently
# and run the chown unconditionally (no-op when ownership is already
# correct, fast on small trees). Stub ~/.claude/settings.json too so
# the agent's introspection (cat ~/.claude/settings.json) succeeds
# and shows operating mode — bypassPermissions is the canonical
# mode set programmatically by claude_sdk_executor.py.
mkdir -p /home/agent/.claude/sessions /home/agent/.claude/session-env
if [ ! -f /home/agent/.claude/settings.json ]; then
cat > /home/agent/.claude/settings.json <<'EOF'
{
"permissions": {"defaultMode": "bypassPermissions"},
"_note": "Mode is also set programmatically by claude_sdk_executor.py (permission_mode='bypassPermissions'); this file is informational and lets `cat ~/.claude/settings.json` succeed."
}
EOF
fi
chown -R agent:agent /home/agent/.claude 2>/dev/null
if [ -d /root/.claude/sessions ]; then
chown -R agent:agent /root/.claude 2>/dev/null
ln -sfn /root/.claude/sessions /home/agent/.claude/sessions
fi
# GitHub credential helper setup (fix #1933 / #1866 / #547).
# Runs as root so the global gitconfig is written before we drop to agent.
# The helper fetches fresh GitHub App installation tokens from the
# platform API on every git push/clone, with caching + env-var fallback.
if [ -x /app/scripts/molecule-git-token-helper.sh ]; then
git config --global "credential.https://github.com.helper" \
"!/app/scripts/molecule-git-token-helper.sh"
git config --global "credential.https://github.com.useHttpPath" true
if [ -f /root/.gitconfig ]; then
cp /root/.gitconfig /home/agent/.gitconfig
chown agent:agent /home/agent/.gitconfig
fi
fi
mkdir -p /home/agent/.molecule-token-cache
chown agent:agent /home/agent/.molecule-token-cache
chmod 700 /home/agent/.molecule-token-cache
exec gosu agent "$0" "$@"
fi
# Now running as agent (uid 1000)
# Background token refresh daemon — keeps `gh` CLI auth + credential helper
# cache warm across the ~60 min GitHub App installation token TTL. Wrapped
# in a respawn loop so a daemon crash doesn't silently leave the workspace
# stuck on an expired token (which is exactly how #1933 was discovered).
if [ -x /app/scripts/molecule-gh-token-refresh.sh ]; then
nohup bash -c '
while true; do
/app/scripts/molecule-gh-token-refresh.sh
rc=$?
echo "[molecule-gh-token-refresh] daemon exited rc=$rc — respawning in 30s" >&2
sleep 30
done
' > /home/agent/.gh-token-refresh.log 2>&1 &
fi
# Initial gh auth — primes the CLI with whatever GH_TOKEN/GITHUB_TOKEN was
# injected at provision time, so commands work in the ~60s window before the
# background daemon's first refresh fires.
if [ -n "${GITHUB_TOKEN:-}" ]; then
echo "${GITHUB_TOKEN}" | gh auth login --hostname github.com --with-token 2>/dev/null || true
elif [ -n "${GH_TOKEN:-}" ]; then
echo "${GH_TOKEN}" | gh auth login --hostname github.com --with-token 2>/dev/null || true
fi
# Third-party provider routing is now handled by adapter.py at boot —
# it reads the `providers:` registry from /configs/config.yaml and sets
# ANTHROPIC_BASE_URL based on the picked MODEL. Adding a new provider
# is a one-line YAML edit (see config.yaml's `providers:` section).
# Operator-set ANTHROPIC_BASE_URL still wins as the escape hatch for
# regional endpoints (e.g. Xiaomi's token-plan-sgp.*, MiniMax's
# api.minimaxi.com China endpoint).
exec molecule-runtime "$@"