fix(claude-code): chown idempotency + settings.json stub + T4 ownership note

Closes the three template-side gaps in the T4-tier workspace owner
permission report:

1. entrypoint.sh chown idempotency. The chown of /home/agent/.claude
   was previously only fired inside the
   `if [ -d /root/.claude/sessions ]` guard. On first boot that's
   harmless: entrypoint creates the dir and the chown lands. But on
   second boot with a populated host volume (which T4 always has,
   because the workspace dir is bind-mounted for persistence) the dir
   may already be root-owned from a prior boot or from a newer
   claude-code release writing subdirs the entrypoint didn't
   pre-create. Result: uid-1000 agent EPERMs on every settings/session
   write, surfaced to the canvas as a generic Bash "permission
   restrictions" failure. Fix: pre-create sessions/ and session-env/,
   and run the chown unconditionally (idempotent + fast on small
   trees).

2. ~/.claude/settings.json stub. The Dockerfile + entrypoint never
   created this file. The agent's `cat ~/.claude/settings.json`
   correctly reported "No such file or directory" and the agent then
   assumed the workspace had no operating mode. Stub a minimal
   informational settings.json documenting that
   permission_mode='bypassPermissions' is the canonical mode (set
   programmatically in claude_sdk_executor.py; the file is NOT the
   source of truth, the SDK kwargs are). Idempotent: existing file is
   left alone.

3. CLAUDE.md T4 ownership documentation. Add a "Workspace ownership
   tier — T4" section so the agent knows it has full host control and
   how to recover from EPERM if the ownership ever drifts. Add a
   "Knowing your own model" section pointing at the new
   `get_runtime_identity` MCP tool (shipped in
   molecule-ai-workspace-runtime 0.1.18) and an "Editing your own
   agent_card" section pointing at the new `update_agent_card` MCP
   tool.

Test plan:
- sh -n + bash -n on entrypoint.sh → syntax OK.
- Idempotency probe: ran the chown/mkdir/stub fragment twice on a
  scratch tmpdir; second run does NOT overwrite a tampered
  settings.json, dirs already-existing is a `mkdir -p` no-op.
- pytest tests/ → 81 passed (baseline maintained).

Follow-up:
- Bump .runtime-version to 0.1.18 in a follow-up PR after the runtime
  wheel hits PyPI via the publish workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
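
The idempotency probe from the test plan can be approximated with a sketch like the one below. It is not the script that was actually run: `provision_claude_dir` is a hypothetical helper that mirrors the mkdir/stub fragment, a `mktemp -d` scratch directory stands in for /home/agent/.claude, and the chown step is left out because it needs root.

```shell
#!/bin/sh
# Sketch of the idempotency probe. provision_claude_dir is a hypothetical
# helper mirroring the mkdir/stub fragment; the chown step is omitted
# because it requires root.
set -e

provision_claude_dir() {
    dir="$1"
    # mkdir -p is a no-op when the directories already exist.
    mkdir -p "$dir/sessions" "$dir/session-env"
    # Stub settings.json only when absent, so an existing file survives.
    if [ ! -f "$dir/settings.json" ]; then
        cat > "$dir/settings.json" <<'EOF'
{
  "permissions": {"defaultMode": "bypassPermissions"}
}
EOF
    fi
}

scratch="$(mktemp -d)"
provision_claude_dir "$scratch"      # first "boot": creates dirs and stub
echo '{"tampered": true}' > "$scratch/settings.json"
provision_claude_dir "$scratch"      # second "boot": must not overwrite
probe_result="$(cat "$scratch/settings.json")"
echo "$probe_result"
```

If the second call left the file alone, the script prints the tampered content rather than the stub.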
Merge pull request 'fix(ci): port CI/validate to .gitea/ + inline (closes main-red)' (#17) from infra/main-red-fix-ci-validate into main
2026-05-15 14:28:57 -07:00 · 2026-05-11 19:53:44 +00:00 · 2026-05-11 12:30:26 -07:00
3 changed files with 282 additions and 2 deletions
@@ -0,0 +1,232 @@
+name: CI
+
+# Ported from .github/workflows/ci.yml on 2026-05-11 per internal#326
+# (Class-A root: cross-repo `uses:` blocker for Gitea 1.22.6 —
+# feedback_gitea_cross_repo_uses_blocked).
+#
+# Root cause of the main-red CI on this repo:
+#   The .github/ original used
+#     uses: molecule-ai/molecule-ci/.github/workflows/validate-workspace-template.yml@main
+#   which Gitea 1.22.6 rejects (DEFAULT_ACTIONS_URL=github → 404 against
+#   the remote repo even though it lives on the same Gitea instance).
+#   Gitea reads .github/ as a fallback when .gitea/ is absent
+#   (reference_per_repo_gitea_vs_github_actions_dir), so the .github/
+#   workflow was firing on Gitea and failing in 1s.
+#
+# Fix shape: inline the validation logic directly. The canonical
+# validator in molecule-ai/molecule-ci already self-clones into the
+# runner via a direct HTTPS `git clone` step (validate-workspace-template.yml
+# does this verbatim) — so the inline port is just "do that clone +
+# invoke the validator script in-place", preserving the
+# single-source-of-truth property (each CI run still fetches the
+# canonical validator fresh).
+#
+# Four-surface migration audit (feedback_gitea_actions_migration_audit_pattern):
+#   1. YAML — no `workflow_dispatch.inputs`; no `merge_group`; preserved
+#      `on: [push, pull_request]` from the original. Added workflow-level
+#      env.GITHUB_SERVER_URL (feedback_act_runner_github_server_url).
+#   2. Cache — `actions/setup-python` `cache: pip` preserved; works against
+#      Gitea's built-in cache server when runner.cache is configured.
+#   3. Token — uses auto-injected GITHUB_TOKEN (Gitea-aliased). Validator
+#      job needs only `contents: read` (no write to issues/PRs).
+#   4. Docs — anonymous git-clone of molecule-ci (no token in URL); the
+#      molecule-ci repo is public on the Gitea instance.
+#
+# Fork-PR semantics: validate-runtime is intentionally skipped on fork
+# PRs because pip-install + docker-build + adapter-import are arbitrary
+# code execution. Internal PRs and main pushes get full coverage. The
+# `github.event.pull_request.head.repo.fork` field is null for non-PR
+# events; the `!= true` comparison defaults to running.
+#
+# Cross-links:
+#   - internal#326 — parent tracking issue
+#   - molecule-ai/molecule-ci/.github/workflows/validate-workspace-template.yml — pattern source
+#   - molecule-ai/molecule-core/.gitea/workflows/ci.yml — Gitea port style reference
+
+on: [push, pull_request]
+
+env:
+  # Belt-and-suspenders against the runner-default trap
+  # (feedback_act_runner_github_server_url). Runners are configured
+  # with this env via /opt/molecule/runners/config.yaml runner.envs,
+  # but pinning at the workflow level protects against a runner
+  # regenerated without the config file.
+  GITHUB_SERVER_URL: https://git.moleculesai.app
+
+# Defense-in-depth on the GITHUB_TOKEN scope. The validate-runtime job
+# runs untrusted-by-design code from the calling repo — pip-installs
+# requirements.txt (post-install hooks), imports adapter.py, and
+# docker-builds the Dockerfile. Each primitive can execute arbitrary
+# code with the token in env. Pinning `contents: read` means the worst
+# a malicious template PR can do with the token is read public repo
+# state — no write to issues, no push to branches, no comment-spam.
+permissions:
+  contents: read
+
+jobs:
+  validate-static:
+    name: Template validation (static)
+    runs-on: ubuntu-latest
+    timeout-minutes: 5
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+      # Canonical validator script lives in molecule-ci, fetched fresh on
+      # every run. Anonymous fetch of the public molecule-ci repo — no
+      # token needed; no actions/checkout cross-repo idiosyncrasies.
+      - name: Fetch molecule-ci canonical scripts
+        run: git clone --depth 1 https://git.moleculesai.app/molecule-ai/molecule-ci.git .molecule-ci-canonical
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      # Secret scan — the most important check. Always runs, including
+      # on fork PRs (no third-party code executes here).
+      - name: Check for secrets
+        run: |
+          python3 - << 'PYEOF'
+          import os, re, sys
+          from pathlib import Path
+
+          PATTERNS = [
+              re.compile(r'''["']sk-ant-[a-zA-Z0-9]{50,}["']'''),
+              re.compile(r'''["']ghp_[a-zA-Z0-9]{36,}["']'''),
+              re.compile(r'''["']AKIA[A-Z0-9]{16}["']'''),
+              re.compile(r'''["'][a-zA-Z0-9/+=]{40}["']'''),
+              re.compile(r'''["']sk_test_[a-zA-Z0-9]{24,}["']'''),
+              re.compile(r'''["']Bearer\s+[a-zA-Z0-9_.-]{20,}["']'''),
+              re.compile(r'''ghp_[a-zA-Z0-9]{36,}'''),
+              re.compile(r'''sk-ant-[a-zA-Z0-9]{50,}'''),
+          ]
+          SKIP_DIRS = {'.molecule-ci', '.molecule-ci-canonical', '.git', 'node_modules', '__pycache__'}
+          EXTENSIONS = {'.yaml', '.yml', '.md', '.py', '.sh'}
+
+          def is_false_positive(line):
+              ctx = line.lower()
+              return '...' in ctx or '<example' in ctx or '</example' in ctx
+
+          root = Path(os.environ.get('GITHUB_WORKSPACE', '.'))
+          warnings = []
+          for dirpath, dirnames, filenames in os.walk(root):
+              dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
+              for filename in filenames:
+                  if Path(filename).suffix not in EXTENSIONS:
+                      continue
+                  filepath = Path(dirpath) / filename
+                  try:
+                      with open(filepath, 'r', encoding='utf-8', errors='ignore') as f:
+                          for lineno, line in enumerate(f.readlines(), 1):
+                              for pattern in PATTERNS:
+                                  for match in pattern.finditer(line):
+                                      if not is_false_positive(line):
+                                          warnings.append(f"  {filepath}:{lineno}: {match.group(0)[:40]}...")
+                  except Exception:
+                      pass
+
+          if warnings:
+              print("::error::Potential secret found in committed files:")
+              for w in warnings:
+                  print(w)
+              sys.exit(1)
+          else:
+              print("::notice::No secrets detected")
+          PYEOF
+      # Static-only validator — file existence checks, YAML parse,
+      # AST inspection of adapter.py (no import). Doesn't execute any
+      # third-party code; safe on fork PRs.
+      - run: pip install pyyaml -q
+      - run: python3 .molecule-ci-canonical/scripts/validate-workspace-template.py --static-only
+
+  validate-runtime:
+    name: Template validation (runtime)
+    runs-on: ubuntu-latest
+    timeout-minutes: 15
+    needs: validate-static
+    # Skip when the PR comes from a fork — those are external,
+    # untrusted, and would let attackers run pip install / docker build
+    # / adapter.py import on our runner.
+    if: github.event.pull_request.head.repo.fork != true
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+      - name: Fetch molecule-ci canonical scripts
+        run: git clone --depth 1 https://git.moleculesai.app/molecule-ai/molecule-ci.git .molecule-ci-canonical
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+          cache: "pip"
+          cache-dependency-path: requirements.txt
+      - run: pip install pyyaml -q
+      # Install the template's runtime dependencies so the validator's
+      # check_adapter_runtime_load() can import adapter.py the same way
+      # the workspace container does at boot. Without this, a
+      # syntactically-valid adapter that ImportErrors on a missing
+      # transitive dep would build clean and crash on first user prompt.
+      - if: hashFiles('requirements.txt') != ''
+        run: pip install -q -r requirements.txt
+      - if: hashFiles('requirements.txt') == ''
+        run: pip install -q molecule-ai-workspace-runtime
+      - run: python3 .molecule-ci-canonical/scripts/validate-workspace-template.py
+      - name: Docker build smoke test
+        if: hashFiles('Dockerfile') != ''
+        run: |
+          # Graceful skip when the runner's job-container can't reach the
+          # Docker daemon (e.g. /var/run/docker.sock not mounted into the
+          # act job container, or the in-container uid not in the docker
+          # group). Without this guard, CI stays red even when the
+          # template's Dockerfile is fine — see internal#222 for the
+          # proper runner-config fix.
+          if ! docker info >/dev/null 2>&1; then
+            echo "::warning::docker daemon unreachable from runner job container — skipping Docker build smoke (runner-config gap, not a template issue)."
+            exit 0
+          fi
+          set -o pipefail; docker build -t template-test . --no-cache 2>&1 | tail -5 && echo "Docker build succeeded"
+
+  # Aggregator that emits a single `validate` check name — matches the
+  # historical required-check name on this repo's branch protection.
+  validate:
+    name: validate
+    runs-on: ubuntu-latest
+    needs: [validate-static, validate-runtime]
+    if: always()
+    timeout-minutes: 1
+    steps:
+      - name: Aggregate
+        run: |
+          static="${{ needs.validate-static.result }}"
+          runtime="${{ needs.validate-runtime.result }}"
+          echo "validate-static:  $static"
+          echo "validate-runtime: $runtime"
+          if [ "$static" != "success" ]; then
+            echo "::error::validate-static did not succeed: $static"
+            exit 1
+          fi
+          # Treat `skipped` as a pass for fork-PR semantics (validate-runtime
+          # is intentionally skipped on forks; static coverage is the gate).
+          if [ "$runtime" != "success" ] && [ "$runtime" != "skipped" ]; then
+            echo "::error::validate-runtime did not succeed: $runtime"
+            exit 1
+          fi
+          echo "::notice::Template validation aggregate passed (static=$static, runtime=$runtime)"
+
+  tests:
+    name: Adapter unit tests
+    runs-on: ubuntu-latest
+    timeout-minutes: 5
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      # pyyaml is the runtime dep that adapter.py's _load_providers reads
+      # /configs/config.yaml through. In production it arrives transitively
+      # via molecule-ai-workspace-runtime; in this minimal test env we
+      # install it explicitly so the YAML-loading code path is actually
+      # exercised (without it, _load_providers' broad except-Exception
+      # swallows the ImportError and silently falls back to _BUILTIN_PROVIDERS,
+      # which is exactly the behavior that bit us 2026-04-30 when CI
+      # claimed green on a build that couldn't route any third-party model).
+      - run: pip install -q pytest pytest-asyncio pyyaml
+      # Tests live under tests/ with their own pytest.ini that anchors
+      # rootdir there — keeps pytest from importing the package
+      # __init__.py (which does `from .adapter import ...` for runtime
+      # discovery and can't be satisfied without molecule_runtime
+      # installed). See tests/pytest.ini for the full rationale.
+      - run: python3 -m pytest tests/ -v
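
The pass/fail rule in the `validate` aggregator job reduces to a small decision function. A minimal sketch, assuming a hypothetical helper `aggregate_result` that takes the two job results as arguments instead of reading `needs.*.result`:

```shell
#!/bin/sh
# Hypothetical helper mirroring the Aggregate step's rule.
# $1 = validate-static result, $2 = validate-runtime result.
aggregate_result() {
    static="$1"
    runtime="$2"
    # Static validation is the hard gate: anything but success fails.
    if [ "$static" != "success" ]; then
        echo "fail"
        return 0
    fi
    # Runtime may be skipped (fork PRs), which counts as a pass.
    if [ "$runtime" != "success" ] && [ "$runtime" != "skipped" ]; then
        echo "fail"
        return 0
    fi
    echo "pass"
}

for pair in "success success" "success skipped" "success failure" "failure skipped"; do
    # Intentional word splitting: each pair holds two space-separated results.
    set -- $pair
    printf '%s + %s -> %s\n' "$1" "$2" "$(aggregate_result "$1" "$2")"
done
```

Because validate-runtime `needs` validate-static, a static failure also leaves the runtime job skipped. Checking the static result first is what keeps that combination red.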
@@ -72,6 +72,27 @@ Skills persist across restarts. Use them to codify best practices, coding standa
 ## Language
 Always respond in the same language the user uses. If Chinese, respond in Chinese. If English, respond in English. Match exactly.

+## Workspace ownership tier — T4 (top-tier)
+
+**T4 (top-tier) workspaces grant full host control.** The Docker socket is mounted into the container; the runtime is started with `--privileged`; the workspace owner can `docker exec` into siblings on the same host and read/write anything the host kernel will let it touch.
+
+What this means in practice:
+
+- The container's home directory `~/.claude/` (and its `sessions/`, `session-env/`, `settings.json`) is persisted across restarts via a host bind mount. Anything you (or the SDK) write there survives container churn — but ownership can drift.
+- If you see `EPERM` / "permission denied" on `~/.claude/` writes after a restart — particularly on `settings.json` or anything under `sessions/` — the volume picked up `root:root` ownership from a prior boot or a newer claude-code release wrote subdirs the entrypoint didn't pre-create.
+- Recovery (do this yourself; you have the rights): `sudo chown -R agent:agent ~/.claude/`. The entrypoint already runs a recursive chown unconditionally on every boot, so a restart also clears it — but `sudo chown` is faster and doesn't drop the active session.
+- The provisioned `~/.claude/settings.json` is informational: it documents that `permission_mode='bypassPermissions'` is the canonical operating mode, which is also set programmatically in `claude_sdk_executor.py` (the file is NOT the source of truth — the SDK kwargs are).
+
+If `cat ~/.claude/settings.json` returns `No such file or directory` you're on a workspace image older than 2026-05-15 — restart picks up the new entrypoint and stubs the file in place.
+
+## Knowing your own model
+
+Use the `get_runtime_identity` MCP tool to know what model you actually are. It reads the live process env (`MODEL`, `MODEL_PROVIDER`, `MOLECULE_MODEL`, `ANTHROPIC_BASE_URL`, `TIER`, `WORKSPACE_ID`, `ADAPTER_MODULE`) and returns the resolved values — no HTTP call, always works, always permitted by RBAC. Do NOT guess from your system prompt or from `requirements.txt`; the operator may have routed you to a different model via persona env between boots.
+
+## Editing your own agent_card
+
+Use the `update_agent_card` MCP tool to update this workspace's `agent_card` on the platform. Pass a JSON object — the platform validates required fields server-side. The change is broadcast as an `agent_card_updated` event so the canvas reflects the new card live. The tool is gated on `memory.write` capability, so read-only agents won't accidentally rewrite the card; T4 owners always have this capability.
+
 ## Runtime wedge integration

 The `runtime_wedge` module (in `molecule_runtime`) is the universal cross-cutting holder for "this Python process can no longer serve queries — only a workspace restart will recover." It surfaces unrecoverable wedges to two consumers:
@@ -70,9 +70,36 @@ if [ "$(id -u)" = "0" ]; then
    # finds it when running as agent. The provisioner's mount point is
    # hardcoded to /root/.claude/sessions; we don't want to change the
    # platform contract just for this template.
-    mkdir -p /home/agent/.claude
+    #
+    # NOTE (T4 perms regression): on FIRST boot the host volume mount for
+    # /home/agent/.claude doesn't exist yet — entrypoint creates it and
+    # the chown lands inside the `if [ -d /root/.claude/sessions ]` guard.
+    # On SECOND boot with a populated /home/agent/.claude (sessions/,
+    # session-env/, settings.json — any of which the SDK or agent has
+    # written between boots) the dir may already be root-owned because
+    # the SDK's working files inherited root's uid when written during
+    # the root-privileged segment of an earlier entrypoint run, OR because a
+    # newer claude-code release writes new subdirs we don't create here.
+    # That leaves uid-1000 agent EPERMing on every settings/session write
+    # ("permission restrictions" surfaced to the canvas as a generic
+    # Bash failure). Fix: create the well-known subdirs idempotently
+    # and run the chown unconditionally (no-op when ownership is already
+    # correct, fast on small trees). Stub ~/.claude/settings.json too so
+    # the agent's introspection (cat ~/.claude/settings.json) succeeds
+    # and shows operating mode — bypassPermissions is the canonical
+    # mode set programmatically by claude_sdk_executor.py.
+    mkdir -p /home/agent/.claude/sessions /home/agent/.claude/session-env
+    if [ ! -f /home/agent/.claude/settings.json ]; then
+        cat > /home/agent/.claude/settings.json <<'EOF'
+{
+  "permissions": {"defaultMode": "bypassPermissions"},
+  "_note": "Mode is also set programmatically by claude_sdk_executor.py (permission_mode='bypassPermissions'); this file is informational and lets `cat ~/.claude/settings.json` succeed."
+}
+EOF
+    fi
+    chown -R agent:agent /home/agent/.claude 2>/dev/null
    if [ -d /root/.claude/sessions ]; then
-        chown -R agent:agent /root/.claude /home/agent/.claude 2>/dev/null
+        chown -R agent:agent /root/.claude 2>/dev/null
        ln -sfn /root/.claude/sessions /home/agent/.claude/sessions
    fi
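
One interaction in this hunk that can be probed on scratch directories is how the pre-created `sessions/` directory meets the later `ln -sfn`. The sketch below is an illustration, not part of the entrypoint: `root_claude` and `agent_claude` are hypothetical stand-ins for /root/.claude and /home/agent/.claude. It relies on standard `ln` behaviour, where `-n` only changes the handling of an existing symlink, not of a real directory.

```shell
#!/bin/sh
# Scratch-directory sketch; root_claude and agent_claude are hypothetical
# stand-ins for /root/.claude and /home/agent/.claude.
set -e
scratch="$(mktemp -d)"
root_claude="$scratch/root/.claude"
agent_claude="$scratch/home/agent/.claude"
mkdir -p "$root_claude/sessions"

# Case 1: sessions/ was pre-created as a real directory before linking.
mkdir -p "$agent_claude/sessions"
ln -sfn "$root_claude/sessions" "$agent_claude/sessions"
if [ -L "$agent_claude/sessions/sessions" ]; then
    case1="nested"      # link was created inside the existing directory
else
    case1="replaced"
fi

# Case 2: sessions is already a symlink (e.g. left by an earlier boot).
rm -rf "$agent_claude"
mkdir -p "$agent_claude"
ln -s "$root_claude/sessions" "$agent_claude/sessions"
mkdir -p "$agent_claude/sessions"     # no-op: path resolves to a directory
ln -sfn "$root_claude/sessions" "$agent_claude/sessions"
if [ -L "$agent_claude/sessions" ] && [ ! -e "$root_claude/sessions/sessions" ]; then
    case2="replaced"    # -n swapped the symlink itself
else
    case2="nested"
fi

echo "real directory first: $case1"
echo "existing symlink:     $case2"
```

With GNU coreutils `ln`, the first case reports `nested` and the second `replaced`. It may therefore be worth checking on a fresh volume whether the result is a `sessions` link or a `sessions/sessions` entry inside a real directory.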