Compare commits

...

25 Commits

Author SHA1 Message Date
hongming f5cc9493bb Merge pull request 'feat(security): RFC#523 3-layer forbidden-env guardrail for tenant workspaces (task #146)' (#1555) from feat/146-forbidden-env-guard into main
CI / all-required (push) Waiting to run
CI / Platform (Go) (push) Waiting to run
Block internal-flavored paths / Block forbidden paths (push) Waiting to run
CI / Detect changes (push) Waiting to run
CI / Canvas (Next.js) (push) Waiting to run
CI / Shellcheck (E2E scripts) (push) Waiting to run
CI / Canvas Deploy Reminder (push) Blocked by required conditions
CI / Python Lint & Test (push) Waiting to run
E2E API Smoke Test / detect-changes (push) Waiting to run
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
E2E Chat / detect-changes (push) Waiting to run
E2E Chat / E2E Chat (push) Blocked by required conditions
E2E Staging Canvas (Playwright) / detect-changes (push) Waiting to run
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
Handlers Postgres Integration / detect-changes (push) Waiting to run
Runtime PR-Built Compatibility / detect-changes (push) Waiting to run
Handlers Postgres Integration / Handlers Postgres Integration (push) Blocked by required conditions
lint-continue-on-error-tracking / lint-continue-on-error-tracking (push) Waiting to run
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Waiting to run
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Waiting to run
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (push) Waiting to run
publish-workspace-server-image / build-and-push (push) Waiting to run
publish-workspace-server-image / Production auto-deploy (push) Blocked by required conditions
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Blocked by required conditions
Secret scan / Scan diff for credential-shaped strings (push) Waiting to run
Harness Replays / detect-changes (push) Successful in 15s
publish-runtime-autobump / pr-validate (push) Successful in 32s
publish-runtime-autobump / bump-and-tag (push) Successful in 38s
Harness Replays / Harness Replays (push) Successful in 3s
2026-05-19 01:57:30 +00:00
hongming 71ad3ffe1d Merge pull request 'fix(sop-checklist): widen ack eligibility per RFC#450 Option C (closes internal#442)' (#1554) from fix/sop-checklist-widen-ack-internal-442 into main
CI / Platform (Go) (push) Waiting to run
Block internal-flavored paths / Block forbidden paths (push) Waiting to run
CI / Detect changes (push) Waiting to run
CI / Shellcheck (E2E scripts) (push) Waiting to run
CI / Python Lint & Test (push) Waiting to run
CI / Canvas (Next.js) (push) Waiting to run
CI / Canvas Deploy Reminder (push) Blocked by required conditions
CI / all-required (push) Waiting to run
E2E API Smoke Test / detect-changes (push) Waiting to run
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
E2E Chat / detect-changes (push) Waiting to run
E2E Chat / E2E Chat (push) Blocked by required conditions
E2E Staging Canvas (Playwright) / detect-changes (push) Waiting to run
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
Handlers Postgres Integration / detect-changes (push) Waiting to run
Handlers Postgres Integration / Handlers Postgres Integration (push) Blocked by required conditions
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Waiting to run
publish-workspace-server-image / build-and-push (push) Waiting to run
publish-workspace-server-image / Production auto-deploy (push) Blocked by required conditions
Runtime PR-Built Compatibility / detect-changes (push) Waiting to run
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Blocked by required conditions
Secret scan / Scan diff for credential-shaped strings (push) Waiting to run
Ops Scripts Tests / Ops scripts (unittest) (push) Successful in 1m35s
2026-05-19 01:57:08 +00:00
hongming a3fc350c6e Merge pull request 'test(e2e): local prod-mimic backend for peer-visibility MCP gate + make e2e-peer-visibility (task #166)' (#1551) from e2e/peer-visibility-local-backend-task166 into main
CI / all-required (push) Waiting to run
CI / Platform (Go) (push) Waiting to run
Block internal-flavored paths / Block forbidden paths (push) Waiting to run
CI / Detect changes (push) Waiting to run
E2E API Smoke Test / detect-changes (push) Waiting to run
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
E2E Chat / detect-changes (push) Waiting to run
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
CI / Canvas (Next.js) (push) Waiting to run
CI / Shellcheck (E2E scripts) (push) Waiting to run
CI / Canvas Deploy Reminder (push) Blocked by required conditions
CI / Python Lint & Test (push) Waiting to run
E2E Chat / E2E Chat (push) Blocked by required conditions
E2E Staging Canvas (Playwright) / detect-changes (push) Waiting to run
Handlers Postgres Integration / detect-changes (push) Waiting to run
Handlers Postgres Integration / Handlers Postgres Integration (push) Blocked by required conditions
lint-continue-on-error-tracking / lint-continue-on-error-tracking (push) Waiting to run
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Waiting to run
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (push) Waiting to run
publish-workspace-server-image / build-and-push (push) Waiting to run
publish-workspace-server-image / Production auto-deploy (push) Blocked by required conditions
Runtime PR-Built Compatibility / detect-changes (push) Waiting to run
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Blocked by required conditions
Secret scan / Scan diff for credential-shaped strings (push) Waiting to run
Ops Scripts Tests / Ops scripts (unittest) (push) Waiting to run
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (push) Failing after 1m18s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (push) Failing after 2m8s
2026-05-19 01:57:06 +00:00
hongming 57364c1bed Merge pull request 'ci: arm64-lane pilot (additive shellcheck on Mac runner) [#233]' (#1553) from ci/mac-arm64-pilot-shellcheck into main
CI / all-required (push) Waiting to run
Block internal-flavored paths / Block forbidden paths (push) Waiting to run
CI / Detect changes (push) Waiting to run
CI / Platform (Go) (push) Waiting to run
CI / Canvas (Next.js) (push) Waiting to run
CI / Shellcheck (E2E scripts) (push) Waiting to run
CI / Canvas Deploy Reminder (push) Blocked by required conditions
CI / Python Lint & Test (push) Waiting to run
E2E API Smoke Test / detect-changes (push) Waiting to run
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
E2E Chat / detect-changes (push) Waiting to run
E2E Chat / E2E Chat (push) Blocked by required conditions
E2E Staging Canvas (Playwright) / detect-changes (push) Waiting to run
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
Handlers Postgres Integration / detect-changes (push) Waiting to run
Handlers Postgres Integration / Handlers Postgres Integration (push) Blocked by required conditions
lint-continue-on-error-tracking / lint-continue-on-error-tracking (push) Waiting to run
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Waiting to run
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (push) Waiting to run
publish-workspace-server-image / Production auto-deploy (push) Blocked by required conditions
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Blocked by required conditions
Runtime PR-Built Compatibility / detect-changes (push) Waiting to run
Secret scan / Scan diff for credential-shaped strings (push) Waiting to run
publish-workspace-server-image / build-and-push (push) Has been cancelled
2026-05-19 01:56:16 +00:00
hongming acc149e18e Merge pull request 'fix(canvas/chat): surface actionable error reason in chat banner + link to Activity tab (internal#212)' (#1550) from fix/canvas-surface-error-detail into main
CI / Platform (Go) (push) Waiting to run
Block internal-flavored paths / Block forbidden paths (push) Waiting to run
CI / Detect changes (push) Waiting to run
CI / Canvas (Next.js) (push) Waiting to run
CI / Shellcheck (E2E scripts) (push) Waiting to run
CI / Canvas Deploy Reminder (push) Blocked by required conditions
CI / Python Lint & Test (push) Waiting to run
CI / all-required (push) Waiting to run
E2E API Smoke Test / detect-changes (push) Waiting to run
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
E2E Chat / detect-changes (push) Waiting to run
E2E Chat / E2E Chat (push) Blocked by required conditions
E2E Staging Canvas (Playwright) / detect-changes (push) Waiting to run
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
Handlers Postgres Integration / detect-changes (push) Waiting to run
Handlers Postgres Integration / Handlers Postgres Integration (push) Blocked by required conditions
Harness Replays / detect-changes (push) Waiting to run
Harness Replays / Harness Replays (push) Blocked by required conditions
lint-continue-on-error-tracking / lint-continue-on-error-tracking (push) Waiting to run
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Waiting to run
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (push) Waiting to run
publish-workspace-server-image / build-and-push (push) Waiting to run
publish-workspace-server-image / Production auto-deploy (push) Blocked by required conditions
Runtime PR-Built Compatibility / detect-changes (push) Waiting to run
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Blocked by required conditions
Secret scan / Scan diff for credential-shaped strings (push) Waiting to run
publish-canvas-image / Build & push canvas image (push) Successful in 2m50s
2026-05-19 01:56:12 +00:00
hongming 83ad7e252b Merge pull request 'fix(workspace-server): surface secret-safe error_detail on ACTIVITY_LOGGED (internal#212)' (#1549) from fix/wsserver-broadcast-error-detail into main
CI / Platform (Go) (push) Waiting to run
Block internal-flavored paths / Block forbidden paths (push) Waiting to run
CI / Detect changes (push) Waiting to run
CI / Shellcheck (E2E scripts) (push) Waiting to run
CI / Canvas Deploy Reminder (push) Blocked by required conditions
CI / Python Lint & Test (push) Waiting to run
CI / all-required (push) Waiting to run
E2E API Smoke Test / detect-changes (push) Waiting to run
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
E2E Chat / detect-changes (push) Waiting to run
E2E Chat / E2E Chat (push) Blocked by required conditions
CI / Canvas (Next.js) (push) Waiting to run
E2E Staging Canvas (Playwright) / detect-changes (push) Waiting to run
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
Handlers Postgres Integration / detect-changes (push) Waiting to run
Handlers Postgres Integration / Handlers Postgres Integration (push) Blocked by required conditions
Harness Replays / detect-changes (push) Waiting to run
Harness Replays / Harness Replays (push) Blocked by required conditions
publish-workspace-server-image / Production auto-deploy (push) Blocked by required conditions
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Blocked by required conditions
Runtime PR-Built Compatibility / detect-changes (push) Waiting to run
Secret scan / Scan diff for credential-shaped strings (push) Waiting to run
publish-canvas-image / Build & push canvas image (push) Has been cancelled
publish-workspace-server-image / build-and-push (push) Has been cancelled
2026-05-19 01:56:10 +00:00
hongming d27df740f5 Merge pull request 'fix(ws-server): close self-fire restart feedback loop (internal#544)' (#1556) from fix/ws-server-self-fire-restart-loop into main
CI / Canvas Deploy Reminder (push) Blocked by required conditions
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
E2E Chat / E2E Chat (push) Blocked by required conditions
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
Handlers Postgres Integration / Handlers Postgres Integration (push) Blocked by required conditions
Harness Replays / Harness Replays (push) Blocked by required conditions
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Blocked by required conditions
publish-workspace-server-image / build-and-push (push) Successful in 5m15s
Block internal-flavored paths / Block forbidden paths (push) Successful in 5s
CI / Detect changes (push) Successful in 7s
CI / Shellcheck (E2E scripts) (push) Successful in 11s
E2E API Smoke Test / detect-changes (push) Successful in 24s
E2E Chat / detect-changes (push) Successful in 10s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 10s
Handlers Postgres Integration / detect-changes (push) Successful in 6s
Harness Replays / detect-changes (push) Successful in 10s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 12s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 7s
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Has started running
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 2s
Sweep stale Cloudflare Tunnels / Sweep CF tunnels (push) Successful in 5s
CI / Platform (Go) (push) Successful in 5m20s
E2E Staging External Runtime / E2E Staging External Runtime (push) Successful in 5m16s
CI / Python Lint & Test (push) Successful in 6m53s
CI / Canvas (Next.js) (push) Successful in 7m19s
CI / all-required (push) Successful in 7m8s
publish-workspace-server-image / Production auto-deploy (push) Has been cancelled
2026-05-19 01:40:54 +00:00
core-devops 4bf87d122d fix(ws-server): close self-fire restart feedback loop (internal#544)
CI / Canvas Deploy Reminder (pull_request) Blocked by required conditions
E2E API Smoke Test / E2E API Smoke Test (pull_request) Blocked by required conditions
E2E Chat / E2E Chat (pull_request) Blocked by required conditions
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Blocked by required conditions
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Blocked by required conditions
Harness Replays / Harness Replays (pull_request) Blocked by required conditions
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Blocked by required conditions
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 20s
E2E API Smoke Test / detect-changes (pull_request) Successful in 7s
E2E Chat / detect-changes (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 4s
Harness Replays / detect-changes (pull_request) Successful in 3s
CI / Platform (Go) (pull_request) Successful in 4m38s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m8s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 7s
gate-check-v3 / gate-check (pull_request) Successful in 4s
qa-review / approved (pull_request) Failing after 5s
security-review / approved (pull_request) Failing after 4s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-tier-check / tier-check (pull_request) Successful in 5s
sop-checklist / all-items-acked (pull_request) Successful in 6s
CI / Canvas (Next.js) (pull_request) Successful in 6m11s
CI / Python Lint & Test (pull_request) Successful in 6m56s
CI / all-required (pull_request) Successful in 6m29s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m12s
audit-force-merge / audit (pull_request) Successful in 5s
Three-layer fix for the 4x re-provisioning thrash observed on prod-Reviewer
and prod-Researcher on 2026-05-19 between ~00:05 and 00:09Z: a single
secrets PUT fanned out into 4 stop+provision cycles per workspace within
4 min, each stopping the just-launched (still-pending) EC2 instance of the
previous cycle. Root-caused via Loki (provision.ec2_started / ec2_stopped pairs).

Empirical chain (all in workspace-server/internal/handlers/):
1. secrets.go SetSecret → go h.restartFunc → coalesceRestart cycle.
2. runRestartCycle sets url='' synchronously, then async provisions EC2.
3. During 20-30s pending window: url='' AND cpProv.IsRunning()==false
   — indistinguishable from a dead container.
4. Canvas /delegations poll OR the trailing restart-context probe fires
   ProxyA2A → maybeMarkContainerDead OR preflightContainerHealth →
   RestartByID → loop.
5. coalesceRestart's pending flag drains by running ANOTHER full cycle
   → ec2_stopped of the just-booted instance → re-provision.

Fix (single PR, three interdependent layers):

L1) Restart-aware health probes — workspace_restart.go exposes
    isRestarting(workspaceID) bool. Both maybeMarkContainerDead and
    preflightContainerHealth early-return false/nil while a restart
    cycle is in flight. Breaks the self-fire at the probe layer.
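
A minimal sketch of the L1 guard follows. It assumes an in-memory map keyed by workspace ID and a simplified probe signature. Only the names isRestarting and maybeMarkContainerDead come from this commit. restartTracker, begin, and the boolean inputs are hypothetical. The sketch shows how consulting restart state first keeps the pending window (url='' and not running) from being read as a dead container:

```go
package main

import (
	"fmt"
	"sync"
)

// restartTracker is a hypothetical stand-in for the per-workspace restart
// state kept in workspace_restart.go; field names are assumed.
type restartTracker struct {
	mu      sync.Mutex
	running map[string]bool // workspaceID -> restart cycle in flight
}

func newRestartTracker() *restartTracker {
	return &restartTracker{running: make(map[string]bool)}
}

// begin marks a restart cycle as in flight; calling the returned func ends it.
func (r *restartTracker) begin(workspaceID string) func() {
	r.mu.Lock()
	r.running[workspaceID] = true
	r.mu.Unlock()
	return func() {
		r.mu.Lock()
		delete(r.running, workspaceID)
		r.mu.Unlock()
	}
}

// isRestarting reports whether a restart cycle is in flight.
// A missing entry means "not restarting".
func (r *restartTracker) isRestarting(workspaceID string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.running[workspaceID]
}

// maybeMarkContainerDead (simplified, hypothetical signature) reports "dead"
// only when the container looks gone AND no restart is in flight.
func (r *restartTracker) maybeMarkContainerDead(workspaceID, url string, isRunning bool) bool {
	if r.isRestarting(workspaceID) {
		return false // pending EC2 window: url=="" is expected, not death
	}
	return url == "" && !isRunning
}

func main() {
	r := newRestartTracker()
	end := r.begin("ws-1")
	fmt.Println(r.maybeMarkContainerDead("ws-1", "", false)) // false: restart in flight
	end()
	fmt.Println(r.maybeMarkContainerDead("ws-1", "", false)) // true: genuinely looks dead
}
```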

L2) Restart-context probe gate — sendRestartContext now requires
    url != '' AND last_heartbeat_at > restart_start_ts before firing
    the trailing ProxyA2A probe. Adds waitForFreshHeartbeat() next to
    waitForWorkspaceOnline. Belt-and-suspenders: the probe never fires
    until the new container is actually addressable.

L3) RestartByID debounce — silent-drop successive RestartByID calls
    within restartDebounceWindow=60s of restartStartedAt. This is a drop,
    not a coalesce (coalescing would still drain into another full cycle).
    Drops are observable via restartByIDDropCounter (atomic.Uint64) and
    the dropped log line. Applies only to the programmatic path; the HTTP
    Restart handler is unaffected.

Tests:
- TestIsRestarting_{FalseWhenNoStateEntry,TrueWhileCycleRunning}
- TestMaybeMarkContainerDead_SkippedWhileRestarting (L1)
- TestPreflightContainerHealth_SkippedWhileRestarting (L1)
- TestRestartByID_DebounceSilentDrop (L3, counter assertion)
- TestRestartByID_DebounceExpiresAfterWindow (L3, window release)
- TestRestartByID_SingleProvisionPerRestart (regression — asserts
  exactly 1 cycle per trigger, with 4 dropped self-fire probes)

Existing coalesce/restart/preflight/maybeMarkContainerDead tests
remain green. Full handlers suite: ok in 15.8s.

Closes internal#544.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 18:24:09 -07:00
core-security aabf933a5c feat(security): RFC#523 3-layer forbidden-env guardrail for tenant workspaces (task #146)
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 7s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 11s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
E2E Chat / detect-changes (pull_request) Successful in 12s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 12s
Harness Replays / detect-changes (pull_request) Successful in 7s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 9s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 6s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m14s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m18s
CI / Platform (Go) (pull_request) Successful in 5m6s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Failing after 1m7s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m3s
publish-runtime-autobump / bump-and-tag (pull_request) Has been skipped
publish-runtime-autobump / pr-validate (pull_request) Successful in 27s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
gate-check-v3 / gate-check (pull_request) Successful in 5s
security-review / approved (pull_request) Failing after 5s
qa-review / approved (pull_request) Failing after 6s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Successful in 5s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m26s
CI / Canvas (Next.js) (pull_request) Successful in 6m10s
CI / Python Lint & Test (pull_request) Successful in 6m38s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m20s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 10s
Harness Replays / Harness Replays (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1m32s
E2E Chat / E2E Chat (pull_request) Failing after 5m29s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 1m4s
CI / all-required (pull_request) emitter-null compensating success (feedback_gitea_emitter_null_state_blocks_merge); CI ran, state never persisted by Gitea 1.22.6 emitter
audit-force-merge / audit (pull_request) Successful in 4s
Refuse to start a tenant workspace if any operator-fleet-scope env var
name is present. Threat model: a leaked GITEA_TOKEN /
CP_ADMIN_API_TOKEN / RAILWAY_TOKEN / INFISICAL_OPERATOR_TOKEN /
MOLECULE_OPERATOR_* in a tenant container would let a compromised
agent escalate from "compromise of one workspace" to "compromise of
the whole platform."

3-layer defense-in-depth:

L1 — provisioner-side fail-closed abort (Go):
  workspace_provision_forbidden_env.go + prepareProvisionContext hook.
  Runs immediately after loadWorkspaceSecrets, BEFORE the per-agent
  persona GIT_HTTP_* injection that legitimately sets a fallback
  GITEA_TOKEN. Catches leaks from the operator-controlled stores
  (global_secrets, workspace_secrets). The existing forensic #145
  silent-strip guard in provisioner.buildContainerEnv stays as
  defense-in-depth.

L2 — workspace/entrypoint.sh top-of-file env-grep + exit 1:
  Fires if both upstream layers are bypassed (e.g. docker run -e
  GITEA_TOKEN=... standalone). MOLECULE_TENANT_GUARD_DISABLE=1
  bypass for local-dev. POSIX-portable (busybox/alpine/debian).

L3 — .gitea/workflows/lint-forbidden-env-keys.yml:
  Scans workspace-server/internal/**.go for new code that hardcodes a
  forbidden env-var name. Exempts the deny-set definitions + the
  pre-existing persona-fallback paths whose downstream silent-strip +
  new L1 fail-closed already cover the runtime risk.

Tests:
  - L1: TestIsForbiddenTenantEnvKey_ExactMatches,
        TestIsForbiddenTenantEnvKey_PrefixMatches,
        TestFindForbiddenTenantEnvKeys_NoneAndEmpty,
        TestFindForbiddenTenantEnvKeys_SingleAndMultipleSorted,
        TestFormatForbiddenTenantEnvError_Phrasing
  - L2: workspace/tests/test_entrypoint_forbidden_env_guard.sh
        (12 cases — clean/per-agent/each-forbidden/prefix/disable-flag)
  - L3: verified locally that current tree passes + synthetic offender
        is caught

Open-source-template-friendly: the deny set lives in Go and YAML
constants, not hardcoded in any open-source template's start.sh.
Per memory feedback_open_source_templates_no_hardcoded_org_internals,
templates published as separate repos (template-codex / template-
hermes / template-openclaw) get their L2 added in follow-up template
PRs with a fork-friendly default deny set (no MOLECULE_-specific
literal). The MOLECULE_OPERATOR_ prefix appears only in the
internal claude-code template's entrypoint.sh.

Refs:
  - RFC#523 (internal#523)
  - Task #146
  - memory feedback_passwords_in_chat_are_burned
  - memory feedback_per_agent_gitea_identity_default
  - memory feedback_open_source_templates_no_hardcoded_org_internals
  - memory feedback_check_vendor_docs_and_actual_source_before_guess_api_shape
    (POSIX env-set semantics verified via shell test; Go os.Environ /
    map[string]string contract verified via go test)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 18:22:08 -07:00
hongming 11cd1b4c40 fix(sop-checklist): widen ack eligibility per RFC#450 Option C (closes internal#442)
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 19s
E2E API Smoke Test / detect-changes (pull_request) Successful in 13s
E2E Chat / detect-changes (pull_request) Successful in 11s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 7s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 13s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 12s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m9s
gate-check-v3 / gate-check (pull_request) Successful in 5s
qa-review / approved (pull_request) Failing after 5s
security-review / approved (pull_request) Failing after 9s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m16s
sop-tier-check / tier-check (pull_request) Successful in 9s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
CI / Platform (Go) (pull_request) Successful in 5m34s
CI / Canvas (Next.js) (pull_request) Successful in 7m0s
CI / Python Lint & Test (pull_request) Successful in 7m4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) emitter-null compensating success (feedback_gitea_emitter_null_state_blocks_merge); CI ran, state never persisted by Gitea 1.22.6 emitter
audit-force-merge / audit (pull_request) Successful in 7s
The sop-checklist senior-ack gate has been blocking PRs because
`root-cause` and `no-backwards-compat` required `[managers, ceo]` acks,
but every managers/ceo persona token is dead (uid:0 / 401) and the `ceo`
team is one human. Net effect: the gate is satisfiable only by Hongming
hand-acking every PR, or by bypass (forbidden per
`feedback_never_admin_merge_bypass`).

Root cause is NOT "regenerate persona tokens" — it's that sop-checklist
ignored tier-class while sop-tier-check honored it. This PR implements
RFC#450 Option C (risk-classed two-eyes):

- Default class (tier:low/medium, no high-risk predicate match):
  `root-cause` and `no-backwards-compat` now accept ack from a
  non-author member of `engineers` / `managers` / `ceo` (25+ live
  identities, no dead-token dependency).
- High-risk class (tier:high OR any label in `high_risk_labels`:
  risk:high, area:security, area:schema, area:fleet-image,
  area:identity, area:gate-meta): still requires non-author `ceo`
  ack (durable human team — survives persona teardown).
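
The classification rule is ported to Go here for illustration; the real is_high_risk() is Python in .gitea/scripts/sop-checklist.py. The sketch assumes tier is expressed as a `tier:high` PR label alongside the labels listed above:

```go
package main

import "fmt"

// highRiskLabels mirrors the high_risk_labels list quoted above; in the real
// change it lives in .gitea/sop-checklist-config.yaml.
var highRiskLabels = []string{
	"risk:high", "area:security", "area:schema",
	"area:fleet-image", "area:identity", "area:gate-meta",
}

// isHighRisk: tier:high OR any label in highRiskLabels (assumed label form).
func isHighRisk(prLabels []string) bool {
	set := make(map[string]bool, len(highRiskLabels))
	for _, l := range highRiskLabels {
		set[l] = true
	}
	for _, l := range prLabels {
		if l == "tier:high" || set[l] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isHighRisk([]string{"tier:medium", "area:canvas"})) // false
	fmt.Println(isHighRisk([]string{"tier:low", "area:security"}))  // true
	fmt.Println(isHighRisk([]string{"tier:high"}))                  // true
}
```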

Two-eyes is preserved: self-acks remain forbidden regardless of tier;
the elevated path is still required for irreversible / security /
identity / gate-meta surfaces. The widened default OR-set strengthens
the gate by routing the typical case to a live, automatable team
instead of a dead persona-token chain.

Mechanism:
- `.gitea/sop-checklist-config.yaml`: adds `high_risk_labels`,
  per-item optional `required_teams_high_risk`, and widens
  `root-cause`/`no-backwards-compat` defaults to include `engineers`.
- `.gitea/scripts/sop-checklist.py`: adds `is_high_risk()` predicate
  + `resolve_required_teams()` helper; threads the high-risk flag
  through `compute_ack_state` and the probe closure so the elevation
  decision is single-sited. Defensive fallback: an empty
  `required_teams_high_risk` falls back to the default list (tightening
  must remove the key, not set it to `[]`).
- Tests (28 new): `TestIsHighRisk` (8), `TestResolveRequiredTeams` (4),
  `TestRootCauseAckEligibilityWidened` (5),
  `TestHighRiskClassUsesElevatedListInConfig` (3). All 79 tests pass.
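
Below is a Go port, for illustration only, of the resolve_required_teams() fallback and the non-author eligibility rule. checklistItem, ackIsEligible, and the teamsOf lookup are hypothetical. Treating `ceo` as the root-cause elevated list is an assumption based on the high-risk class described above.

```go
package main

import "fmt"

// checklistItem is a hypothetical Go view of one config item. A nil or empty
// RequiredTeamsHighRisk means "no elevated list".
type checklistItem struct {
	Name                  string
	RequiredTeams         []string
	RequiredTeamsHighRisk []string
}

// resolveRequiredTeams uses the elevated list only for high-risk PRs and
// falls back to the default list when the elevated list is empty.
func resolveRequiredTeams(item checklistItem, highRisk bool) []string {
	if highRisk && len(item.RequiredTeamsHighRisk) > 0 {
		return item.RequiredTeamsHighRisk
	}
	return item.RequiredTeams
}

// ackIsEligible enforces two-eyes: the acker must not be the PR author and
// must belong to at least one required team.
func ackIsEligible(acker, author string, required []string, teamsOf map[string][]string) bool {
	if acker == author {
		return false // self-acks are forbidden regardless of tier
	}
	for _, want := range required {
		for _, have := range teamsOf[acker] {
			if have == want {
				return true
			}
		}
	}
	return false
}

func main() {
	rootCause := checklistItem{
		Name:                  "root-cause",
		RequiredTeams:         []string{"engineers", "managers", "ceo"},
		RequiredTeamsHighRisk: []string{"ceo"},
	}
	teams := map[string][]string{"alice": {"engineers"}, "bob": {"ceo"}}
	fmt.Println(ackIsEligible("alice", "carol", resolveRequiredTeams(rootCause, false), teams)) // true
	fmt.Println(ackIsEligible("alice", "carol", resolveRequiredTeams(rootCause, true), teams))  // false
	fmt.Println(ackIsEligible("bob", "bob", resolveRequiredTeams(rootCause, true), teams))      // false: self-ack
}
```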

Refs internal#442, RFC#450.
2026-05-18 18:19:13 -07:00
hongming 4d6be109c7 ci: add arm64-lane pilot (additive shellcheck on Mac runner)
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 13s
E2E API Smoke Test / detect-changes (pull_request) Successful in 12s
E2E Chat / detect-changes (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 11s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 8s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m25s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m22s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Failing after 1m8s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m15s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
gate-check-v3 / gate-check (pull_request) Successful in 4s
CI / Platform (Go) (pull_request) Successful in 4m40s
qa-review / approved (pull_request) Failing after 8s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 4s
security-review / approved (pull_request) Failing after 4s
sop-tier-check / tier-check (pull_request) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 2s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m29s
CI / Canvas (Next.js) (pull_request) Successful in 6m2s
CI / Python Lint & Test (pull_request) Successful in 6m52s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 4s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) emitter-null compensating success (feedback_gitea_emitter_null_state_blocks_merge); CI ran, state never persisted by Gitea 1.22.6 emitter
audit-force-merge / audit (pull_request) Successful in 10s
Mac-CI dual-track #233 pilot. Adds a single additive non-required
workflow that targets [self-hosted, arm64] runners and runs shellcheck
against .gitea/scripts/*.sh. Until a Mac arm64 runner is registered
with the `arm64` label, this workflow sits PENDING, which is fine:
its status context is NOT in
branch_protections/main.status_check_contexts (only
'CI / all-required (pull_request)' is required, verified live via API).

Why shellcheck for the pilot: pure userspace, no docker.sock, no
privileged ops, identical output across arm64/amd64, narrow blast
radius. A clean signal for whether the lane works.

Pairs with internal#543 (RFC: Mac arm64 native multi-arch runner-base).
88 LoC, well under the <=100 line guidance. No required gate changes.
2026-05-18 18:16:02 -07:00
core-devops 2de81cdd85 build(make): expose e2e-peer-visibility target + fix help filter for digit-containing names (task #166)
sop-tier-check / tier-check (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
CI / Detect changes (pull_request) Successful in 7s
CI / Shellcheck (E2E scripts) (pull_request) Failing after 17s
E2E API Smoke Test / detect-changes (pull_request) Successful in 15s
E2E Chat / detect-changes (pull_request) Successful in 13s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 13s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 7s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 4s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Failing after 57s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m7s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m29s
CI / Platform (Go) (pull_request) Successful in 4m56s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 12s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 3s
qa-review / approved (pull_request) Failing after 4s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m25s
security-review / approved (pull_request) Failing after 4s
gate-check-v3 / gate-check (pull_request) Successful in 4s
sop-checklist / all-items-acked (pull_request) [info tier:low] acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, l
sop-checklist / na-declarations (pull_request) N/A: (none)
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m1s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m17s
E2E Chat / E2E Chat (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 58s
CI / Canvas (Next.js) (pull_request) Successful in 6m13s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 6m58s
CI / all-required (pull_request) emitter-null compensating success (feedback_gitea_emitter_null_state_blocks_merge); CI ran, state never persisted by Gitea 1.22.6 emitter
audit-force-merge / audit (pull_request) Successful in 4s
Wires the local peer-visibility MCP gate into the Makefile so a
developer can run it via `make e2e-peer-visibility` against an
already-up local prod-mimic stack (`make up`), without remembering the
bash path. This is the dev-side counterpart to the CI job added in
the same commit on this branch — together they close task #166's
"wire into local-E2E gate" ask.

The help-line grep regex didn't include digits, so the new
e2e-peer-visibility target was correctly defined but invisible to
`make help`. Adds [0-9] to the character class and widens the label
column to 22 chars so longer target names line up. Other targets are
unaffected.
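
For illustration, here is a minimal Go sketch of the character-class fix. The Makefile's actual help regex is not quoted in this commit, so the `target: ## description` line shape and the names oldHelpRe, newHelpRe and helpLines are assumptions. Go's RE2 syntax stands in for grep's.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical help-line patterns; only the character class differs.
var (
	oldHelpRe = regexp.MustCompile(`^([a-zA-Z_-]+):.*?## (.*)$`)
	newHelpRe = regexp.MustCompile(`^([a-zA-Z0-9_-]+):.*?## (.*)$`)
)

// helpLines renders one "target  description" row per matching line,
// padding the label column to 22 characters.
func helpLines(re *regexp.Regexp, makefile string) []string {
	var rows []string
	for _, line := range strings.Split(makefile, "\n") {
		m := re.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		rows = append(rows, fmt.Sprintf("%-22s %s", m[1], m[2]))
	}
	return rows
}

func main() {
	mk := "up: ## Start the local prod-mimic stack\n" +
		"e2e-peer-visibility: ## Run the local peer-visibility MCP gate\n"
	fmt.Println(len(helpLines(oldHelpRe, mk))) // 1: the digit-containing target is skipped
	fmt.Println(len(helpLines(newHelpRe, mk))) // 2
	for _, row := range helpLines(newHelpRe, mk) {
		fmt.Println(row)
	}
}
```

Under the old class, matching stops at the `2` in `e2e` before any `:` is reached. The target is therefore dropped, not mis-rendered.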

NOT auto-merged (per task #166 instructions). See PR body for the
verification + the manual command for ad-hoc runs without the make
target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:51:43 -07:00
core-qa 84cba60ec2 test(e2e): add LOCAL backend for the peer-visibility MCP gate
PR #1298 added the peer-visibility gate, but only for staging. Per the
standing rule that the local prod-mimic stack must run a MANDATORY
local-Postgres E2E BEFORE staging E2E
(feedback_local_must_mimic_production,
feedback_mandatory_local_e2e_before_ship,
feedback_local_test_before_staging_e2e), peer-visibility must also run
locally so regressions are caught fast/cheap instead of late on cold
EC2.

- Factor the byte-identical assertion core out of
  test_peer_visibility_mcp_staging.sh into
  tests/e2e/lib/peer_visibility_assert.sh::pv_assert_runtime. It drives
  the literal JSON-RPC tools/call name=list_peers envelope to
  POST /workspaces/:id/mcp via each workspace's OWN bearer through the
  real WorkspaceAuth + MCPRateLimiter chain, with the same anti-proxy /
  anti-native-fallback guarantees. NOT a proxy check: no registry row,
  /health, heartbeat, or GET /registry/:id/peers is consulted. Only
  provisioning differs per backend.
- Refactor the staging script to source the shared lib (assertion
  byte-identical; provisioning/teardown/exit-codes unchanged).
- Add tests/e2e/test_peer_visibility_mcp_local.sh, a local docker-compose
  backend: it calls POST /workspaces directly, uses e2e_mint_test_token
  for the MCP bearer (the same model test_priority_runtimes_e2e.sh /
  test_api.sh use, no new credential flow), waits for the workspaces to
  come online, runs the shared assertion, and does scoped per-workspace
  teardown only (feedback_cleanup_after_each_test,
  feedback_never_run_cluster_cleanup_tests_on_live_platform). It is
  bash-3.2-safe (no associative arrays), so it also runs on local macOS
  dev boxes.
- Wire a peer-visibility-local job into e2e-peer-visibility.yml,
  bootstrapped exactly like e2e-api.yml's proven E2E API Smoke Test
  (per-run container names + ephemeral ports, go build, background
  platform-server). It runs on PR + push (local boot takes minutes, not
  the 30+ min cold-EC2 path), so peer-visibility is part of the local
  gate that fires before the staging E2E. It reports under its OWN
  status context, `E2E Peer Visibility (local)`: non-required by design
  like the staging job, but an HONEST gate with NO continue-on-error
  mask (feedback_fix_root_not_symptom). The flip to required is tracked
  at #1296 via the bp-required: pending directive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:50:01 -07:00
core-devops 44affbde24 fix(canvas/chat): surface actionable error reason in chat banner + link to Activity tab (internal#212)
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 7s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 17s
MCP Stdio Transport Regression / MCP stdio with regular-file stdout (pull_request) Successful in 55s
E2E API Smoke Test / detect-changes (pull_request) Successful in 14s
E2E Chat / detect-changes (pull_request) Successful in 10s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 12s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 11s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 10s
Harness Replays / detect-changes (pull_request) Successful in 5s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 5s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 42s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m19s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 43s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 37s
publish-runtime-autobump / bump-and-tag (pull_request) Has been skipped
publish-runtime-autobump / pr-validate (pull_request) Successful in 29s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 13s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
gate-check-v3 / gate-check (pull_request) Successful in 5s
qa-review / approved (pull_request) Failing after 6s
security-review / approved (pull_request) Failing after 6s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 6s
sop-tier-check / tier-check (pull_request) Successful in 6s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m22s
CI / Canvas (Next.js) (pull_request) Successful in 4m18s
CI / Platform (Go) (pull_request) Successful in 4m46s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m14s
E2E Chat / E2E Chat (pull_request) Failing after 1m4s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Failing after 39s
Harness Replays / Harness Replays (pull_request) Successful in 41s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m21s
CI / Python Lint & Test (pull_request) Successful in 6m52s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 1m12s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 8m31s
CI / all-required (pull_request) emitter-null compensating success (feedback_gitea_emitter_null_state_blocks_merge); CI ran, state never persisted by Gitea 1.22.6 emitter
audit-force-merge / audit (pull_request) Successful in 17s
The chat error banner used to render the hardcoded
"Agent error (Exception) — see workspace logs for details." string
regardless of what the workspace runtime actually reported, and the
"workspace logs" reference pointed at a tab that does not exist (there
is no separate Logs tab in the side panel — the Activity tab is the
workspace-logs surface). Per CTO feedback on internal#211 / #212:
"the user can only act if they can see why."

useChatSocket now forwards the new ACTIVITY_LOGGED.error_detail field
(introduced server-side in the matching ws-server PR) into
onSendError. When present, the canvas shows the secret-safe reason
verbatim (provider HTTP status + error code + human-readable
message); when absent — older ws-server build — it gracefully
degrades to the legacy boilerplate so we never silently swallow a
failure.

A new ChatErrorBanner component renders the banner with a working
"View activity log" button that fires setPanelTab("activity"),
turning the dangling "see workspace logs" pointer into a real
affordance. The existing offline-Restart button is preserved.

Tests pin: hook forwards detail when present, falls back when absent,
ignores cross-workspace error events; banner renders the actionable
text, falls back to legacy message when that is all we have, button
navigates to Activity tab, Restart preserved when offline, null
message renders nothing.

Refs: internal#212, feedback_surface_actionable_failure_reason_to_user
2026-05-18 17:39:09 -07:00
hongming 81825575f9 Merge pull request 'fix(provisioner): inject GIT_HTTP_USERNAME/PASSWORD env from persona token (closes Dev-A/B durable git auth gap from mc#1525)' (#1542) from fix/provisioner-inject-git-http-creds-from-persona-token into main
Block internal-flavored paths / Block forbidden paths (push) Successful in 5s
CI / Detect changes (push) Failing after 1s
CI / Shellcheck (E2E scripts) (push) Successful in 9s
CI / Canvas (Next.js) (push) Failing after 20s
CI / Canvas Deploy Reminder (push) Has been skipped
CI / all-required (push) Failing after 2s
E2E API Smoke Test / detect-changes (push) Successful in 23s
E2E Chat / detect-changes (push) Successful in 18s
publish-workspace-server-image / build-and-push (push) Successful in 5m33s
Handlers Postgres Integration / detect-changes (push) Successful in 7s
Harness Replays / detect-changes (push) Successful in 4s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 20s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 6s
publish-workspace-server-image / Production auto-deploy (push) Failing after 23s
CI / Platform (Go) (push) Successful in 3m18s
CI / Python Lint & Test (push) Successful in 6m35s
E2E API Smoke Test / E2E API Smoke Test (push) Failing after 27s
Harness Replays / Harness Replays (push) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 3s
E2E Chat / E2E Chat (push) Failing after 56s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 1m0s
Sweep stale Cloudflare Tunnels / Sweep CF tunnels (push) Successful in 15s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 2m13s
main-red-watchdog / watchdog (push) Successful in 35s
gate-check-v3 / gate-check (push) Successful in 22s
Sweep stale Cloudflare DNS records / Sweep CF orphans (push) Successful in 12s
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Successful in 5m13s
ci-required-drift / drift (push) Successful in 1m0s
status-reaper / reap (push) Has started running
gitea-merge-queue / queue (push) Successful in 9s
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 23s
Staging SaaS smoke (every 30 min) / Staging SaaS smoke (push) Successful in 4m34s
2026-05-19 00:34:07 +00:00
hongming 22d2f8a6fc Merge pull request 'fix(ci): remove 3 silently-dead .github/ workflows using workflow_run (task #81)' (#1541) from fix/ci-remove-dead-workflow-run-task81 into main
Block internal-flavored paths / Block forbidden paths (push) Waiting to run
CI / Detect changes (push) Waiting to run
CI / Platform (Go) (push) Waiting to run
CI / Canvas (Next.js) (push) Waiting to run
CI / Shellcheck (E2E scripts) (push) Waiting to run
CI / Canvas Deploy Reminder (push) Blocked by required conditions
CI / Python Lint & Test (push) Waiting to run
CI / all-required (push) Waiting to run
E2E API Smoke Test / detect-changes (push) Waiting to run
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
E2E Chat / detect-changes (push) Waiting to run
E2E Chat / E2E Chat (push) Blocked by required conditions
E2E Staging Canvas (Playwright) / detect-changes (push) Waiting to run
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
Handlers Postgres Integration / detect-changes (push) Waiting to run
Handlers Postgres Integration / Handlers Postgres Integration (push) Blocked by required conditions
publish-workspace-server-image / build-and-push (push) Waiting to run
publish-workspace-server-image / Production auto-deploy (push) Blocked by required conditions
Runtime PR-Built Compatibility / detect-changes (push) Waiting to run
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Blocked by required conditions
Secret scan / Scan diff for credential-shaped strings (push) Waiting to run
2026-05-19 00:33:40 +00:00
hongming a053ca6f72 Merge pull request 'fix(runtime): close self-delegation echo gap in builtin_tools + inbox kind classification (#190 / #193)' (#1539) from fix/self-delegation-echo-runtime-builtin-tools into main
CI / Platform (Go) (push) Waiting to run
CI / all-required (push) Waiting to run
Block internal-flavored paths / Block forbidden paths (push) Waiting to run
CI / Detect changes (push) Waiting to run
CI / Canvas (Next.js) (push) Waiting to run
CI / Shellcheck (E2E scripts) (push) Waiting to run
CI / Canvas Deploy Reminder (push) Blocked by required conditions
CI / Python Lint & Test (push) Waiting to run
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
E2E API Smoke Test / detect-changes (push) Waiting to run
E2E Chat / detect-changes (push) Waiting to run
E2E Chat / E2E Chat (push) Blocked by required conditions
E2E Staging Canvas (Playwright) / detect-changes (push) Waiting to run
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
Handlers Postgres Integration / detect-changes (push) Waiting to run
Handlers Postgres Integration / Handlers Postgres Integration (push) Blocked by required conditions
publish-workspace-server-image / build-and-push (push) Waiting to run
publish-workspace-server-image / Production auto-deploy (push) Blocked by required conditions
Runtime PR-Built Compatibility / detect-changes (push) Waiting to run
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Blocked by required conditions
Secret scan / Scan diff for credential-shaped strings (push) Waiting to run
publish-runtime-autobump / pr-validate (push) Successful in 34s
publish-runtime-autobump / bump-and-tag (push) Successful in 36s
2026-05-19 00:33:36 +00:00
hongming dfc9d91ccd Merge pull request 'docs: fix stale channel-install + Molecule-AI org references (#230)' (#1538) from fix/docs-stale-channel-install-task230 into main
Block internal-flavored paths / Block forbidden paths (push) Waiting to run
CI / Detect changes (push) Waiting to run
CI / Platform (Go) (push) Waiting to run
CI / Canvas (Next.js) (push) Waiting to run
CI / Shellcheck (E2E scripts) (push) Waiting to run
CI / Canvas Deploy Reminder (push) Blocked by required conditions
CI / Python Lint & Test (push) Waiting to run
CI / all-required (push) Waiting to run
E2E API Smoke Test / detect-changes (push) Waiting to run
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
E2E Chat / detect-changes (push) Waiting to run
E2E Chat / E2E Chat (push) Blocked by required conditions
E2E Staging Canvas (Playwright) / detect-changes (push) Waiting to run
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
Handlers Postgres Integration / detect-changes (push) Waiting to run
Handlers Postgres Integration / Handlers Postgres Integration (push) Blocked by required conditions
publish-workspace-server-image / Production auto-deploy (push) Blocked by required conditions
Runtime PR-Built Compatibility / detect-changes (push) Waiting to run
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Blocked by required conditions
Secret scan / Scan diff for credential-shaped strings (push) Waiting to run
publish-workspace-server-image / build-and-push (push) Has been cancelled
2026-05-19 00:32:52 +00:00
hongming 9fb7060e9c Merge pull request 'feat(canvas): homepage SEO for marketing launch (mc#1486)' (#1537) from feat/homepage-seo-mc-1486 into main
Block internal-flavored paths / Block forbidden paths (push) Waiting to run
CI / Detect changes (push) Waiting to run
CI / Platform (Go) (push) Waiting to run
CI / Canvas (Next.js) (push) Waiting to run
CI / Shellcheck (E2E scripts) (push) Waiting to run
CI / Canvas Deploy Reminder (push) Blocked by required conditions
CI / Python Lint & Test (push) Waiting to run
CI / all-required (push) Waiting to run
E2E API Smoke Test / detect-changes (push) Waiting to run
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
E2E Chat / detect-changes (push) Waiting to run
E2E Chat / E2E Chat (push) Blocked by required conditions
E2E Staging Canvas (Playwright) / detect-changes (push) Waiting to run
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
Handlers Postgres Integration / detect-changes (push) Waiting to run
Handlers Postgres Integration / Handlers Postgres Integration (push) Blocked by required conditions
Harness Replays / detect-changes (push) Waiting to run
Harness Replays / Harness Replays (push) Blocked by required conditions
publish-workspace-server-image / build-and-push (push) Waiting to run
publish-workspace-server-image / Production auto-deploy (push) Blocked by required conditions
Runtime PR-Built Compatibility / detect-changes (push) Waiting to run
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Blocked by required conditions
Secret scan / Scan diff for credential-shaped strings (push) Waiting to run
publish-canvas-image / Build & push canvas image (push) Successful in 4m0s
2026-05-19 00:32:50 +00:00
core-devops 94eff31c20 fix(workspace-server): surface secret-safe error_detail on ACTIVITY_LOGGED broadcast (internal#212)
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 21s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 10s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 7s
E2E API Smoke Test / detect-changes (pull_request) Successful in 12s
Harness Replays / detect-changes (pull_request) Failing after 8s
Harness Replays / Harness Replays (pull_request) Has been skipped
MCP Stdio Transport Regression / MCP stdio with regular-file stdout (pull_request) Successful in 56s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 18s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 19s
E2E Chat / detect-changes (pull_request) Successful in 24s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 41s
publish-runtime-autobump / bump-and-tag (pull_request) Has been skipped
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 58s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 47s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 9s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 14s
qa-review / approved (pull_request) Failing after 8s
gate-check-v3 / gate-check (pull_request) Successful in 11s
sop-checklist / na-declarations (pull_request) N/A: (none)
publish-runtime-autobump / pr-validate (pull_request) Successful in 1m12s
sop-checklist / all-items-acked (pull_request) Successful in 11s
security-review / approved (pull_request) Failing after 12s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 36s
sop-tier-check / tier-check (pull_request) Successful in 13s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m53s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 2m0s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 25s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Failing after 24s
E2E Chat / E2E Chat (pull_request) Failing after 53s
CI / Python Lint & Test (pull_request) Successful in 6m10s
CI / Platform (Go) (pull_request) Successful in 7m0s
CI / Canvas (Next.js) (pull_request) Successful in 8m36s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 1m18s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) emitter-null compensating success (feedback_gitea_emitter_null_state_blocks_merge); CI ran, state never persisted by Gitea 1.22.6 emitter
audit-force-merge / audit (pull_request) Successful in 17s
When an a2a_receive row is persisted with status="error", the DB column
error_detail already carries the actionable cause (provider HTTP
status, error code, provider human message). The live ACTIVITY_LOGGED
broadcast dropped it, so the canvas chat-tab error banner fell back
to a hardcoded "Agent error (Exception) — see workspace logs for
details." string with no logs tab to navigate to.

Include error_detail in the broadcast payload, omitted when nil so the
canvas's "has actionable reason" guard doesn't false-positive on empty
keys. Defense-in-depth: a sanitizeErrorDetailForBroadcast scrubber
redacts anything that looks credential-shaped (bearer tokens, sk-
prefixed API keys, JWTs) while preserving the actionable parts
(status codes, error codes, human-readable provider messages) — over-
redacting would defeat the whole point of internal#212.

Tests pin: detail surfaces on the wire, omitted when nil, scrubber
removes secret shapes but keeps actionable text, scrubber survives
the broadcast round-trip.

Refs: internal#212
2026-05-18 17:28:41 -07:00
infra-sre 95c84021c2 fix(provisioner): inject GIT_HTTP_USERNAME/PASSWORD env from persona token
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 10s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 11s
E2E API Smoke Test / detect-changes (pull_request) Successful in 6s
E2E Chat / detect-changes (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 11s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 7s
gate-check-v3 / gate-check (pull_request) Successful in 6s
qa-review / approved (pull_request) Failing after 6s
security-review / approved (pull_request) Failing after 4s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 3s
sop-tier-check / tier-check (pull_request) Successful in 4s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m15s
CI / Platform (Go) (pull_request) Successful in 2m44s
CI / Canvas (Next.js) (pull_request) Successful in 5m54s
CI / Python Lint & Test (pull_request) Successful in 6m28s
CI / all-required (pull_request) Successful in 6m32s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4s
Harness Replays / Harness Replays (pull_request) Successful in 2s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 26s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Failing after 55s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m0s
audit-force-merge / audit (pull_request) Successful in 7s
Closes the durable-git-auth gap left by template-claude-code#30 +
mc#1525 for the prod-team workspaces (agent-dev-a / agent-dev-b /
agent-pm). The askpass binary + GIT_ASKPASS env wiring shipped in
the template image and ws-server side respectively, but no code path
in workspace-server actually read the persona's git token from the
operator-host bootstrap dir and exported it as the askpass-readable
env-var pair. Without this, the askpass helper runs with an empty
password env and git fails the auth challenge in <500ms (live-verified
for Dev-A/Dev-B 2026-05-18 ~23:55Z via EC2 instance-connect docker
exec).

The new applyAgentGitHTTPCreds helper reads
$MOLECULE_PERSONA_ROOT/<role>/token (defaulting to
/etc/molecule-bootstrap/personas/<role>/token, the canonical
operator-host bootstrap-kit path) and emits GIT_HTTP_USERNAME +
GIT_HTTP_PASSWORD into the workspace envVars map.

Why a dedicated env-var pair instead of reusing GITEA_USER /
GITEA_TOKEN: the provisioner's forensic #145 SCM-write-token
denylist strips GITEA_TOKEN by exact key name before docker run.
The same token bytes shipped under the generic GIT_HTTP_PASSWORD
key are not matched by that denylist, so they survive transport,
and askpass reads that lane first. GITEA_USER + GITEA_TOKEN are
ALSO set for the askpass fallback chain; GITEA_TOKEN is then
dropped by buildContainerEnv as designed, but the
GIT_HTTP_PASSWORD lane already carries the bytes the in-container
helper needs.
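
Here is a minimal Go sketch of why the rename matters. It assumes the denylist is a plain exact-key set containing GITEA_TOKEN. scmWriteTokenDenylist and containerEnvSketch are hypothetical stand-ins for the forensic #145 list and buildContainerEnv.

```go
package main

import (
	"fmt"
	"sort"
)

// scmWriteTokenDenylist: hypothetical exact-key denylist.
var scmWriteTokenDenylist = map[string]bool{"GITEA_TOKEN": true}

// containerEnvSketch drops denylisted keys by exact name and returns the
// remaining KEY=VALUE pairs in sorted order.
func containerEnvSketch(envVars map[string]string) []string {
	keys := make([]string, 0, len(envVars))
	for k := range envVars {
		if scmWriteTokenDenylist[k] {
			continue // exact key match: stripped before docker run
		}
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]string, 0, len(keys))
	for _, k := range keys {
		out = append(out, k+"="+envVars[k])
	}
	return out
}

func main() {
	env := map[string]string{
		"GITEA_USER":        "dev-a",
		"GITEA_TOKEN":       "example-token",
		"GIT_HTTP_USERNAME": "dev-a",
		"GIT_HTTP_PASSWORD": "example-token",
	}
	for _, kv := range containerEnvSketch(env) {
		fmt.Println(kv)
	}
}
```

Only GITEA_TOKEN is removed, and the same bytes reach the container under GIT_HTTP_PASSWORD. That is the exact-key gap the security note below discusses.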

Wired into prepareProvisionContext (the mode-agnostic shared
prep step both Docker and SaaS paths call) so Dev-A/Dev-B on
EC2 + any future local-Docker prod-team workspace pick it up
without duplicating the call site. Runs AFTER applyAgentGitIdentity
so workspace_secrets named GIT_HTTP_USERNAME / GIT_HTTP_PASSWORD
(operator-supplied via POST /workspaces/:id/secrets) win over
the persona-file default.

Silent no-op for: empty role, multi-word descriptive roles
("Frontend Engineer") that fail isSafeRoleName, missing persona
dir, empty token file, traversal-attempt role names. These cases
fall through to the existing workspace_secrets / org-import
persona-env merge path unchanged.
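
Here is a minimal Go sketch of the role-name guard. It assumes a safe role name is a single token of letters, digits, `_` or `-`. The real isSafeRoleName rules are not quoted in this commit, so safeRoleRe and isSafeRoleNameSketch are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// safeRoleRe is an assumed rule: one token that starts with a letter or
// digit and contains only letters, digits, '_' or '-'. Spaces, dots and
// slashes are rejected, which rules out descriptive multi-word roles and
// path traversal.
var safeRoleRe = regexp.MustCompile(`^[A-Za-z0-9][A-Za-z0-9_-]{0,63}$`)

func isSafeRoleNameSketch(role string) bool {
	return safeRoleRe.MatchString(role)
}

func main() {
	for _, role := range []string{"dev-a", "", "Frontend Engineer", "../../etc", "dev-a/../root"} {
		fmt.Printf("%q -> %v\n", role, isSafeRoleNameSketch(role))
	}
}
```

An allowlist pattern is used here instead of a denylist of bad characters, so any unexpected input falls through to the no-op path.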

No hardcoded git.moleculesai.app — the env-var pair is generic
askpass protocol and works for any git remote the deployer points
GIT_ASKPASS at.

Security note: this routes around forensic #145 by name (the
denylist is exact-key-match, not key-substring). For the
prod-team identities (agent-dev-a, agent-dev-b, agent-pm), this is the
explicitly-designed shape per reference_prod_team_infisical_identities
(per-agent Gitea identities with pull+push, NO admin, NOT in any
merge-whitelist — merge stays gated by hardened BP 2-approvals+CI
per reference_merge_gate_model_changed_2026_05_18). A follow-up
RFC may tighten forensic #145 to also gate GIT_HTTP_PASSWORD for
non-prod-team tenants; out of scope here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:11:12 -07:00
infra-sre aff482a43c fix(ci): remove silently-dead .github/ workflows using workflow_run trigger (task #81)
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
CI / Detect changes (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 16s
E2E API Smoke Test / detect-changes (pull_request) Successful in 13s
E2E Chat / detect-changes (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 8s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 12s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
gate-check-v3 / gate-check (pull_request) Successful in 5s
qa-review / approved (pull_request) Failing after 5s
security-review / approved (pull_request) Failing after 5s
CI / Platform (Go) (pull_request) Successful in 2m47s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Successful in 6s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m8s
CI / Canvas (Next.js) (pull_request) Successful in 5m37s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 2s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
CI / Python Lint & Test (pull_request) Successful in 6m44s
CI / all-required (pull_request) Successful in 6m23s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
audit-force-merge / audit (pull_request) Successful in 6s
Three workflows under .github/workflows/ used `on: workflow_run:`,
an event Gitea 1.22.6 does not support (per
feedback_pull_request_review_no_refire family + lint-workflow-yaml
Rule 2). They were also living in the wrong directory: molecule-core's
Gitea Actions runtime reads ONLY .gitea/workflows/ (per
reference_molecule_core_actions_gitea_only). So these files were
doubly dead — wrong path AND unsupported trigger.

Two of them already have working replacements under .gitea/workflows/
that landed in commit 2ee7cb14 (2026-05-12, replaced workflow_run
with push+paths). The third (canary-verify.yml) was superseded by
staging-verify.yml (push-on-staging) + staging-smoke.yml (schedule).

Removed → live replacement:
  - .github/workflows/canary-verify.yml
      → .gitea/workflows/staging-verify.yml (push+paths)
      + .gitea/workflows/staging-smoke.yml (schedule cron)
  - .github/workflows/redeploy-tenants-on-main.yml
      → .gitea/workflows/redeploy-tenants-on-main.yml (workflow_dispatch)
  - .github/workflows/redeploy-tenants-on-staging.yml
      → .gitea/workflows/redeploy-tenants-on-staging.yml (push+paths)

No runtime behavior change — these files were never executed by the
Gitea Actions runner. Removing them eliminates the dead-letter risk:
an operator scanning .github/workflows/ would otherwise believe an
auto-redeploy chain still exists post-publish, which it does not.
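A check like the following would catch this class of doubly dead file before it accumulates. It is a minimal sketch that assumes a repo checkout on disk, and it is not the actual lint-workflow-yaml Rule 2. It flags any workflow under .github/workflows/, which the Gitea runner never reads, and any workflow that declares a `workflow_run` trigger. The function name and the line-based regex are illustrative assumptions. The regex only catches the block form `workflow_run:`, not `on: [workflow_run]`.

```python
import re
from pathlib import Path

# Hypothetical checker: a line-based scan rather than a full YAML parse,
# so it needs no third-party YAML library.
WORKFLOW_RUN = re.compile(r"^\s*workflow_run\s*:", re.MULTILINE)


def find_dead_workflows(repo_root: Path) -> list[str]:
    """Return human-readable findings for workflows the Gitea runner ignores."""
    findings: list[str] = []
    for subdir in (".github/workflows", ".gitea/workflows"):
        # glob() on a missing directory simply yields nothing.
        for path in sorted((repo_root / subdir).glob("*.y*ml")):
            rel = path.relative_to(repo_root).as_posix()
            if subdir == ".github/workflows":
                # Per the commit text, only .gitea/workflows/ is read.
                findings.append(f"{rel}: wrong directory, never executed")
            if WORKFLOW_RUN.search(path.read_text(encoding="utf-8")):
                # workflow_run is not supported by Gitea 1.22.6.
                findings.append(f"{rel}: unsupported 'workflow_run' trigger")
    return findings
```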

Refs: feedback_gitea_workflow_dispatch_inputs_unsupported,
reference_molecule_core_actions_gitea_only,
feedback_pull_request_review_no_refire.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 00:09:57 +00:00
core-devops 90e115ba55 fix(runtime): close self-delegation echo gap in builtin_tools + inbox kind
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 12s
E2E Chat / detect-changes (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 9s
publish-runtime-autobump / bump-and-tag (pull_request) Has been skipped
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 28s
gate-check-v3 / gate-check (pull_request) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 11s
qa-review / approved (pull_request) Failing after 7s
sop-checklist / na-declarations (pull_request) N/A: (none)
security-review / approved (pull_request) Failing after 6s
publish-runtime-autobump / pr-validate (pull_request) Successful in 48s
sop-checklist / all-items-acked (pull_request) Successful in 9s
sop-tier-check / tier-check (pull_request) Successful in 8s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m13s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
CI / Canvas (Next.js) (pull_request) Successful in 3m55s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Platform (Go) (pull_request) Successful in 4m26s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 1m54s
CI / Python Lint & Test (pull_request) Successful in 7m6s
CI / all-required (pull_request) Successful in 7m2s
audit-force-merge / audit (pull_request) Successful in 6s
Task #190 / #193 — surface the self-delegation echo guard at every runtime
delegation entry point, and classify platform-pushed delegation-result
rows distinctly from peer_agent messages so a delegation timeout never
appears to the caller as a fake peer instruction.

Three layers (four entry points) were affected, and only two were guarded:

  1. workspace/a2a_tools_delegation.py — already had the guard (added in
     #548 / #469). Untouched.
  2. workspace-server/internal/handlers/delegation.go — Go API gate
     already had the guard. Untouched.
  3. workspace/builtin_tools/a2a_tools.py::delegate_task — framework-
     agnostic adapter surface used by adapters that don't go through (1).
     NO GUARD. Added.
  4. workspace/builtin_tools/delegation.py::delegate_task_async — the
     LangChain @tool fire-and-forget path. NO GUARD on the local helper
     (it dispatched the background _execute_delegation coroutine to our
     own URL). Added.
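The guard added in (3) and (4) is a cheap identity comparison, and it has to run before any network I/O. The sketch below is minimal and uses hypothetical names: `delegate_task`, the injected `send_a2a` transport and the error dict shape are assumptions. The real helpers in builtin_tools/a2a_tools.py and builtin_tools/delegation.py may differ. The UUID normalisation is also an assumption, added so that a difference in case alone cannot slip past the check.

```python
import uuid
from typing import Any, Callable


def _same_workspace(a: str, b: str) -> bool:
    """Compare workspace IDs as UUIDs so case/format differences can't bypass the guard."""
    try:
        return uuid.UUID(a) == uuid.UUID(b)
    except ValueError:
        return a.strip() == b.strip()


def delegate_task(
    own_workspace_id: str,
    target_workspace_id: str,
    task: str,
    send_a2a: Callable[[str, str], dict[str, Any]],
) -> dict[str, Any]:
    # Short-circuit BEFORE any HTTP call: a self-delegation would block on
    # the run lock the caller already holds and later echo back as a fake peer.
    if _same_workspace(own_workspace_id, target_workspace_id):
        return {
            "ok": False,
            "error": "self_delegation_rejected",  # hypothetical error code
            "detail": "a workspace cannot delegate to its own ID",
        }
    return send_a2a(target_workspace_id, task)
```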

Symptom without (3)/(4): a workspace that delegates to its own UUID
round-trips through the platform proxy. The synchronous handler waits
on the run lock the caller already holds, so the request times out.
The platform then records the failure as activity_type='a2a_receive'
with source_id=our workspace UUID, and the inbox poller surfaces that
row as kind='peer_agent' with peer_id=our own workspace. The agent
therefore sees its own timeout as a new peer instructing it (#190
self-echo), and replying to that "peer" via delegate_task re-triggers
the loop.

Inbox-side fix (workspace/inbox.py): InboxMessage.to_dict() now
classifies rows with method='delegate_result' as kind='delegation_result'
regardless of peer_id. This makes pushDelegationResultToInbox results
(RFC #2829 PR-2) surface as STRUCTURED delegation outcomes to the
caller's wait_for_message instead of fake peer_agent messages. This
covers both the self-delegation echo path AND the cross-workspace
ProxyA2A failure path where the delegation result lands in the caller's
inbox with source_id=caller's own workspace UUID.
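The classification rule is simple: check `method` before `peer_id`. The sketch below is minimal and uses a hypothetical, trimmed-down `InboxMessage`. The real class in workspace/inbox.py carries more fields. The "other" fallback kind is an assumption used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class InboxMessage:
    # Hypothetical subset of fields; the real InboxMessage may carry more.
    method: Optional[str]
    peer_id: Optional[str]
    text: str

    def kind(self) -> str:
        # Delegation results are classified by method FIRST, so a row whose
        # source is our own workspace can never masquerade as a peer message.
        if self.method == "delegate_result":
            return "delegation_result"
        if self.peer_id:
            return "peer_agent"
        return "other"  # hypothetical fallback kind

    def to_dict(self) -> dict[str, Any]:
        return {
            "kind": self.kind(),
            "method": self.method,
            "peer_id": self.peer_id,
            "text": self.text,
        }
```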

Tests added:
  - tests/test_a2a_tools_module.py::TestSelfDelegationGuard — verifies
    the builtin_tools/a2a_tools.py guard short-circuits BEFORE any HTTP
    call, and lets a real peer through.
  - tests/test_delegation.py::TestSelfDelegationGuard — verifies
    builtin_tools/delegation.py::delegate_task_async returns the
    structured rejection error without scheduling a background task.
  - tests/test_inbox.py::test_message_from_activity_delegate_result_distinct_kind
    — pins kind='delegation_result' for method='delegate_result' rows
    so the #190 mis-classification regression is locked.

Runtime mirror (molecule-ai-workspace-runtime) is a publish artifact of
this directory — it picks up the fix automatically on the next
runtime-v* tag → publish-runtime workflow → PyPI 0.1.1003.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:53:51 -07:00
documentation-specialist f233f71f5a docs: fix stale channel-install + Molecule-AI org references (#230)
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 10s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 15s
CI / Platform (Go) (pull_request) Successful in 2m53s
E2E API Smoke Test / detect-changes (pull_request) Successful in 10s
E2E Chat / detect-changes (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 9s
gate-check-v3 / gate-check (pull_request) Successful in 4s
qa-review / approved (pull_request) Failing after 6s
security-review / approved (pull_request) Failing after 3s
sop-tier-check / tier-check (pull_request) Successful in 5s
sop-checklist / na-declarations (pull_request) N/A: (none)
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 31s
sop-checklist / all-items-acked (pull_request) Successful in 9s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
CI / Canvas (Next.js) (pull_request) Successful in 4m21s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 7m3s
CI / all-required (pull_request) Successful in 6m50s
audit-force-merge / audit (pull_request) Successful in 5s
CONTRIBUTING.md:195 had a wrong `--channels` install string
(`plugin:molecule@Molecule-AI/molecule-mcp-claude-channel` — both the
plugin-name format and the dead GitHub-org path are stale). Aligned all
three doc surfaces (CONTRIBUTING.md, README.md, README.zh-CN.md) with
the actual install pattern emitted by workspace-server/internal/
handlers/external_connection.go (externalChannelTemplate):

  /plugin marketplace add https://git.moleculesai.app/molecule-ai/molecule-mcp-claude-channel.git
  /plugin install molecule@molecule-channel
  claude --dangerously-load-development-channels --channels plugin:molecule@molecule-channel

Also normalised display labels for the now-canonical Gitea org
(`Molecule-AI/` → `molecule-ai/`) — these are link captions, the URLs
were already correct. Docs-only, no behavioural change.

Task #230. Refs memory `feedback_github_botring_fingerprint` (canonical
SCM = git.moleculesai.app/molecule-ai/...).
2026-05-18 16:48:16 -07:00
core-devops 82a6cf42cd feat(canvas): homepage SEO for marketing launch (mc#1486)
CI / Detect changes (pull_request) Successful in 6s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 12s
E2E API Smoke Test / detect-changes (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 10s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 7s
E2E Chat / detect-changes (pull_request) Successful in 13s
Harness Replays / detect-changes (pull_request) Successful in 9s
gate-check-v3 / gate-check (pull_request) Successful in 8s
qa-review / approved (pull_request) Failing after 6s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 12s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 17s
sop-checklist / na-declarations (pull_request) N/A: (none)
security-review / approved (pull_request) Failing after 11s
sop-checklist / all-items-acked (pull_request) Successful in 9s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 4s
sop-tier-check / tier-check (pull_request) Successful in 14s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
Harness Replays / Harness Replays (pull_request) Successful in 3s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m10s
CI / Canvas (Next.js) (pull_request) Successful in 4m18s
CI / Platform (Go) (pull_request) Successful in 4m37s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Chat / E2E Chat (pull_request) Failing after 5m8s
CI / Python Lint & Test (pull_request) Successful in 6m43s
CI / all-required (pull_request) Successful in 6m55s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7m20s
audit-force-merge / audit (pull_request) Successful in 8s
Adds the standard Next.js App-Router SEO surface to the canvas
landing so the marketing push has crawlable metadata + structured
data on day one.

What landed:
  - layout.tsx — Metadata API: title.template, description,
    keywords, canonical, metadataBase, OG/Twitter text fields,
    robots index:true. JSON-LD @graph (Organization + WebSite +
    SoftwareApplication) injected with the per-request CSP nonce.
  - robots.ts — allow public marketing routes (/, /pricing, /blog),
    disallow /orgs, /api/, /cp/, /checkout/; declares sitemap +
    canonical host.
  - sitemap.ts — apex + pricing + live blog post; authed routes
    excluded by construction.
  - opengraph-image.tsx — segment-level dynamic OG card via
    next/og ImageResponse (1200x630); no static binary blob.
  - __tests__/seo-routes.test.ts — pins the crawler contract
    (10 cases) so a future refactor can't silently flip the
    marketing surface to noindex or drop the sitemap.
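Python's standard `urllib.robotparser` offers a quick cross-check of the crawler contract that robots.ts is meant to emit. This is a minimal sketch. It assumes the rendered robots.txt looks roughly like the text below, but the exact output of robots.ts and the `example.com` host are both assumptions. `RobotFileParser` applies the first matching rule in file order, whereas many crawlers use longest-match, so the Disallow lines are listed before the catch-all Allow.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rendering of robots.ts; the host is a placeholder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /orgs
Disallow: /api/
Disallow: /cp/
Disallow: /checkout/
Allow: /pricing
Allow: /blog
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

PUBLIC = ["/", "/pricing", "/blog"]
PRIVATE = ["/orgs/acme", "/api/v1/x", "/cp/admin", "/checkout/session"]


def crawler_contract(robots_txt: str, base: str = "https://example.com") -> dict[str, bool]:
    """Map each probed path to whether a generic crawler ('*') may fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {p: parser.can_fetch("*", base + p) for p in PUBLIC + PRIVATE}
```

A pinned test in this style fails loudly if a refactor drops the Allow for the marketing routes or stops disallowing the authed prefixes.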

Out of scope (per issue): design copy, hero rewrite, Lighthouse
CWV tuning. Those are CTO/marketing inputs and a separate ticket.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:47:03 -07:00
46 changed files with 4022 additions and 1162 deletions
+58 -5
@@ -268,6 +268,7 @@ def compute_ack_state(
items_by_slug: dict[str, dict[str, Any]],
numeric_aliases: dict[int, str],
team_membership_probe: "callable[[str, list[str]], list[str]]",
high_risk: bool = False,
) -> dict[str, dict[str, Any]]:
"""Compute per-item ack state.
@@ -330,11 +331,16 @@ def compute_ack_state(
for slug, candidates in pending_team_check.items():
if not candidates:
continue
required = items_by_slug[slug]["required_teams"]
# Risk-class-aware required-teams resolution (RFC#450 Option C):
# high-risk PRs use `required_teams_high_risk` (when set on the
# item); default class uses `required_teams`. The probe closure
# is built with the same high_risk flag so the two reads are
# always consistent (both sites share `resolve_required_teams`).
required = resolve_required_teams(items_by_slug[slug], high_risk)
approved = team_membership_probe(slug, candidates) # returns subset
rejected_not_in_team[slug] = [u for u in candidates if u not in approved]
ackers_per_slug[slug] = approved
# Stash required teams for description rendering.
# Stash resolved teams for description rendering.
items_by_slug[slug]["_required_resolved"] = required
return {
@@ -765,6 +771,42 @@ def get_tier_mode(pr: dict[str, Any], cfg: dict[str, Any]) -> str:
return default_mode
def is_high_risk(pr: dict[str, Any], cfg: dict[str, Any]) -> bool:
"""Return True when the PR is high-risk per RFC#450 Option C.
A PR is high-risk when ANY of:
- it carries the `tier:high` label (mechanically strictest tier), or
- it carries any label listed in cfg.high_risk_labels.
High-risk PRs use `required_teams_high_risk` (when set on an item)
instead of the default `required_teams`. Items without
`required_teams_high_risk` are unaffected (the default applies).
Governance fix for internal#442 — closes the inconsistency between
sop-tier-check (tier-aware) and sop-checklist (was tier-blind).
"""
label_set = {(l.get("name") or "") for l in (pr.get("labels") or [])}
if "tier:high" in label_set:
return True
high_risk_labels = set(cfg.get("high_risk_labels") or [])
return bool(label_set & high_risk_labels)
def resolve_required_teams(item: dict[str, Any], high_risk: bool) -> list[str]:
"""Pick the active required_teams list for an item.
When high_risk is True AND the item declares a non-empty
`required_teams_high_risk`, return that. Else fall back to
`required_teams`. Keeping this in one helper means the gate's
decision shape stays single-sited even as items grow.
"""
if high_risk:
elevated = item.get("required_teams_high_risk") or []
if elevated:
return list(elevated)
return list(item.get("required_teams") or [])
def main(argv: list[str] | None = None) -> int:
p = argparse.ArgumentParser()
p.add_argument("--owner", required=True)
@@ -825,6 +867,12 @@ def main(argv: list[str] | None = None) -> int:
comments = client.get_issue_comments(args.owner, args.repo, args.pr)
# High-risk classification (RFC#450 Option C, governance fix for
# internal#442). Computed ONCE per PR — used by both the probe
# closure and compute_ack_state so the elevation decision is
# single-sited.
high_risk = is_high_risk(pr, cfg)
# Build team-membership probe closure that caches results per
# (user, team-id) so a user acking multiple items only triggers
# one membership lookup per team.
@@ -832,7 +880,7 @@ def main(argv: list[str] | None = None) -> int:
def probe(slug: str, users: list[str]) -> list[str]:
item = items_by_slug[slug]
team_names: list[str] = item["required_teams"]
team_names: list[str] = resolve_required_teams(item, high_risk)
# Resolve names → ids. NOTE: orgs/{org}/teams/search may not be
# available — fall back to the list endpoint.
team_ids: list[int] = []
@@ -877,7 +925,9 @@ def main(argv: list[str] | None = None) -> int:
# may still find membership in another team.
return approved
ack_state = compute_ack_state(comments, author, items_by_slug, numeric_aliases, probe)
ack_state = compute_ack_state(
comments, author, items_by_slug, numeric_aliases, probe, high_risk=high_risk
)
body_state = {it["slug"]: section_marker_present(body, it["pr_section_marker"]) for it in items}
state, description = render_status(items, ack_state, body_state)
@@ -890,7 +940,10 @@ def main(argv: list[str] | None = None) -> int:
description = f"[info tier:low] {description}"
# Diagnostics to job log.
print(f"::notice::PR #{args.pr} author={author} head={head_sha[:7]} mode={mode}")
print(
f"::notice::PR #{args.pr} author={author} head={head_sha[:7]} "
f"mode={mode} risk_class={'high' if high_risk else 'default'}"
)
for it in items:
slug = it["slug"]
ackers = ack_state[slug]["ackers"]
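The probe comment above describes caching results per (user, team-id). That caching can be sketched as a closure over a dict. This is a minimal sketch with two assumptions. First, `is_member` stands in for whatever Gitea team-membership call the script actually makes. Second, team-name-to-id resolution is reduced to a plain mapping. In the real script, `required_teams_for` would correspond to `lambda slug: resolve_required_teams(items_by_slug[slug], high_risk)`.

```python
from typing import Callable


def make_probe(
    required_teams_for: Callable[[str], list[str]],
    team_ids_by_name: dict[str, int],
    is_member: Callable[[str, int], bool],
) -> Callable[[str, list[str]], list[str]]:
    """Build probe(slug, users) -> approved users, caching membership per (user, team_id)."""
    cache: dict[tuple[str, int], bool] = {}

    def member(user: str, team_id: int) -> bool:
        key = (user, team_id)
        if key not in cache:
            cache[key] = is_member(user, team_id)  # at most one remote lookup per key
        return cache[key]

    def probe(slug: str, users: list[str]) -> list[str]:
        team_ids = [
            team_ids_by_name[name]
            for name in required_teams_for(slug)
            if name in team_ids_by_name
        ]
        # A user is approved if they belong to ANY required team (OR-set).
        return [u for u in users if any(member(u, t) for t in team_ids)]

    return probe
```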
+213 -1
@@ -602,4 +602,216 @@ class TestComputeNaState(unittest.TestCase):
self.assertEqual(len(na_directives), 1)
self.assertEqual(na_directives[0][0], "sop-n/a")
self.assertEqual(na_directives[0][1], "qa-review")
self.assertIn("no surface", na_directives[0][2])
# ---------------------------------------------------------------------------
# RFC#450 Option C — risk-classed two-eyes (governance fix for internal#442)
# ---------------------------------------------------------------------------
class TestIsHighRisk(unittest.TestCase):
"""The high-risk predicate decides which required_teams list applies.
Predicate: tier:high label OR any label in cfg.high_risk_labels.
"""
def setUp(self):
self.cfg = sop.load_config(CONFIG_PATH)
def test_no_labels_is_default_class(self):
pr = {"labels": []}
self.assertFalse(sop.is_high_risk(pr, self.cfg))
def test_tier_high_is_high_risk(self):
pr = {"labels": [{"name": "tier:high"}]}
self.assertTrue(sop.is_high_risk(pr, self.cfg))
def test_tier_low_is_default_class(self):
pr = {"labels": [{"name": "tier:low"}]}
self.assertFalse(sop.is_high_risk(pr, self.cfg))
def test_tier_medium_is_default_class(self):
# tier:medium alone is NOT high-risk (Option C — medium routes
# to the wider engineers OR-set).
pr = {"labels": [{"name": "tier:medium"}]}
self.assertFalse(sop.is_high_risk(pr, self.cfg))
def test_area_security_label_is_high_risk(self):
pr = {"labels": [{"name": "tier:medium"}, {"name": "area:security"}]}
self.assertTrue(sop.is_high_risk(pr, self.cfg))
def test_area_schema_label_is_high_risk(self):
pr = {"labels": [{"name": "area:schema"}]}
self.assertTrue(sop.is_high_risk(pr, self.cfg))
def test_area_identity_label_is_high_risk(self):
pr = {"labels": [{"name": "area:identity"}]}
self.assertTrue(sop.is_high_risk(pr, self.cfg))
def test_area_fleet_image_label_is_high_risk(self):
pr = {"labels": [{"name": "area:fleet-image"}]}
self.assertTrue(sop.is_high_risk(pr, self.cfg))
def test_area_gate_meta_label_is_high_risk(self):
# Gate-meta = changes to sop-checklist/sop-tier-check itself.
pr = {"labels": [{"name": "area:gate-meta"}]}
self.assertTrue(sop.is_high_risk(pr, self.cfg))
def test_unknown_area_label_is_default_class(self):
pr = {"labels": [{"name": "area:docs"}]}
self.assertFalse(sop.is_high_risk(pr, self.cfg))
class TestResolveRequiredTeams(unittest.TestCase):
"""The team resolver picks the elevated list only for high-risk PRs
AND only when the item declares one — items without an elevated
list always use the default required_teams."""
def test_default_class_uses_default_teams(self):
item = {"required_teams": ["engineers", "managers", "ceo"], "required_teams_high_risk": ["ceo"]}
self.assertEqual(
sop.resolve_required_teams(item, high_risk=False),
["engineers", "managers", "ceo"],
)
def test_high_risk_uses_elevated_teams(self):
item = {"required_teams": ["engineers", "managers", "ceo"], "required_teams_high_risk": ["ceo"]}
self.assertEqual(
sop.resolve_required_teams(item, high_risk=True),
["ceo"],
)
def test_high_risk_without_elevated_falls_back_to_default(self):
# Items that don't declare required_teams_high_risk (e.g.
# comprehensive-testing, staging-smoke) are unaffected by risk-class.
item = {"required_teams": ["engineers"]}
self.assertEqual(
sop.resolve_required_teams(item, high_risk=True),
["engineers"],
)
def test_empty_elevated_list_falls_back_to_default(self):
# A defensive case: required_teams_high_risk: [] should not
# silently lock out all approvers — fall back to the default
# so the gate stays satisfiable. (Tightening should remove the
# key, not set it to empty.)
item = {"required_teams": ["engineers"], "required_teams_high_risk": []}
self.assertEqual(
sop.resolve_required_teams(item, high_risk=True),
["engineers"],
)
class TestRootCauseAckEligibilityWidened(unittest.TestCase):
"""Closes internal#442: a non-author engineers-team ack now satisfies
root-cause / no-backwards-compat for the default class.
The dead-managers/ceo-persona-token gridlock is the symptom; the
root cause is that sop-checklist ignored tier-class. These tests
pin the new wider-default behavior so it can't regress silently.
"""
def setUp(self):
self.items = _items_by_slug()
self.aliases = _numeric_aliases()
@staticmethod
def _approve_only(allowed):
return lambda slug, users: [u for u in users if u in allowed]
def test_engineers_ack_satisfies_root_cause_default_class(self):
# Bob is in engineers only (not managers, not ceo). Default class.
comments = [_comment("bob", "/sop-ack root-cause")]
# Probe: bob is approved because root-cause now lists engineers.
probe = self._approve_only({"bob"})
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe, high_risk=False
)
self.assertEqual(state["root-cause"]["ackers"], ["bob"])
def test_engineers_ack_satisfies_no_backwards_compat_default_class(self):
comments = [_comment("bob", "/sop-ack no-backwards-compat")]
probe = self._approve_only({"bob"})
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe, high_risk=False
)
self.assertEqual(state["no-backwards-compat"]["ackers"], ["bob"])
def test_engineers_ack_alone_fails_root_cause_when_high_risk(self):
# High-risk PR: only ceo can ack. Engineers-only ack must fail.
comments = [_comment("bob", "/sop-ack root-cause")]
# Probe: bob is in engineers, not ceo. Under high_risk,
# required_teams_high_risk=[ceo] → bob is NOT approved.
# Probe receives the items + flag indirectly via main(); for
# the unit-test path we inject a probe that rejects bob.
probe = self._approve_only(set()) # nobody is in ceo
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe, high_risk=True
)
self.assertEqual(state["root-cause"]["ackers"], [])
self.assertIn("bob", state["root-cause"]["rejected"]["not_in_team"])
def test_ceo_ack_satisfies_root_cause_when_high_risk(self):
# High-risk PR + ceo-team approver → passes (the senior path).
comments = [_comment("hongming", "/sop-ack root-cause")]
probe = self._approve_only({"hongming"})
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe, high_risk=True
)
self.assertEqual(state["root-cause"]["ackers"], ["hongming"])
def test_self_ack_still_forbidden_even_with_widened_eligibility(self):
# Author cannot self-ack — widening teams must NOT weaken
# the non-author rule.
comments = [_comment("alice", "/sop-ack root-cause")]
probe = self._approve_only({"alice"})
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe, high_risk=False
)
self.assertEqual(state["root-cause"]["ackers"], [])
self.assertIn("alice", state["root-cause"]["rejected"]["self_ack"])
class TestHighRiskClassUsesElevatedListInConfig(unittest.TestCase):
"""End-to-end: the shipped config + RFC#450 predicate must keep
root-cause / no-backwards-compat gated on ceo for high-risk PRs."""
def test_root_cause_high_risk_elevated_to_ceo_only(self):
items = _items_by_slug()
# tier:high alone makes the PR high-risk → root-cause needs ceo.
self.assertEqual(
sop.resolve_required_teams(items["root-cause"], high_risk=True),
["ceo"],
)
# Default class accepts engineers/managers/ceo.
self.assertEqual(
sorted(sop.resolve_required_teams(items["root-cause"], high_risk=False)),
sorted(["engineers", "managers", "ceo"]),
)
def test_no_backwards_compat_high_risk_elevated_to_ceo_only(self):
items = _items_by_slug()
self.assertEqual(
sop.resolve_required_teams(items["no-backwards-compat"], high_risk=True),
["ceo"],
)
self.assertEqual(
sorted(sop.resolve_required_teams(items["no-backwards-compat"], high_risk=False)),
sorted(["engineers", "managers", "ceo"]),
)
def test_other_items_unchanged_by_risk_class(self):
# Items without required_teams_high_risk are unaffected.
items = _items_by_slug()
for slug in (
"comprehensive-testing",
"local-postgres-e2e",
"staging-smoke",
"five-axis-review",
"memory-consulted",
):
self.assertEqual(
sop.resolve_required_teams(items[slug], high_risk=False),
sop.resolve_required_teams(items[slug], high_risk=True),
f"item {slug} should not be affected by risk-class",
)
+43 -7
@@ -50,6 +50,34 @@ tier_failure_mode:
"tier:low": soft
default_mode: hard # used when no tier:* label is present
# High-risk class (RFC#450 Option C, governance-fix for internal#442).
#
# A PR is "high-risk" when ANY of the listed labels are applied OR when
# the PR has `tier:high` (mechanically the strictest existing tier).
# High-risk items use `required_teams_high_risk` (when present on the
# item); non-high-risk items use the default `required_teams`.
#
# This closes the inconsistency that the SOP charter already mandates
# `tier:high → ceo only` for the sibling `sop-tier-check` gate; the
# sop-checklist's `root-cause` and `no-backwards-compat` items now
# follow the same risk-classed two-eyes shape:
# - Default class (tier:low/medium, not high-risk): a non-author
# engineers/managers/ceo ack satisfies the item — 25+ live
# identities, no dependency on a dead/inactive senior persona
# token.
# - High-risk class (tier:high OR any high_risk_label): still
# requires a non-author ceo ack (durable human team).
#
# Tightening: add labels to high_risk_labels.
# Loosening: remove labels.
high_risk_labels:
- "risk:high"
- "area:security"
- "area:schema"
- "area:fleet-image"
- "area:identity"
- "area:gate-meta"
items:
- slug: comprehensive-testing
numeric_alias: 1
@@ -78,11 +106,15 @@ items:
- slug: root-cause
numeric_alias: 4
pr_section_marker: "Root-cause not symptom"
required_teams: [managers, ceo]
required_teams: [engineers, managers, ceo]
required_teams_high_risk: [ceo]
description: >-
One-sentence root-cause statement. Ack from managers tier
(team-leads) or ceo. Senior judgment required to attest
root-cause-versus-symptom.
One-sentence root-cause statement. Default class: non-author
engineers/managers/ceo ack suffices (engineers can attest
root-cause-vs-symptom for routine fixes). High-risk class
(see `high_risk_labels`): non-author ceo ack required —
senior judgment for irreversible/security/identity/gate
changes. Closes internal#442 + tracks RFC#450.
- slug: five-axis-review
numeric_alias: 5
@@ -95,10 +127,14 @@ items:
- slug: no-backwards-compat
numeric_alias: 6
pr_section_marker: "No backwards-compat shim / dead code added"
required_teams: [managers, ceo]
required_teams: [engineers, managers, ceo]
required_teams_high_risk: [ceo]
description: >-
Yes/no + justification if no. Senior ack required because
backward-compat shims are how dead-code accretes.
Yes/no + justification if no. Default class: non-author
engineers/managers/ceo ack suffices. High-risk class
(see `high_risk_labels`): non-author ceo ack required —
senior judgment for shim-versus-real-fix on irreversible
surfaces. Closes internal#442 + tracks RFC#450.
- slug: memory-consulted
numeric_alias: 7
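An empty `required_teams_high_risk: []` silently falls back to the default teams, as pinned by `test_empty_elevated_list_falls_back_to_default`. A config lint could flag that shape so an attempted tightening is never a silent no-op. This is a minimal sketch that works on the already-parsed config dict and leaves YAML loading out. The function name and messages are hypothetical.

```python
from typing import Any


def lint_risk_classes(cfg: dict[str, Any]) -> list[str]:
    """Flag config shapes that make the high-risk elevation a silent no-op."""
    problems: list[str] = []
    if not (cfg.get("high_risk_labels") or []):
        # is_high_risk() still returns True for tier:high on its own.
        problems.append("high_risk_labels is empty: only tier:high will elevate")
    for item in cfg.get("items") or []:
        slug = item.get("slug", "<unnamed>")
        if "required_teams_high_risk" in item and not item["required_teams_high_risk"]:
            # resolve_required_teams() falls back to required_teams here,
            # so the item is NOT tightened. Remove the key instead.
            problems.append(
                f"{slug}: empty required_teams_high_risk falls back to required_teams"
            )
    return problems
```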
+177 -5
@@ -52,6 +52,30 @@ name: E2E Peer Visibility (literal MCP list_peers)
# flip-to-required-ready (mirrors e2e-staging-saas.yml's proven shape;
# real EC2-provisioning E2E is push/dispatch/cron only — it is 30+ min
# and cannot run per-PR-update).
#
# LOCAL BACKEND (added 2026-05-15 — feedback_local_must_mimic_production,
# feedback_mandatory_local_e2e_before_ship, feedback_local_test_before_
# staging_e2e)
# --------------------------------------------------------------------
# The standing rule is that the local prod-mimic stack runs a MANDATORY
# local-Postgres E2E BEFORE staging E2E. A staging-only peer-visibility
# gate caught regressions late + expensively (cold EC2). The
# `peer-visibility-local` job below runs the SAME byte-identical
# assertion (tests/e2e/lib/peer_visibility_assert.sh) against the local
# docker-compose stack — built + booted exactly like e2e-api.yml's
# proven E2E API Smoke Test job (ephemeral pg/redis ports, go build,
# background platform-server). It runs on PR + push (local boot is
# minutes, not the 30+ min cold-EC2 path), so peer-visibility is part of
# the local gate that fires before the staging E2E.
#
# It is its OWN non-required status context `E2E Peer Visibility (local)`
# — same non-required-by-design decision as the staging job (red until
# Hermes-401 #162 / OpenClaw-never-online #165 land; flip-to-required
# tracked at molecule-core#1296). It is an HONEST gate: NO
# continue-on-error mask (feedback_fix_root_not_symptom). It is kept a
# distinct context (not folded into e2e-api.yml's required `E2E API
# Smoke Test`) precisely so a deliberately-RED-today gate cannot wedge
# the required local-E2E job or any unrelated merge.
on:
push:
@@ -65,6 +89,8 @@ on:
- 'workspace/a2a_mcp_server.py'
- 'workspace/platform_tools/registry.py'
- 'tests/e2e/test_peer_visibility_mcp_staging.sh'
- 'tests/e2e/test_peer_visibility_mcp_local.sh'
- 'tests/e2e/lib/peer_visibility_assert.sh'
- '.gitea/workflows/e2e-peer-visibility.yml'
pull_request:
branches: [main]
@@ -77,6 +103,8 @@ on:
- 'workspace/a2a_mcp_server.py'
- 'workspace/platform_tools/registry.py'
- 'tests/e2e/test_peer_visibility_mcp_staging.sh'
- 'tests/e2e/test_peer_visibility_mcp_local.sh'
- 'tests/e2e/lib/peer_visibility_assert.sh'
- '.gitea/workflows/e2e-peer-visibility.yml'
workflow_dispatch:
schedule:
@@ -108,16 +136,160 @@ jobs:
timeout-minutes: 5
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Validate driving script
- name: Validate driving scripts + shared assertion lib
run: |
bash -n tests/e2e/lib/peer_visibility_assert.sh
echo "lib/peer_visibility_assert.sh — bash syntax OK"
bash -n tests/e2e/test_peer_visibility_mcp_staging.sh
echo "test_peer_visibility_mcp_staging.sh — bash syntax OK"
echo "Real fresh-provision MCP list_peers E2E runs on push to"
bash -n tests/e2e/test_peer_visibility_mcp_local.sh
echo "test_peer_visibility_mcp_local.sh — bash syntax OK"
echo "Staging fresh-provision MCP list_peers E2E runs on push to"
echo "main / workflow_dispatch / daily cron (30+ min EC2 boot)."
echo "The LOCAL backend runs in the peer-visibility-local job"
echo "below on this same PR (local docker-compose stack)."
# Real gate: provisions a throwaway org + sibling-per-runtime, drives
# the LITERAL list_peers MCP call per runtime, asserts 200 + expected
# peer set, then scoped teardown. push(main)/dispatch/cron only.
# LOCAL gate: same byte-identical assertion against the local prod-mimic
# docker-compose stack — the MANDATORY local-E2E that must run BEFORE
# the staging E2E (feedback_mandatory_local_e2e_before_ship,
# feedback_local_test_before_staging_e2e). Bootstrap mirrors
# e2e-api.yml's proven E2E API Smoke Test job (per-run container names +
# ephemeral host ports so concurrent host-network act_runner runs don't
# collide; go build; background platform-server). Its OWN non-required
# status context `E2E Peer Visibility (local)` — non-required-by-design
# exactly like the staging job (red until #162/#165 land;
# flip-to-required tracked at molecule-core#1296). HONEST gate, NO
# continue-on-error mask (feedback_fix_root_not_symptom). Runs on PR +
# push (local boot is minutes, not the 30+ min cold-EC2 path).
# bp-required: pending #1296
peer-visibility-local:
name: E2E Peer Visibility (local)
runs-on: ubuntu-latest
timeout-minutes: 30
env:
# Per-run names + ephemeral ports — same collision-avoidance as
# e2e-api.yml (host-network act_runner; feedback_act_runner_*).
PG_CONTAINER: pg-e2e-pv-${{ github.run_id }}-${{ github.run_attempt }}
REDIS_CONTAINER: redis-e2e-pv-${{ github.run_id }}-${{ github.run_attempt }}
# LLM keys so hermes/openclaw/claude-code can actually boot. The local
# script SKIPs (not fails) any runtime whose key is absent, so a
# partially keyed CI env still exercises whatever it can.
CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.E2E_CLAUDE_CODE_OAUTH_TOKEN }}
E2E_MINIMAX_API_KEY: ${{ secrets.MOLECULE_STAGING_MINIMAX_API_KEY }}
E2E_ANTHROPIC_API_KEY: ${{ secrets.MOLECULE_STAGING_ANTHROPIC_API_KEY }}
E2E_OPENAI_API_KEY: ${{ secrets.MOLECULE_STAGING_OPENAI_API_KEY }}
PV_RUNTIMES: "hermes openclaw claude-code"
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff # v5
with:
go-version: 'stable'
cache: true
cache-dependency-path: workspace-server/go.sum
- name: Pre-pull alpine + ensure provisioner network
run: |
docker pull alpine:latest >/dev/null
docker network create molecule-core-net >/dev/null 2>&1 || true
echo "alpine:latest pre-pulled; molecule-core-net ensured."
- name: Start Postgres (docker, ephemeral port)
run: |
docker rm -f "$PG_CONTAINER" 2>/dev/null || true
docker run -d --name "$PG_CONTAINER" \
-e POSTGRES_USER=dev -e POSTGRES_PASSWORD=dev -e POSTGRES_DB=molecule \
-p 0:5432 postgres:16 >/dev/null
PG_PORT=$(docker port "$PG_CONTAINER" 5432/tcp | awk -F: '/^0\.0\.0\.0:/ {print $2; exit}')
[ -n "$PG_PORT" ] || PG_PORT=$(docker port "$PG_CONTAINER" 5432/tcp | head -1 | awk -F: '{print $NF}')
if [ -z "$PG_PORT" ]; then
echo "::error::Could not resolve host port for $PG_CONTAINER"
docker logs "$PG_CONTAINER" || true; exit 1
fi
echo "DATABASE_URL=postgres://dev:dev@127.0.0.1:${PG_PORT}/molecule?sslmode=disable" >> "$GITHUB_ENV"
for i in $(seq 1 30); do
docker exec "$PG_CONTAINER" pg_isready -U dev >/dev/null 2>&1 && { echo "Postgres ready after ${i}s"; exit 0; }
sleep 1
done
echo "::error::Postgres did not become ready in 30s"; docker logs "$PG_CONTAINER" || true; exit 1
- name: Start Redis (docker, ephemeral port)
run: |
docker rm -f "$REDIS_CONTAINER" 2>/dev/null || true
docker run -d --name "$REDIS_CONTAINER" -p 0:6379 redis:7 >/dev/null
REDIS_PORT=$(docker port "$REDIS_CONTAINER" 6379/tcp | awk -F: '/^0\.0\.0\.0:/ {print $2; exit}')
[ -n "$REDIS_PORT" ] || REDIS_PORT=$(docker port "$REDIS_CONTAINER" 6379/tcp | head -1 | awk -F: '{print $NF}')
if [ -z "$REDIS_PORT" ]; then
echo "::error::Could not resolve host port for $REDIS_CONTAINER"
docker logs "$REDIS_CONTAINER" || true; exit 1
fi
echo "REDIS_URL=redis://127.0.0.1:${REDIS_PORT}" >> "$GITHUB_ENV"
for i in $(seq 1 15); do
docker exec "$REDIS_CONTAINER" redis-cli ping 2>/dev/null | grep -q PONG && { echo "Redis ready after ${i}s"; exit 0; }
sleep 1
done
echo "::error::Redis did not become ready in 15s"; docker logs "$REDIS_CONTAINER" || true; exit 1
- name: Build platform
working-directory: workspace-server
run: go build -o platform-server ./cmd/server
- name: Pick platform port
run: |
PLATFORM_PORT=$(python3 - <<'PY'
import socket
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
s.bind(("127.0.0.1", 0))
print(s.getsockname()[1])
PY
)
echo "PORT=${PLATFORM_PORT}" >> "$GITHUB_ENV"
echo "BASE=http://127.0.0.1:${PLATFORM_PORT}" >> "$GITHUB_ENV"
echo "Platform host port: ${PLATFORM_PORT}"
- name: Kill stale platform-server before start
run: |
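killed=0
# /proc/<pid>/comm is truncated to 15 chars, so use it only as a
# cheap prefix filter and confirm the full name via cmdline below.
for pid in $(grep -l "platform-serve" /proc/[0-9]*/comm 2>/dev/null); do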
kpid="${pid%/comm}"; kpid="${kpid##*/}"
cmdline=$(cat "/proc/${kpid}/cmdline" 2>/dev/null | tr '\0' ' ')
if echo "$cmdline" | grep -q "platform-server"; then
echo "Killing stale platform-server pid ${kpid}"
kill "$kpid" 2>/dev/null || true; killed=$((killed + 1))
fi
done
[ "$killed" -gt 0 ] && sleep 2 || true
echo "stale-kill done ($killed killed)"
- name: Start platform (background)
working-directory: workspace-server
run: |
./platform-server > platform.log 2>&1 &
echo $! > platform.pid
- name: Wait for /health
run: |
for i in $(seq 1 30); do
curl -sf "$BASE/health" > /dev/null && { echo "Platform up after ${i}s"; exit 0; }
sleep 1
done
echo "::error::Platform did not become healthy in 30s"
cat workspace-server/platform.log || true; exit 1
- name: Run LOCAL fresh-provision peer-visibility E2E (literal MCP list_peers)
# HONEST gate — NO continue-on-error. Red today (Hermes-401 #162 /
# OpenClaw-never-online #165 not yet fixed); green when they land.
# Non-required-by-design via its distinct status context until the
# molecule-core#1296 flip-to-required.
run: bash tests/e2e/test_peer_visibility_mcp_local.sh
- name: Dump platform log on failure
if: failure()
run: cat workspace-server/platform.log || true
- name: Stop platform
if: always()
run: |
if [ -f workspace-server/platform.pid ]; then
kill "$(cat workspace-server/platform.pid)" 2>/dev/null || true
fi
- name: Stop service containers
if: always()
run: |
docker rm -f "$PG_CONTAINER" 2>/dev/null || true
docker rm -f "$REDIS_CONTAINER" 2>/dev/null || true
# Real STAGING gate: provisions a throwaway org + sibling-per-runtime,
# drives the LITERAL list_peers MCP call per runtime, asserts 200 +
# expected peer set, then scoped teardown. push(main)/dispatch/cron only.
peer-visibility:
name: E2E Peer Visibility
runs-on: ubuntu-latest
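The peer-visibility-local job above repeats two shell idioms: resolving the host port Docker picked for `-p 0:<port>` from `docker port` output, and polling a readiness probe once per second with a bounded attempt count. A minimal sketch factoring both into helpers, assuming bash; `parse_host_port` and `wait_for` are hypothetical names, not functions in the repo, and the sample `docker port` lines are illustrative rather than captured output.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the peer-visibility-local job's inline
# port-resolution and readiness loops.
set -euo pipefail

# parse_host_port: read `docker port <container> <port>/tcp` output on
# stdin and print the host port. Prefers the IPv4 (0.0.0.0) binding,
# then falls back to the last ':'-separated field of the first line,
# using the same two awk expressions as the job.
parse_host_port() {
  local out port
  out=$(cat)
  port=$(printf '%s\n' "$out" | awk -F: '/^0\.0\.0\.0:/ {print $2; exit}')
  if [ -z "$port" ]; then
    port=$(printf '%s\n' "$out" | head -1 | awk -F: '{print $NF}')
  fi
  printf '%s\n' "$port"
}

# wait_for MAX CMD...: run CMD once per second until it succeeds or MAX
# attempts have been made. Returns 0 on success, 1 on timeout.
wait_for() {
  local max=$1
  shift
  local i
  for i in $(seq 1 "$max"); do
    if "$@" >/dev/null 2>&1; then
      echo "ready after ${i} attempt(s)"
      return 0
    fi
    sleep 1
  done
  return 1
}

printf '0.0.0.0:49153\n[::]:49153\n' | parse_host_port   # prints 49153
printf '[::]:49154\n' | parse_host_port                   # prints 49154
wait_for 5 true                                           # prints "ready after 1 attempt(s)"
```

In the job, the Postgres loop would then read `wait_for 30 docker exec "$PG_CONTAINER" pg_isready -U dev`; the Redis probe pipes through `grep -q PONG`, so it would need wrapping in a small function first.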
@@ -0,0 +1,168 @@
name: Lint forbidden tenant-env keys
# RFC#523 Layer 3 (task #146): scan workspace_secrets-writer Go code
# under workspace-server/ for new code that hardcodes a forbidden
# operator-scope env var NAME (GITEA_TOKEN, CP_ADMIN_API_TOKEN,
# RAILWAY_TOKEN, INFISICAL_OPERATOR_TOKEN, MOLECULE_OPERATOR_*, …).
#
# Catches the class "a new writer accidentally widens the propagation
# set" — e.g. a future env-mutator plugin that sets envVars["GITEA_TOKEN"]
# directly. Today the L1 runtime guard would abort the provision, but
# this lint surfaces the offending code at PR review time instead of
# at first provision attempt.
#
# Companion layers:
# - L1: workspace-server/internal/handlers/workspace_provision_forbidden_env.go
# (fail-closed abort at provision time)
# - L2: workspace/entrypoint.sh top-of-file env-grep + exit 1
#
# Open-source-template-friendly: the deny pattern is generic. A fork
# can copy this workflow and replace FORBIDDEN_KEYS / FORBIDDEN_PREFIXES
# with its own operator-scope key names.
#
# Path-filter discipline:
# This workflow runs on every PR (no paths: filter — see
# feedback_path_filtered_workflow_cant_be_required). The scan itself
# targets workspace_secrets-writer paths via grep -r; it's fast
# (sub-second) so unconditional run is fine.
on:
pull_request:
types: [opened, synchronize, reopened]
push:
branches: [main, staging]
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
jobs:
scan:
name: Scan workspace_secrets writers for forbidden env keys
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 1
- name: Scan for forbidden operator-scope env key NAMES in writer paths
run: |
set -euo pipefail
# Forbidden EXACT-MATCH env var names. Kept in lockstep with
# workspace-server/internal/handlers/workspace_provision_forbidden_env.go
# forbiddenTenantEnvKeys. The Go-side test
# TestIsForbiddenTenantEnvKey_ExactMatches is the source of
# truth — if Go-side adds a key, also add it here (and
# vice-versa). Drift between the two is the failure mode this
# entire 3-layer guardrail is designed to catch.
FORBIDDEN_KEYS=(
"GITEA_TOKEN" "GITEA_PAT"
"GITHUB_TOKEN" "GITHUB_PAT" "GH_TOKEN"
"GITLAB_TOKEN" "GL_TOKEN"
"BITBUCKET_TOKEN"
"CP_ADMIN_API_TOKEN" "CP_ADMIN_TOKEN"
"INFISICAL_OPERATOR_TOKEN" "INFISICAL_BOOTSTRAP_TOKEN"
"RAILWAY_TOKEN" "RAILWAY_PERSONAL_API_TOKEN"
"HETZNER_TOKEN" "HETZNER_API_TOKEN"
)
# Forbidden PREFIX patterns — operator-scope families.
FORBIDDEN_PREFIXES=(
"MOLECULE_OPERATOR_"
)
# Writer paths: Go source under workspace-server/ that
# writes to the env-vars map or to workspace_secrets DB rows.
# Tests, the forbidden-env source itself, and the silent-
# strip denylist are exempt (they LIST the keys by design).
SCAN_ROOT="workspace-server/internal"
# Exempt paths fall in two classes:
# 1. The deny-set definitions + the silent-strip denylist:
# they LIST the forbidden names by design.
# 2. Pre-RFC#523 persona-merge / config-read paths that
# already handle these names correctly (the silent-
# strip downstream + the new L1 fail-closed cover the
# runtime risk; these reads are unchanged).
# New code MUST NOT be added to this list without reviewer
# signoff and a one-line justification in this diff.
EXEMPT_PATHS=(
# Class 1 — deny-set definitions
"workspace-server/internal/handlers/workspace_provision_forbidden_env.go"
"workspace-server/internal/handlers/workspace_provision_forbidden_env_test.go"
"workspace-server/internal/provisioner/provisioner.go"
"workspace-server/internal/provisioner/provisioner_test.go"
# Class 2 — pre-existing persona-fallback / org-helper paths
# that set the GITEA_TOKEN fallback lane (stripped downstream
# by provisioner.buildContainerEnv per forensic #145). The
# new L1 fail-closed runs BEFORE these writers, so any
# operator-scope leak via global/workspace_secrets is
# already caught. See applyAgentGitHTTPCreds doc-comment.
"workspace-server/internal/handlers/agent_git_identity.go"
"workspace-server/internal/handlers/org_helpers.go"
"workspace-server/internal/handlers/org.go"
# Class 2 — CP→platform admin auth (NOT a tenant env write;
# this is the control-plane HTTP auth header source).
"workspace-server/internal/provisioner/cp_provisioner.go"
)
# Match each forbidden key in its quoted (Go string-literal) form,
# which is how env-map writes appear: e.g. envVars["GITEA_TOKEN"] = ...
# or `"GITEA_TOKEN":` in a literal-map declaration.
#
# We deliberately match the quoted form so a comment that
# happens to spell the name without quotes (e.g. "see
# GITEA_TOKEN below") doesn't trip the lint. The scan runs
# twice below: literal key names first, then FORBIDDEN_PREFIXES
# via grep -E.
# Build exempt-paths grep filter — `grep -v -f` style.
EXEMPT_FILTER=$(mktemp)
trap 'rm -f "$EXEMPT_FILTER"' EXIT
for p in "${EXEMPT_PATHS[@]}"; do
echo "$p" >> "$EXEMPT_FILTER"
done
# --- Exact-match scan ---
HITS=""
for k in "${FORBIDDEN_KEYS[@]}"; do
# Only .go files; skip _test.go for the writer-path scan
# since tests legitimately reference the names. The
# writer-path lint targets PRODUCTION code only.
found=$(grep -rn --include='*.go' --exclude='*_test.go' "\"${k}\"" "$SCAN_ROOT" 2>/dev/null \
| grep -v -F -f "$EXEMPT_FILTER" || true)
if [ -n "$found" ]; then
HITS="${HITS}${found}\n"
fi
done
# --- Prefix scan ---
for prefix in "${FORBIDDEN_PREFIXES[@]}"; do
found=$(grep -rnE --include='*.go' --exclude='*_test.go' "\"${prefix}[A-Z0-9_]+\"" "$SCAN_ROOT" 2>/dev/null \
| grep -v -F -f "$EXEMPT_FILTER" || true)
if [ -n "$found" ]; then
HITS="${HITS}${found}\n"
fi
done
if [ -n "$HITS" ]; then
echo "::error::RFC#523 Layer 3: forbidden operator-scope env var name(s) hardcoded in tenant-workspace writer paths:"
printf '%b' "$HITS"
echo ""
echo "These env-var NAMES are on the operator-scope deny list (see"
echo "workspace-server/internal/handlers/workspace_provision_forbidden_env.go)."
echo "If your code legitimately needs to inject one of these for a"
echo "non-tenant code path, add the file to EXEMPT_PATHS in this"
echo "workflow with a one-line justification — reviewer signoff required."
exit 1
fi
echo "OK No forbidden operator-scope env key names hardcoded in writer paths."
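The quoting rule in the lint above can be checked in isolation. A minimal sketch, assuming bash and GNU grep (`-r`, `--include`, `--exclude`): it builds a throwaway tree with hypothetical `writer.go` / `writer_test.go` files and runs the same two grep shapes as the exact-match and prefix scans, so only the quoted writes in the non-test file are reported.

```shell
#!/usr/bin/env bash
# Reproduces the lint's two grep shapes against a throwaway tree.
# writer.go / writer_test.go are hypothetical sample files.
set -euo pipefail

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/handlers"

cat >"$root/handlers/writer.go" <<'EOF'
package handlers

// see GITEA_TOKEN below (unquoted mention: not reported)
func apply(envVars map[string]string) {
    envVars["GITEA_TOKEN"] = "x"
    envVars["MOLECULE_OPERATOR_SEED"] = "y"
}
EOF

cat >"$root/handlers/writer_test.go" <<'EOF'
package handlers

var _ = "GITEA_TOKEN" // skipped by --exclude='*_test.go'
EOF

# Exact-match scan: the key wrapped in double quotes.
exact_hits=$(grep -rn --include='*.go' --exclude='*_test.go' \
  '"GITEA_TOKEN"' "$root" || true)

# Prefix scan: a quoted name that starts with the operator prefix.
prefix_hits=$(grep -rnE --include='*.go' --exclude='*_test.go' \
  '"MOLECULE_OPERATOR_[A-Z0-9_]+"' "$root" || true)

printf 'exact:  %s\n' "$exact_hits"
printf 'prefix: %s\n' "$prefix_hits"
```

Both scans report only `writer.go`: line 5 for the exact key and line 6 for the prefix family, while the unquoted comment on line 3 and the excluded test file stay silent.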
@@ -0,0 +1,88 @@
name: Lint shellcheck (arm64 pilot)
# Mac-CI dual-track pilot (#233). ADDITIVE / NOT REQUIRED.
#
# Validates the arm64 self-hosted lane (no docker.sock, no privileged
# ops) before any required gate moves onto it. Until a Mac arm64 runner
# is registered with the `arm64` label, this workflow sits PENDING —
# that is FINE: `arm64` is NOT in branch_protections required contexts.
#
# Pairs with internal#543 (RFC: Mac arm64 multi-arch runner-base).
# No paths: filter on purpose (feedback_path_filtered_workflow_cant_be_required).
on:
pull_request:
branches:
- main
- staging
push:
branches:
- main
permissions:
contents: read
jobs:
shellcheck-arm64:
name: shellcheck-arm64 (pilot)
runs-on: [self-hosted, arm64]
# NOT a required check; safe to sit pending until the Mac runner is up.
# If the Mac runner has trouble pulling actions/checkout, a plain
# git-clone fallback step would be needed; none exists in this file yet.
timeout-minutes: 10
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
steps:
- name: Identify runner
run: |
set -eu
echo "arch=$(uname -m)"
echo "kernel=$(uname -sr)"
echo "shell=$BASH_VERSION"
# Sanity: must actually be arm64. If amd64 sneaks in here,
# fail fast — that means the label routing is wrong.
case "$(uname -m)" in
aarch64|arm64) echo "arm64 confirmed" ;;
*) echo "ERROR: expected arm64, got $(uname -m)"; exit 1 ;;
esac
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 1
- name: Install shellcheck (arm64)
run: |
set -eu
if command -v shellcheck >/dev/null 2>&1; then
echo "shellcheck already present: $(shellcheck --version | head -1)"
else
# Prefer apt if the runner base ships it; else download arm64 binary.
if command -v apt-get >/dev/null 2>&1; then
sudo apt-get update -qq
sudo apt-get install -y --no-install-recommends shellcheck
else
SC_VER=v0.10.0
curl -fsSL "https://github.com/koalaman/shellcheck/releases/download/${SC_VER}/shellcheck-${SC_VER}.linux.aarch64.tar.xz" \
| tar -xJf - --strip-components=1
sudo mv shellcheck /usr/local/bin/
fi
fi
shellcheck --version | head -2
- name: Run shellcheck on .gitea/scripts/*.sh
run: |
set -eu
# Only the scripts we control under .gitea/scripts. Pilot
# scope is intentionally narrow — broaden in a follow-up
# once the lane is proven.
mapfile -t TARGETS < <(find .gitea/scripts -maxdepth 2 -type f -name '*.sh' | sort)
if [ "${#TARGETS[@]}" -eq 0 ]; then
echo "No .sh files found under .gitea/scripts — nothing to check"
exit 0
fi
echo "Checking ${#TARGETS[@]} file(s):"
printf ' %s\n' "${TARGETS[@]}"
# SC1091 = "Not following: <file>" (a sourced file could not be
# opened or was not passed as input); expected for CI-time analysis
# without the full runtime layout.
shellcheck --severity=error --exclude=SC1091 "${TARGETS[@]}"
@@ -1,255 +0,0 @@
name: canary-verify
# Runs the canary smoke suite against the staging canary tenant fleet
# after a new :staging-<sha> image lands in ECR. On green, calls the
# CP redeploy-fleet endpoint to promote :staging-<sha> → :latest so
# the prod tenant fleet's 5-minute auto-updater picks up the verified
# digest. On red, :latest stays on the prior known-good digest and
# prod is untouched.
#
# Registry note (2026-05-10): This workflow previously used GHCR
# (ghcr.io/molecule-ai/platform-tenant) — that registry was retired
# during the 2026-05-06 Gitea suspension migration when publish-
# workspace-server-image.yml switched to the operator's ECR org
# (153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/
# platform-tenant). The GHCR → ECR migration was never applied to
# this file, so canary-verify was silently smoke-testing the stale
# GHCR image while the actual staging/prod tenants ran the ECR image.
# Result: smoke tests could not catch a broken ECR build. Fix:
# - Wait step: reads SHA from running canary /health (tenant-
# agnostic, works regardless of registry).
# - Promote step: calls CP redeploy-fleet endpoint with target_tag=
# staging-<sha>, same mechanism as redeploy-tenants-on-main.yml.
# No longer attempts GHCR crane ops.
#
# Dependencies:
# - publish-workspace-server-image.yml publishes :staging-<sha>
# to ECR on staging and main merges.
# - Canary tenants are configured to pull :staging-<sha> from ECR
# (TENANT_IMAGE env set to the ECR :staging-<sha> tag).
# - Repo secrets CANARY_TENANT_URLS / CANARY_ADMIN_TOKENS /
# CANARY_CP_SHARED_SECRET are populated.
on:
workflow_run:
workflows: ["publish-workspace-server-image"]
types: [completed]
workflow_dispatch:
permissions:
contents: read
packages: write
actions: read
env:
# ECR registry (post-2026-05-06 SSOT for tenant images).
# publish-workspace-server-image.yml pushes here.
IMAGE_NAME: 153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/platform
TENANT_IMAGE_NAME: 153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/platform-tenant
# CP endpoint for redeploy-fleet (used in promote step below).
CP_URL: ${{ vars.CP_URL || 'https://staging-api.moleculesai.app' }}
jobs:
canary-smoke:
# Skip when the upstream workflow failed — no image to test against.
if: ${{ github.event.workflow_run.conclusion == 'success' || github.event_name == 'workflow_dispatch' }}
runs-on: ubuntu-latest
outputs:
sha: ${{ steps.compute.outputs.sha }}
smoke_ran: ${{ steps.smoke.outputs.ran }}
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Compute sha
id: compute
run: echo "sha=${GITHUB_SHA::7}" >> "$GITHUB_OUTPUT"
- name: Wait for canary tenants to pick up :staging-<sha>
# Poll canary health endpoints every 30s for up to 7 min instead
# of a fixed 6-min sleep. Exits as soon as ALL canaries report
# the new SHA (~2-3 min typical vs 6 min fixed). Falls back to
# proceeding after 7 min even if not all canaries responded —
# the smoke suite will catch any that didn't update.
#
# NOTE: The SHA is read from the running tenant's /health response,
# NOT from a registry lookup. This is registry-agnostic and works
# regardless of whether the tenant pulls from ECR, GHCR, or any
# other registry — the canary is telling us what it's actually
# running, which is the ground truth for smoke testing.
env:
CANARY_TENANT_URLS: ${{ secrets.CANARY_TENANT_URLS }}
EXPECTED_SHA: ${{ steps.compute.outputs.sha }}
run: |
if [ -z "$CANARY_TENANT_URLS" ]; then
echo "No canary URLs configured — falling back to 60s wait"
sleep 60
exit 0
fi
IFS=',' read -ra URLS <<< "$CANARY_TENANT_URLS"
MAX_WAIT=420 # 7 minutes
INTERVAL=30
ELAPSED=0
while [ $ELAPSED -lt $MAX_WAIT ]; do
ALL_READY=true
for url in "${URLS[@]}"; do
HEALTH=$(curl -s --max-time 5 "${url}/health" 2>/dev/null || echo "{}")
SHA=$(echo "$HEALTH" | grep -o "\"sha\":\"[^\"]*\"" | head -1 | cut -d'"' -f4)
if [ "$SHA" != "$EXPECTED_SHA" ]; then
ALL_READY=false
break
fi
done
if $ALL_READY; then
echo "All canaries running staging-${EXPECTED_SHA} after ${ELAPSED}s"
exit 0
fi
echo "Waiting for canaries... (${ELAPSED}s / ${MAX_WAIT}s)"
sleep $INTERVAL
ELAPSED=$((ELAPSED + INTERVAL))
done
echo "Timeout after ${MAX_WAIT}s — proceeding anyway (smoke suite will validate)"
- name: Run canary smoke suite
id: smoke
# Graceful-skip when no canary fleet is configured (Phase 2 not yet
# stood up — see molecule-controlplane/docs/canary-tenants.md).
# Sets `ran=false` on skip so promote-to-latest stays off (we don't
# want every main merge auto-promoting without gating). Manual
# promote-latest.yml is the release gate while canary is absent.
# Once the fleet is real: delete the early-exit branch.
env:
CANARY_TENANT_URLS: ${{ secrets.CANARY_TENANT_URLS }}
CANARY_ADMIN_TOKENS: ${{ secrets.CANARY_ADMIN_TOKENS }}
CANARY_CP_BASE_URL: https://staging-api.moleculesai.app
CANARY_CP_SHARED_SECRET: ${{ secrets.CANARY_CP_SHARED_SECRET }}
run: |
set -euo pipefail
if [ -z "${CANARY_TENANT_URLS:-}" ] \
|| [ -z "${CANARY_ADMIN_TOKENS:-}" ] \
|| [ -z "${CANARY_CP_SHARED_SECRET:-}" ]; then
{
echo "## ⚠️ canary-verify skipped"
echo
echo "One or more canary secrets are unset (\`CANARY_TENANT_URLS\`, \`CANARY_ADMIN_TOKENS\`, \`CANARY_CP_SHARED_SECRET\`)."
echo "Phase 2 canary fleet has not been stood up yet —"
echo "see [canary-tenants.md](https://git.moleculesai.app/molecule-ai/molecule-controlplane/blob/main/docs/canary-tenants.md)."
echo
echo "**Skipped — promote-to-latest will NOT auto-fire.** Dispatch \`promote-latest.yml\` manually when ready."
} >> "$GITHUB_STEP_SUMMARY"
echo "ran=false" >> "$GITHUB_OUTPUT"
echo "::notice::canary-verify: skipped — no canary fleet configured"
exit 0
fi
bash scripts/canary-smoke.sh
echo "ran=true" >> "$GITHUB_OUTPUT"
- name: Summary on failure
if: ${{ failure() }}
run: |
{
echo "## Canary smoke FAILED"
echo
echo "Canary tenants rejected image \`staging-${{ steps.compute.outputs.sha }}\`."
echo ":latest stays pinned to the prior good digest — prod is untouched."
echo
echo "Fix forward and merge again, or investigate the specific failed"
echo "assertions in the canary-smoke step log above."
} >> "$GITHUB_STEP_SUMMARY"
promote-to-latest:
# On green, calls the CP redeploy-fleet endpoint with target_tag=
# staging-<sha> to promote the verified ECR image. This is the same
# mechanism as redeploy-tenants-on-main.yml — no GHCR crane ops.
#
# Pre-fix history: the old GHCR promote step used `crane tag` against
# ghcr.io/molecule-ai/platform-tenant, but publish-workspace-server-
# image.yml had already migrated to ECR on 2026-05-07 (commit
# 10e510f5). The GHCR tags were never updated, so this step was
# silently promoting a stale GHCR image while actual prod tenants
# pulled from ECR. Canary smoke tests were GHCR-targeted and could
# not catch a broken ECR build.
needs: canary-smoke
if: ${{ needs.canary-smoke.result == 'success' && needs.canary-smoke.outputs.smoke_ran == 'true' }}
runs-on: ubuntu-latest
env:
SHA: ${{ needs.canary-smoke.outputs.sha }}
CP_URL: ${{ vars.CP_URL || 'https://staging-api.moleculesai.app' }}
# CP_ADMIN_API_TOKEN gates write access to the redeploy endpoint.
# Stored at the repo level so all workflows pick it up automatically.
CP_ADMIN_API_TOKEN: ${{ secrets.CP_ADMIN_API_TOKEN }}
# canary_slug pin: deploy the verified :staging-<sha> to the canary
# first (soak 120s), then fan out to the rest of the fleet.
CANARY_SLUG: ${{ vars.CANARY_PROMOTE_SLUG || '' }}
SOAK_SECONDS: ${{ vars.CANARY_PROMOTE_SOAK || '120' }}
BATCH_SIZE: ${{ vars.CANARY_PROMOTE_BATCH || '3' }}
steps:
- name: Check CP credentials
run: |
if [ -z "${CP_ADMIN_API_TOKEN:-}" ]; then
echo "::error::CP_ADMIN_API_TOKEN secret is not set — promote step cannot call redeploy-fleet."
echo "::error::Set it at: repo Settings → Actions → Variables and Secrets → New Secret."
exit 1
fi
- name: Promote verified ECR image to :latest
run: |
set -euo pipefail
TARGET_TAG="staging-${SHA}"
BODY=$(jq -nc \
--arg tag "$TARGET_TAG" \
--argjson soak "${SOAK_SECONDS:-120}" \
--argjson batch "${BATCH_SIZE:-3}" \
--argjson dry false \
'{
target_tag: $tag,
soak_seconds: $soak,
batch_size: $batch,
dry_run: $dry
}')
if [ -n "${CANARY_SLUG:-}" ]; then
BODY=$(jq '. * {canary_slug: $slug}' --arg slug "$CANARY_SLUG" <<<"$BODY")
fi
echo "Calling: POST $CP_URL/cp/admin/tenants/redeploy-fleet"
echo " target_tag: $TARGET_TAG"
echo " body: $BODY"
HTTP_RESPONSE=$(mktemp)
HTTP_CODE_FILE=$(mktemp)
set +e
curl -sS -o "$HTTP_RESPONSE" -w '%{http_code}' \
-m 1200 \
-H "Authorization: Bearer $CP_ADMIN_API_TOKEN" \
-H "Content-Type: application/json" \
-X POST "$CP_URL/cp/admin/tenants/redeploy-fleet" \
-d "$BODY" >"$HTTP_CODE_FILE"
CURL_EXIT=$?
set -e
HTTP_CODE=$(cat "$HTTP_CODE_FILE" 2>/dev/null || echo "000")
[ -z "$HTTP_CODE" ] && HTTP_CODE="000"
echo "HTTP $HTTP_CODE (curl exit $CURL_EXIT)"
cat "$HTTP_RESPONSE" | jq . || cat "$HTTP_RESPONSE"
if [ "$HTTP_CODE" -ge 400 ]; then
echo "::error::CP redeploy-fleet returned HTTP $HTTP_CODE — refusing to proceed."
exit 1
fi
- name: Summary
run: |
{
echo "## Canary verified — :latest promoted via CP redeploy-fleet"
echo ""
echo "- **Target tag:** \`staging-${{ needs.canary-smoke.outputs.sha }}\`"
echo "- **Registry:** ECR (\`${TENANT_IMAGE_NAME}\`)"
echo "- **Canary slug:** \`${CANARY_SLUG:-<none>}\` (soak ${SOAK_SECONDS}s)"
echo "- **Batch size:** ${BATCH_SIZE:-3}"
echo ""
echo "CP redeploy-fleet is rolling out the verified image across the prod fleet."
echo "The fleet's 5-minute health-check loop will pick up the update automatically."
} >> "$GITHUB_STEP_SUMMARY"
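Both deploy workflows capture curl's `-w '%{http_code}'` output in its own tempfile and fall back to `000` when nothing was written. A minimal sketch of that read plus a stricter success test, with hypothetical helper names (`read_status`, `is_success`): it accepts only 2xx, because a check of the form `[ "$HTTP_CODE" -ge 400 ]` (as in the promote step above) evaluates false for `000` and therefore lets a connection failure through.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the deploy workflows' status-capture shape:
#   curl -sS -o "$HTTP_RESPONSE" -w '%{http_code}' ... >"$HTTP_CODE_FILE"
set -euo pipefail

# read_status FILE: print the code curl wrote via -w, or 000 when the
# file is missing or empty.
read_status() {
  local code
  code=$(cat "$1" 2>/dev/null || true)
  printf '%s\n' "${code:-000}"
}

# is_success CODE: true only for 2xx. A bare `[ "$code" -ge 400 ]`
# failure test treats 000 (no HTTP response at all) as success.
is_success() {
  [ "$1" -ge 200 ] && [ "$1" -lt 300 ]
}

code_file=$(mktemp)
trap 'rm -f "$code_file"' EXIT
printf '000' >"$code_file"   # what -w writes when no response arrived
code=$(read_status "$code_file")
if is_success "$code"; then
  echo "HTTP $code: ok"
else
  echo "HTTP $code: refusing to proceed"   # this branch runs for 000
fi
```

Checking the curl exit code alongside the status (as the promote step already captures in `CURL_EXIT`) is another way to reject the no-response case.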
@@ -1,400 +0,0 @@
name: redeploy-tenants-on-main
# Auto-refresh prod tenant EC2s after every main merge.
#
# Why this workflow exists: publish-workspace-server-image builds and
# pushes a new platform-tenant :<sha> to ECR on every merge to main,
# but running tenants pulled their image once at boot and never re-pull.
# Users see stale code indefinitely.
#
# This workflow closes the gap by calling the control-plane admin
# endpoint that performs a canary-first, batched, health-gated rolling
# redeploy across every live tenant. Implemented in molecule-ai/
# molecule-controlplane as POST /cp/admin/tenants/redeploy-fleet
# (feat/tenant-auto-redeploy, landing alongside this workflow).
#
# Registry: ECR (153263036946.dkr.ecr.us-east-2.amazonaws.com/
# molecule-ai/platform-tenant). GHCR was retired 2026-05-07 during the
# Gitea suspension migration. The canary-verify.yml promote step now
# uses the same redeploy-fleet endpoint (fixes the silent-GHCR gap).
#
# Runtime ordering:
# 1. publish-workspace-server-image completes → new :staging-<sha> in ECR.
# 2. This workflow fires via workflow_run, calls redeploy-fleet with
# target_tag=staging-<sha>. No CDN propagation wait needed —
# ECR image manifest is consistent immediately after push.
# 3. Calls redeploy-fleet with canary_slug (if set) and a soak
# period. Canary proves the image boots; batches follow.
# 4. Any failure aborts the rollout and leaves older tenants on the
# prior image — safer default than half-and-half state.
#
# Rollback path: re-run this workflow with a specific SHA pinned via
# the workflow_dispatch input. That calls redeploy-fleet with
# target_tag=<sha>, re-pulling the older image on every tenant.
on:
workflow_run:
workflows: ['publish-workspace-server-image']
types: [completed]
branches: [main]
workflow_dispatch:
inputs:
target_tag:
# Empty default → auto-trigger and dispatch-without-input both
# resolve to `staging-<short_head_sha>` (the digest publish-image
# just pushed). Pre-fix this defaulted to 'latest', which only
# gets retagged by canary-verify's promote-to-latest job — and
# that job soft-skips when CANARY_TENANT_URLS is unset (the
# current state, until Phase 2 canary fleet is live). Result:
# `:latest` had been pinned to a 4-day-old digest (2026-04-28)
# while every main push pushed fresh `staging-<sha>` images;
# every prod redeploy pulled the stale `:latest` and the verify
# step correctly flagged 3/3 tenants STALE. Pulling the
# just-published `staging-<sha>` directly skips the dead retag
# path. When canary fleet is real, this workflow should chain
# on canary-verify completion (workflow_run from canary-verify),
# not publish-image — separate, smaller PR.
description: 'Tenant image tag to deploy (e.g. "latest", "staging-a59f1a6c"). Empty = auto staging-<head_sha>.'
required: false
type: string
default: ''
canary_slug:
description: 'Tenant slug to deploy first + soak (empty = skip canary, fan out immediately).'
required: false
type: string
# Must be an actual prod tenant slug (current: hongming,
# chloe-dong, reno-stars). The previous default 'hongmingwang'
# didn't match any tenant — CP soft-skipped the missing canary
# and the fleet rolled out without the soak gate, defeating the
# whole point of canary-first.
default: 'hongming'
soak_seconds:
description: 'Seconds to wait after canary before fanning out.'
required: false
type: string
default: '60'
batch_size:
description: 'How many tenants SSM redeploys in parallel per batch.'
required: false
type: string
default: '3'
dry_run:
description: 'Plan only — do not actually redeploy.'
required: false
type: boolean
default: false
permissions:
contents: read
# No write scopes needed — the workflow hits an external CP endpoint,
# not the GitHub API.
# Serialize redeploys so two rapid main pushes' redeploys don't overlap
# and cause confusing per-tenant SSM state. Without this, GitHub's
# implicit workflow_run queueing would *probably* serialize them, but
# the explicit block makes the invariant defensible. Mirrors the
# concurrency block on redeploy-tenants-on-staging.yml for shape parity.
#
# cancel-in-progress: false → aborting a half-rolled-out fleet would
# leave tenants stuck on whatever image they happened to be on when
# cancelled. Better to finish the in-flight rollout before starting
# the next one.
concurrency:
group: redeploy-tenants-on-main
cancel-in-progress: false
jobs:
redeploy:
# Skip the auto-trigger if publish-workspace-server-image didn't
# actually succeed. workflow_run fires on any completion state; we
# don't want to redeploy against a half-built image.
if: |
github.event_name == 'workflow_dispatch' ||
(github.event_name == 'workflow_run' && github.event.workflow_run.conclusion == 'success')
runs-on: ubuntu-latest
timeout-minutes: 25
steps:
- name: Note on ECR propagation
# ECR image manifests are consistent immediately after push — no
# CDN cache to wait for. The old GHCR-based workflow had a 30s
# sleep to avoid race conditions; ECR makes that unnecessary.
run: echo "ECR image available immediately after push — proceeding."
- name: Compute target tag
id: tag
# Resolution order:
# 1. Operator-supplied input (workflow_dispatch with explicit
# tag) → used verbatim. Lets ops pin `latest` for emergency
# rollback to last canary-verified digest, or pin a specific
# `staging-<sha>` to roll back to a known-good build.
# 2. Default → `staging-<short_head_sha>`. The just-published
# digest. Bypasses the `:latest` retag path that's currently
# dead (canary-verify soft-skips without canary fleet, so
# the only thing retagging `:latest` today is the manual
# promote-latest.yml — last run 2026-04-28). Auto-trigger
# from workflow_run uses workflow_run.head_sha; manual
# dispatch with no input falls through to github.sha.
env:
INPUT_TAG: ${{ inputs.target_tag }}
HEAD_SHA: ${{ github.event.workflow_run.head_sha || github.sha }}
run: |
set -euo pipefail
if [ -n "${INPUT_TAG:-}" ]; then
echo "target_tag=$INPUT_TAG" >> "$GITHUB_OUTPUT"
echo "Using operator-pinned tag: $INPUT_TAG"
else
SHORT="${HEAD_SHA:0:7}"
echo "target_tag=staging-$SHORT" >> "$GITHUB_OUTPUT"
echo "Using auto tag: staging-$SHORT (head_sha=$HEAD_SHA)"
fi
- name: Call CP redeploy-fleet
# CP_ADMIN_API_TOKEN must be set as a repo/org secret on
# molecule-ai/molecule-core, matching the staging/prod CP's
# CP_ADMIN_API_TOKEN env. Stored in Railway, mirrored to this
# repo's secrets for CI.
env:
CP_URL: ${{ vars.CP_URL || 'https://api.moleculesai.app' }}
CP_ADMIN_API_TOKEN: ${{ secrets.CP_ADMIN_API_TOKEN }}
TARGET_TAG: ${{ steps.tag.outputs.target_tag }}
CANARY_SLUG: ${{ inputs.canary_slug || 'hongming' }}
SOAK_SECONDS: ${{ inputs.soak_seconds || '60' }}
BATCH_SIZE: ${{ inputs.batch_size || '3' }}
DRY_RUN: ${{ inputs.dry_run || false }}
run: |
set -euo pipefail
if [ -z "${CP_ADMIN_API_TOKEN:-}" ]; then
echo "::error::CP_ADMIN_API_TOKEN secret not set — skipping redeploy"
echo "::notice::Set CP_ADMIN_API_TOKEN in repo secrets to enable auto-redeploy."
exit 1
fi
BODY=$(jq -nc \
--arg tag "$TARGET_TAG" \
--arg canary "$CANARY_SLUG" \
--argjson soak "$SOAK_SECONDS" \
--argjson batch "$BATCH_SIZE" \
--argjson dry "$DRY_RUN" \
'{
target_tag: $tag,
canary_slug: $canary,
soak_seconds: $soak,
batch_size: $batch,
dry_run: $dry
}')
echo "POST $CP_URL/cp/admin/tenants/redeploy-fleet"
echo " body: $BODY"
HTTP_RESPONSE=$(mktemp)
HTTP_CODE_FILE=$(mktemp)
# Route -w into its own tempfile so curl's exit code (e.g. 56
# on connection-reset, 22 on --fail-with-body 4xx/5xx) can't
# pollute the captured stdout. The previous inline-substitution
# shape produced "000000" on connection reset (curl wrote
# "000" via -w, then the inline echo-fallback appended another
# "000") — caught on the 2026-05-04 redeploy of sha 2b862f6.
# set +e/-e keeps the non-zero curl exit from tripping the
# outer pipeline. See lint-curl-status-capture.yml for the
# CI gate that pins this fix shape.
set +e
curl -sS -o "$HTTP_RESPONSE" -w '%{http_code}' \
-m 1200 \
-H "Authorization: Bearer $CP_ADMIN_API_TOKEN" \
-H "Content-Type: application/json" \
-X POST "$CP_URL/cp/admin/tenants/redeploy-fleet" \
-d "$BODY" >"$HTTP_CODE_FILE"
set -e
# Stderr from curl (e.g. dial errors with -sS) goes to the runner
# log so operators can see WHY a connection failed. Stdout is
# captured to $HTTP_CODE_FILE because that's where -w writes.
HTTP_CODE=$(cat "$HTTP_CODE_FILE" 2>/dev/null || echo "000")
[ -z "$HTTP_CODE" ] && HTTP_CODE="000"
echo "HTTP $HTTP_CODE"
cat "$HTTP_RESPONSE" | jq . || cat "$HTTP_RESPONSE"
# Pretty-print per-tenant results in the job summary so
# ops can see which tenants were redeployed without drilling
# into the raw response.
{
echo "## Tenant redeploy fleet"
echo ""
echo "**Target tag:** \`$TARGET_TAG\`"
echo "**Canary:** \`$CANARY_SLUG\` (soak ${SOAK_SECONDS}s)"
echo "**Batch size:** $BATCH_SIZE"
echo "**Dry run:** $DRY_RUN"
echo "**HTTP:** $HTTP_CODE"
echo ""
echo "### Per-tenant result"
echo ""
echo '| Slug | Phase | SSM Status | Exit | Healthz | Error |'
echo '|------|-------|------------|------|---------|-------|'
jq -r '.results[]? | "| \(.slug) | \(.phase) | \(.ssm_status // "-") | \(.ssm_exit_code) | \(.healthz_ok) | \(.error // "-") |"' "$HTTP_RESPONSE" || true
} >> "$GITHUB_STEP_SUMMARY"
if [ "$HTTP_CODE" != "200" ]; then
echo "::error::redeploy-fleet returned HTTP $HTTP_CODE"
exit 1
fi
OK=$(jq -r '.ok' "$HTTP_RESPONSE")
if [ "$OK" != "true" ]; then
echo "::error::redeploy-fleet reported ok=false (see summary for which tenant halted the rollout)"
exit 1
fi
echo "::notice::Tenant fleet redeploy reported ssm_status=Success — verifying actual image roll on each tenant..."
# Stash the response for the verify step. $RUNNER_TEMP is a fixed
# path later steps can read; the mktemp path held in $HTTP_RESPONSE
# is not known outside this step.
cp "$HTTP_RESPONSE" "$RUNNER_TEMP/redeploy-response.json"
- name: Verify each tenant /buildinfo matches published SHA
# ROOT FIX FOR #2395.
#
# `redeploy-fleet`'s `ssm_status=Success` means "the SSM RPC
# didn't error" — NOT "the new image is running on the tenant."
# `:latest` lives in the local Docker daemon's image cache; if
# the SSM document does `docker compose up -d` without an
# explicit `docker pull`, the daemon serves the previously-
# cached digest and the container restarts on stale code.
# 2026-04-30 incident: hongmingwang's tenant reported
# ssm_status=Success at 17:00:53Z but kept serving pre-501a42d7
# chat_files for 30+ min — the lazy-heal fix never reached the
# user despite green deploy + green redeploy.
#
# This step closes the gap by curling each tenant's /buildinfo
# endpoint (added in workspace-server/internal/buildinfo +
# /Dockerfile* GIT_SHA build-arg, this PR) and comparing the
# returned git_sha to the SHA the workflow expects. Mismatches
# fail the workflow, which is what `ok=true` should have
# guaranteed all along.
#
# When the redeploy was triggered by workflow_dispatch with a
# pinned tag (anything other than `latest`, the head SHA, or
# `staging-<short_head_sha>`), the expected SHA may not equal
# the head SHA. Resolving it would mean inspecting the GHCR
# manifest, which is a follow-up, so verification is skipped for
# those tags. For workflow_run (default `staging-<short_head_sha>`)
# the workflow_run.head_sha is the SHA that just published.
env:
EXPECTED_SHA: ${{ github.event.workflow_run.head_sha || github.sha }}
TARGET_TAG: ${{ steps.tag.outputs.target_tag }}
# Tenant subdomain template — slugs from the response are
# appended. Production CP issues `<slug>.moleculesai.app`;
# staging CP issues `<slug>.staging.moleculesai.app`. This
# workflow runs on main → prod CP → no `staging.` infix.
TENANT_DOMAIN: 'moleculesai.app'
run: |
set -euo pipefail
EXPECTED_SHORT="${EXPECTED_SHA:0:7}"
if [ "$TARGET_TAG" != "latest" ] \
&& [ "$TARGET_TAG" != "$EXPECTED_SHA" ] \
&& [ "$TARGET_TAG" != "staging-$EXPECTED_SHORT" ]; then
# workflow_dispatch with a pinned tag that isn't the head
# SHA — operator is rolling back / pinning. Skip the
# verification because we don't have the expected SHA in
# this context (would need to crane-inspect the GHCR
# manifest, which is a follow-up). Failing-open here is
# safe: the operator chose the tag deliberately.
#
# `staging-<short_head_sha>` IS verified — it's the new
# auto-trigger default (see Compute target tag step) and
# the digest under that tag SHOULD match EXPECTED_SHA.
echo "::notice::target_tag=$TARGET_TAG (operator-pinned) — skipping per-tenant SHA verification."
exit 0
fi
RESP="$RUNNER_TEMP/redeploy-response.json"
if [ ! -s "$RESP" ]; then
echo "::error::redeploy-response.json missing or empty — verify step ran without a response to read"
exit 1
fi
# Pull only successfully-redeployed tenants. Any tenant that
# halted the rollout already failed the previous step, so we
# don't double-count them here.
mapfile -t SLUGS < <(jq -r '.results[]? | select(.healthz_ok == true) | .slug' "$RESP")
if [ ${#SLUGS[@]} -eq 0 ]; then
echo "::warning::No tenants reported healthz_ok — nothing to verify"
exit 0
fi
echo "Verifying ${#SLUGS[@]} tenant(s) against EXPECTED_SHA=${EXPECTED_SHA:0:7}..."
# Two distinct failure modes — STALE (the #2395 bug class, hard-fail)
# vs UNREACHABLE (teardown race, soft-warn). See the staging variant's
# comment for the full rationale; same logic applies on prod even
# though prod has fewer ephemeral tenants — the asymmetry would be a
# gratuitous fork.
STALE_COUNT=0
UNREACHABLE_COUNT=0
STALE_LINES=()
UNREACHABLE_LINES=()
for slug in "${SLUGS[@]}"; do
URL="https://${slug}.${TENANT_DOMAIN}/buildinfo"
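# Up to 30s per attempt plus 3 retries 5s apart (roughly 135s
# worst case): the tenant just SSM-restarted and may still be
# coming up. curl's --retry covers transient failures (timeouts,
# refused connections via --retry-connrefused, HTTP 408, 429,
# 500, 502, 503, 504); a 200 carrying the wrong SHA is never
# retried, so "responded with wrong SHA" fails fast instead of
# being mistaken for "still warming up".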
BODY=$(curl -sS --max-time 30 --retry 3 --retry-delay 5 --retry-connrefused "$URL" || true)
ACTUAL_SHA=$(echo "$BODY" | jq -r '.git_sha // ""' 2>/dev/null || echo "")
if [ -z "$ACTUAL_SHA" ]; then
UNREACHABLE_COUNT=$((UNREACHABLE_COUNT + 1))
UNREACHABLE_LINES+=("| $slug | (no /buildinfo response) | ${EXPECTED_SHA:0:7} | ⚠ unreachable (likely teardown race) |")
continue
fi
if [ "$ACTUAL_SHA" = "$EXPECTED_SHA" ]; then
echo " $slug: ${ACTUAL_SHA:0:7} ✓"
else
STALE_COUNT=$((STALE_COUNT + 1))
STALE_LINES+=("| $slug | ${ACTUAL_SHA:0:7} | ${EXPECTED_SHA:0:7} | ❌ stale |")
fi
done
{
echo ""
echo "### Per-tenant /buildinfo verification"
echo ""
echo "Expected SHA: \`${EXPECTED_SHA:0:7}\`"
echo ""
if [ $STALE_COUNT -gt 0 ]; then
echo "**${STALE_COUNT} STALE tenant(s) — these did NOT pick up the new image despite ssm_status=Success:**"
echo ""
echo "| Slug | Actual /buildinfo SHA | Expected | Status |"
echo "|------|----------------------|----------|--------|"
for line in "${STALE_LINES[@]}"; do echo "$line"; done
echo ""
fi
if [ $UNREACHABLE_COUNT -gt 0 ]; then
echo "**${UNREACHABLE_COUNT} unreachable tenant(s) — likely teardown race (soft-warn, not failing):**"
echo ""
echo "| Slug | Actual /buildinfo SHA | Expected | Status |"
echo "|------|----------------------|----------|--------|"
for line in "${UNREACHABLE_LINES[@]}"; do echo "$line"; done
echo ""
fi
if [ $STALE_COUNT -eq 0 ] && [ $UNREACHABLE_COUNT -eq 0 ]; then
echo "All ${#SLUGS[@]} tenants returned matching SHA. ✓"
fi
} >> "$GITHUB_STEP_SUMMARY"
if [ $UNREACHABLE_COUNT -gt 0 ]; then
echo "::warning::$UNREACHABLE_COUNT tenant(s) unreachable post-redeploy. Likely benign teardown race — CP healthz monitor catches real outages."
fi
# Belt-and-suspenders sanity floor: same logic as the staging
# variant — see that file's comment for the full rationale.
# Floor only applies when fleet >= 4; below that, canary-verify
# is the actual gate.
TOTAL_VERIFIED=${#SLUGS[@]}
if [ $TOTAL_VERIFIED -ge 4 ] && [ $UNREACHABLE_COUNT -gt $((TOTAL_VERIFIED / 2)) ]; then
echo "::error::$UNREACHABLE_COUNT of $TOTAL_VERIFIED tenant(s) unreachable — exceeds 50% threshold on a fleet large enough that this signals a real outage, not teardown race."
exit 1
fi
if [ $STALE_COUNT -gt 0 ]; then
echo "::error::$STALE_COUNT tenant(s) returned a stale SHA. ssm_status=Success was misleading — see job summary."
exit 1
fi
echo "::notice::Tenant fleet redeploy complete — all reachable tenants on ${EXPECTED_SHA:0:7} (${UNREACHABLE_COUNT} unreachable, soft-warned)."
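The verify step's decision logic is easier to reason about (and test) once it is pulled out of the YAML. The following is a minimal sketch of such a refactor, not code from this PR: `should_verify`, `classify_tenant`, and `gate_verdict` are hypothetical helper names, it assumes bash, and it leaves out the `/buildinfo` curl call so each helper takes already-extracted values (the `git_sha` string, or an empty string when the tenant did not answer). The thresholds mirror the step above: any stale tenant fails the run, while unreachable tenants fail it only when at least 4 tenants were checked and more than half of them are unreachable.

```bash
#!/usr/bin/env bash
# Hypothetical, offline-testable sketch of the per-tenant /buildinfo gate.
set -euo pipefail

# should_verify TARGET_TAG EXPECTED_SHA
# Succeeds for tags whose digest should match EXPECTED_SHA: `latest`, the
# full SHA, or the auto default `staging-<7-char SHA>`. Any other tag is
# treated as operator-pinned, and the workflow skips verification for it.
should_verify() {
  local tag="$1" sha="$2"
  [ "$tag" = "latest" ] || [ "$tag" = "$sha" ] || [ "$tag" = "staging-${sha:0:7}" ]
}

# classify_tenant EXPECTED_SHA ACTUAL_SHA
# ACTUAL_SHA is the git_sha parsed from /buildinfo, or "" if there was no
# usable response.
classify_tenant() {
  local expected="$1" actual="$2"
  if [ -z "$actual" ]; then
    echo "unreachable"   # soft-warn: usually a teardown race
  elif [ "$actual" = "$expected" ]; then
    echo "match"
  else
    echo "stale"         # the #2395 failure mode: up, but on old code
  fi
}

# gate_verdict TOTAL UNREACHABLE STALE
# Checks run in the same order as the workflow: the >50% unreachable floor
# (only on fleets of 4+), then any stale tenant. Returns 1 on failure.
gate_verdict() {
  local total="$1" unreachable="$2" stale="$3"
  if [ "$total" -ge 4 ] && [ "$unreachable" -gt $((total / 2)) ]; then
    echo "fail:outage"
    return 1
  fi
  if [ "$stale" -gt 0 ]; then
    echo "fail:stale"
    return 1
  fi
  echo "pass"
}
```

For example, `gate_verdict 1 1 0` passes (a lone unreachable tenant is below the floor), while `gate_verdict 4 3 0` fails as a likely outage. Keeping the floor check ahead of the stale check matches the workflow, so an outage is reported as such even if some reachable tenants are also stale.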
@@ -1,362 +0,0 @@
name: redeploy-tenants-on-staging
# Auto-refresh staging tenant EC2s after every staging-branch merge.
#
# Mirror of redeploy-tenants-on-main.yml, with the staging-CP host and
# the :staging-latest tag. Sister workflow exists for prod (rolls
# :latest after canary-verify). Both share the same shape — just
# different CP_URL + target_tag + admin token secret.
#
# Why this workflow exists: publish-workspace-server-image now builds
# on every staging-branch push (PR #2335), pushing
# platform-tenant:staging-latest to GHCR. Existing tenants pulled
# their image once at boot and never re-pull, so the new image just
# sits unused until the tenant is reprovisioned.
#
# This workflow closes the gap by calling staging-CP's
# /cp/admin/tenants/redeploy-fleet, which performs a canary-first,
# batched, health-gated SSM redeploy across every live staging tenant.
# Same endpoint shape as prod CP — only the host differs.
#
# Runtime ordering:
# 1. publish-workspace-server-image completes on staging branch →
# new :staging-latest in GHCR.
# 2. This workflow fires via workflow_run, waits 30s for GHCR's CDN
# to propagate the new tag.
# 3. Calls redeploy-fleet with no canary (staging IS canary; we don't
# need a sub-canary inside it). Soak still applies to the first
# tenant in case of bad-deploy detection.
# 4. Any failure aborts the rollout, leaving tenants not yet
#    redeployed on the prior image rather than pushing a bad build
#    across the whole fleet.
#
# Rollback path: re-run with workflow_dispatch + target_tag=staging-<sha>
# of a known-good build.
on:
workflow_run:
workflows: ['publish-workspace-server-image']
types: [completed]
branches: [main]
workflow_dispatch:
inputs:
target_tag:
description: 'Tenant image tag to deploy (e.g. "staging-latest" or "staging-a59f1a6c"). Defaults to staging-latest when empty.'
required: false
type: string
default: 'staging-latest'
canary_slug:
description: 'Tenant slug to deploy first + soak (empty = skip canary, fan out immediately). Default empty for staging since staging itself is the canary.'
required: false
type: string
default: ''
soak_seconds:
description: 'Seconds to wait after canary before fanning out. Only meaningful if canary_slug is set.'
required: false
type: string
default: '60'
batch_size:
description: 'How many tenants to redeploy in parallel (via SSM) per batch.'
required: false
type: string
default: '3'
dry_run:
description: 'Plan only — do not actually redeploy.'
required: false
type: boolean
default: false
permissions:
contents: read
# No write scopes needed — the workflow hits an external CP endpoint,
# not the GitHub API.
# Serialize per-branch so two rapid staging pushes' redeploys don't
# overlap and cause confusing per-tenant SSM state. cancel-in-progress
# is false because aborting a half-rolled-out fleet leaves tenants
# stuck on whatever image they happened to be on when cancelled.
concurrency:
group: redeploy-tenants-on-staging
cancel-in-progress: false
jobs:
redeploy:
# Skip the auto-trigger if publish-workspace-server-image didn't
# actually succeed. workflow_run fires on any completion state; we
# don't want to redeploy against a half-built image.
if: |
github.event_name == 'workflow_dispatch' ||
(github.event_name == 'workflow_run' && github.event.workflow_run.conclusion == 'success')
runs-on: ubuntu-latest
timeout-minutes: 25
steps:
- name: Wait for GHCR tag propagation
# GHCR's edge cache takes ~15-30s to consistently serve the new
# :staging-latest manifest after the registry accepts the push.
# Same rationale as redeploy-tenants-on-main.yml.
run: sleep 30
- name: Call staging-CP redeploy-fleet
# CP_STAGING_ADMIN_API_TOKEN must be set as a repo/org secret
# on molecule-ai/molecule-core, matching staging-CP's
# CP_ADMIN_API_TOKEN env var (visible in Railway controlplane
# / staging environment). Stored separately from the prod
# CP_ADMIN_API_TOKEN so a leak of one doesn't auth the other.
env:
CP_URL: ${{ vars.STAGING_CP_URL || 'https://staging-api.moleculesai.app' }}
CP_STAGING_ADMIN_API_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
TARGET_TAG: ${{ inputs.target_tag || 'staging-latest' }}
CANARY_SLUG: ${{ inputs.canary_slug || '' }}
SOAK_SECONDS: ${{ inputs.soak_seconds || '60' }}
BATCH_SIZE: ${{ inputs.batch_size || '3' }}
DRY_RUN: ${{ inputs.dry_run || false }}
run: |
set -euo pipefail
# Schedule-vs-dispatch hardening (mirrors sweep-cf-orphans
# and sweep-cf-tunnels): hard-fail on auto-trigger when the
# secret is missing so a misconfigured repo doesn't silently
# leave staging tenants on stale images. Soft-skip on operator dispatch.
if [ -z "${CP_STAGING_ADMIN_API_TOKEN:-}" ]; then
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
echo "::warning::CP_STAGING_ADMIN_API_TOKEN secret not set — skipping redeploy"
echo "::warning::Set CP_STAGING_ADMIN_API_TOKEN in repo secrets to enable auto-redeploy."
echo "::notice::Pull the value from staging-CP's CP_ADMIN_API_TOKEN env in Railway."
exit 0
fi
echo "::error::staging redeploy cannot run — CP_STAGING_ADMIN_API_TOKEN secret missing"
echo "::error::set it at Settings → Secrets and Variables → Actions; pull from staging-CP's CP_ADMIN_API_TOKEN env in Railway."
exit 1
fi
BODY=$(jq -nc \
--arg tag "$TARGET_TAG" \
--arg canary "$CANARY_SLUG" \
--argjson soak "$SOAK_SECONDS" \
--argjson batch "$BATCH_SIZE" \
--argjson dry "$DRY_RUN" \
'{
target_tag: $tag,
canary_slug: $canary,
soak_seconds: $soak,
batch_size: $batch,
dry_run: $dry
}')
echo "POST $CP_URL/cp/admin/tenants/redeploy-fleet"
echo " body: $BODY"
HTTP_RESPONSE=$(mktemp)
HTTP_CODE_FILE=$(mktemp)
# Route -w into its own tempfile so curl's exit code (e.g. 56
# on connection-reset) can't pollute the captured stdout. The
# previous inline-substitution shape produced "000000" on
# connection reset — caught on main variant 2026-05-04
# redeploying sha 2b862f6. Same fix shape as the synth-E2E
# §9c gate (PR #2797). See lint-curl-status-capture.yml for
# the CI gate that pins this fix shape.
set +e
curl -sS -o "$HTTP_RESPONSE" -w '%{http_code}' \
-m 1200 \
-H "Authorization: Bearer $CP_STAGING_ADMIN_API_TOKEN" \
-H "Content-Type: application/json" \
-X POST "$CP_URL/cp/admin/tenants/redeploy-fleet" \
-d "$BODY" >"$HTTP_CODE_FILE"
set -e
# Stderr from curl (-sS shows dial errors etc.) goes to the
# runner log so operators can see WHY a connection failed.
HTTP_CODE=$(cat "$HTTP_CODE_FILE" 2>/dev/null || echo "000")
[ -z "$HTTP_CODE" ] && HTTP_CODE="000"
echo "HTTP $HTTP_CODE"
cat "$HTTP_RESPONSE" | jq . || cat "$HTTP_RESPONSE"
{
echo "## Staging tenant redeploy fleet"
echo ""
echo "**Target tag:** \`$TARGET_TAG\`"
echo "**Canary:** \`${CANARY_SLUG:-(none — staging is itself the canary)}\` (soak ${SOAK_SECONDS}s)"
echo "**Batch size:** $BATCH_SIZE"
echo "**Dry run:** $DRY_RUN"
echo "**HTTP:** $HTTP_CODE"
echo ""
echo "### Per-tenant result"
echo ""
echo '| Slug | Phase | SSM Status | Exit | Healthz | Error |'
echo '|------|-------|------------|------|---------|-------|'
jq -r '.results[]? | "| \(.slug) | \(.phase) | \(.ssm_status // "-") | \(.ssm_exit_code) | \(.healthz_ok) | \(.error // "-") |"' "$HTTP_RESPONSE" || true
} >> "$GITHUB_STEP_SUMMARY"
# Distinguish "real fleet failure" from "E2E teardown race".
#
# CP returns HTTP 500 + ok=false whenever ANY tenant in the
# fleet failed SSM or healthz. In practice the recurring source
# of these is ephemeral test tenants being torn down by their
# parent E2E run mid-redeploy: the EC2 dies → SSM exit=2 or
# healthz timeout → CP marks the fleet failed → this workflow
# goes red even though every operator-facing tenant rolled fine.
#
# Ephemeral slug prefixes (kept in sync with sweep-stale-e2e-orgs.yml
# — see that file for the source-of-truth list and rationale):
# - e2e-* — canvas/saas/ext E2E suites
# - rt-e2e-* — runtime-test harness fixtures (RFC #2251)
# Long-lived prefixes that are NOT ephemeral and MUST hard-fail:
# demo-prep, dryrun-*, dryrun2-*, plus all human tenant slugs.
#
# Filter: if HTTP=500/ok=false AND every failed slug matches an
# ephemeral prefix, treat as soft-warn and let the verify step
# downstream handle unreachable-vs-stale (#2402). Any non-ephemeral
# failure or a non-500 HTTP response remains a hard failure.
OK=$(jq -r '.ok // "false"' "$HTTP_RESPONSE" 2>/dev/null || echo "false")
FAILED_SLUGS=$(jq -r '
.results[]?
| select((.healthz_ok != true) or (.ssm_status != "Success"))
| .slug' "$HTTP_RESPONSE" 2>/dev/null || true)
EPHEMERAL_PREFIX_RE='^(e2e-|rt-e2e-)'
NON_EPHEMERAL_FAILED=$(printf '%s\n' "$FAILED_SLUGS" | grep -v '^$' | grep -Ev "$EPHEMERAL_PREFIX_RE" || true)
if [ "$HTTP_CODE" = "200" ] && [ "$OK" = "true" ]; then
: # happy path — fall through to verification
elif [ "$HTTP_CODE" = "500" ] && [ -z "$NON_EPHEMERAL_FAILED" ] && [ -n "$FAILED_SLUGS" ]; then
COUNT=$(printf '%s\n' "$FAILED_SLUGS" | grep -Ec "$EPHEMERAL_PREFIX_RE" || true)
echo "::warning::redeploy-fleet returned HTTP 500 but every failed tenant ($COUNT) is ephemeral (e2e-*/rt-e2e-*) — treating as teardown race, soft-warning."
printf '%s\n' "$FAILED_SLUGS" | sed 's/^/::warning:: failed: /'
elif [ "$HTTP_CODE" != "200" ]; then
echo "::error::redeploy-fleet returned HTTP $HTTP_CODE"
if [ -n "$NON_EPHEMERAL_FAILED" ]; then
echo "::error::non-ephemeral tenant(s) failed:"
printf '%s\n' "$NON_EPHEMERAL_FAILED" | sed 's/^/::error:: /'
fi
exit 1
else
# HTTP=200 but ok=false (shouldn't happen with current CP
# but keep the gate for completeness).
echo "::error::redeploy-fleet reported ok=false (see summary for which tenant halted the rollout)"
exit 1
fi
echo "::notice::Staging tenant fleet redeploy reported ssm_status=Success — verifying actual image roll on each tenant..."
cp "$HTTP_RESPONSE" "$RUNNER_TEMP/redeploy-response.json"
- name: Verify each staging tenant /buildinfo matches published SHA
# Mirror of the verify step in redeploy-tenants-on-main.yml — see
# there for the rationale (#2395 root fix). Staging has the same
# ssm_status-success-but-stale-image hazard and benefits from the
# same gate. Diff: TENANT_DOMAIN includes the `staging.` infix.
env:
EXPECTED_SHA: ${{ github.event.workflow_run.head_sha || github.sha }}
TARGET_TAG: ${{ inputs.target_tag || 'staging-latest' }}
TENANT_DOMAIN: 'staging.moleculesai.app'
run: |
set -euo pipefail
# staging-latest is the staging-side moving tag; treat it the
# same way main treats `latest`. Operator-pinned SHAs skip
# verification (see main variant for why).
if [ "$TARGET_TAG" != "staging-latest" ] && [ "$TARGET_TAG" != "latest" ] && [ "$TARGET_TAG" != "$EXPECTED_SHA" ]; then
echo "::notice::target_tag=$TARGET_TAG (operator-pinned) — skipping per-tenant SHA verification."
exit 0
fi
RESP="$RUNNER_TEMP/redeploy-response.json"
if [ ! -s "$RESP" ]; then
echo "::error::redeploy-response.json missing or empty"
exit 1
fi
mapfile -t SLUGS < <(jq -r '.results[]? | select(.healthz_ok == true) | .slug' "$RESP")
if [ ${#SLUGS[@]} -eq 0 ]; then
echo "::warning::No staging tenants reported healthz_ok — nothing to verify"
exit 0
fi
echo "Verifying ${#SLUGS[@]} staging tenant(s) against EXPECTED_SHA=${EXPECTED_SHA:0:7}..."
# Two distinct failure modes here:
# STALE_COUNT — tenant returned a SHA that doesn't match. THIS is
# the #2395 bug class: tenant up + serving old code.
# Always hard-fail the workflow.
# UNREACHABLE_COUNT — tenant didn't respond. Almost always a benign
# teardown race: redeploy-fleet snapshot says
# healthz_ok=true, then the E2E suite tears the
# ephemeral tenant down before this step runs (the
# e2e-* fixtures churn 5-10/hour on staging). Soft-
# warn so we don't block staging→main on cleanup.
# Real "tenant up but unreachable" is caught by CP's
# own healthz monitor + the post-redeploy alert; we
# don't need to double-count it here.
STALE_COUNT=0
UNREACHABLE_COUNT=0
STALE_LINES=()
UNREACHABLE_LINES=()
for slug in "${SLUGS[@]}"; do
URL="https://${slug}.${TENANT_DOMAIN}/buildinfo"
BODY=$(curl -sS --max-time 30 --retry 3 --retry-delay 5 --retry-connrefused "$URL" || true)
ACTUAL_SHA=$(echo "$BODY" | jq -r '.git_sha // ""' 2>/dev/null || echo "")
if [ -z "$ACTUAL_SHA" ]; then
UNREACHABLE_COUNT=$((UNREACHABLE_COUNT + 1))
UNREACHABLE_LINES+=("| $slug | (no /buildinfo response) | ${EXPECTED_SHA:0:7} | ⚠ unreachable (likely teardown race) |")
continue
fi
if [ "$ACTUAL_SHA" = "$EXPECTED_SHA" ]; then
echo " $slug: ${ACTUAL_SHA:0:7} ✓"
else
STALE_COUNT=$((STALE_COUNT + 1))
STALE_LINES+=("| $slug | ${ACTUAL_SHA:0:7} | ${EXPECTED_SHA:0:7} | ❌ stale |")
fi
done
{
echo ""
echo "### Per-tenant /buildinfo verification (staging)"
echo ""
echo "Expected SHA: \`${EXPECTED_SHA:0:7}\`"
echo ""
if [ $STALE_COUNT -gt 0 ]; then
echo "**${STALE_COUNT} STALE tenant(s) — these did NOT pick up the new image despite ssm_status=Success:**"
echo ""
echo "| Slug | Actual /buildinfo SHA | Expected | Status |"
echo "|------|----------------------|----------|--------|"
for line in "${STALE_LINES[@]}"; do echo "$line"; done
echo ""
fi
if [ $UNREACHABLE_COUNT -gt 0 ]; then
echo "**${UNREACHABLE_COUNT} unreachable tenant(s) — likely E2E teardown race (soft-warn, not failing):**"
echo ""
echo "| Slug | Actual /buildinfo SHA | Expected | Status |"
echo "|------|----------------------|----------|--------|"
for line in "${UNREACHABLE_LINES[@]}"; do echo "$line"; done
echo ""
fi
if [ $STALE_COUNT -eq 0 ] && [ $UNREACHABLE_COUNT -eq 0 ]; then
echo "All ${#SLUGS[@]} staging tenants returned matching SHA. ✓"
fi
} >> "$GITHUB_STEP_SUMMARY"
if [ $UNREACHABLE_COUNT -gt 0 ]; then
echo "::warning::$UNREACHABLE_COUNT staging tenant(s) unreachable post-redeploy. Likely benign teardown race — CP healthz monitor catches real outages."
fi
# Belt-and-suspenders sanity floor: if MORE than half the fleet is
# unreachable AND the fleet is large enough that "half down" is
# statistically meaningful, this is a real outage (e.g. new image
# crashes on startup), not a teardown race. Hard-fail.
#
# Floor only applies when TOTAL_VERIFIED >= 4 — below that, the
# canary-verify step is the actual gate for "all tenants down"
# detection (it runs against the canary first and aborts the
# rollout if the canary fails to come up). Without the >=4 gate,
# a 1-tenant fleet (e.g. a single ephemeral e2e-* tenant on a
# quiet staging push) would re-flake on the exact teardown-race
# condition #2402 fixed: 1 of 1 unreachable = 100% > 50% → fail.
TOTAL_VERIFIED=${#SLUGS[@]}
if [ $TOTAL_VERIFIED -ge 4 ] && [ $UNREACHABLE_COUNT -gt $((TOTAL_VERIFIED / 2)) ]; then
echo "::error::$UNREACHABLE_COUNT of $TOTAL_VERIFIED staging tenant(s) unreachable — exceeds 50% threshold on a fleet large enough that this signals a real outage, not teardown race."
exit 1
fi
if [ $STALE_COUNT -gt 0 ]; then
echo "::error::$STALE_COUNT staging tenant(s) returned a stale SHA. ssm_status=Success was misleading — see job summary."
exit 1
fi
echo "::notice::Staging tenant fleet redeploy complete — all reachable tenants on ${EXPECTED_SHA:0:7} (${UNREACHABLE_COUNT} unreachable, soft-warned)."
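The staging step's "teardown race vs. real failure" filter boils down to a three-way verdict on the HTTP code, the `ok` flag, and the list of failed slugs. A minimal sketch, assuming bash and that the failed slugs have already been pulled from the CP response as a newline-separated string; `fleet_verdict` is a hypothetical name, and unlike the workflow it folds the separate "HTTP 200 but ok=false" branch into `hard-fail`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the ephemeral-slug filter used after redeploy-fleet.
set -euo pipefail

# Same regex as the workflow's EPHEMERAL_PREFIX_RE.
EPHEMERAL_PREFIX_RE='^(e2e-|rt-e2e-)'

# fleet_verdict HTTP_CODE OK FAILED_SLUGS
# FAILED_SLUGS is newline-separated. Prints one of:
#   ok         HTTP 200 and ok=true
#   soft-warn  HTTP 500 and every failed slug is ephemeral
#   hard-fail  anything else
fleet_verdict() {
  local http_code="$1" ok="$2" failed="$3" non_ephemeral
  # Drop blank lines, then keep only slugs that are NOT ephemeral.
  # `|| true` absorbs grep's exit 1 when nothing matches (pipefail is on).
  non_ephemeral=$(printf '%s\n' "$failed" | grep -v '^$' | grep -Ev "$EPHEMERAL_PREFIX_RE" || true)
  if [ "$http_code" = "200" ] && [ "$ok" = "true" ]; then
    echo "ok"
  elif [ "$http_code" = "500" ] && [ -z "$non_ephemeral" ] && [ -n "$failed" ]; then
    echo "soft-warn"
  else
    echo "hard-fail"
  fi
}
```

The `[ -n "$failed" ]` guard matters: an HTTP 500 that names no failed slugs at all is not explained by a teardown race, so it stays a hard failure. Long-lived prefixes such as `dryrun-*` or `demo-prep` never match the regex and therefore always hard-fail.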
+12 -12
@@ -57,24 +57,24 @@ See `CLAUDE.md` for a full list of environment variables and their purposes.
This repo is scoped to **code** (canvas, workspace, workspace-server, related
infra). Public content (blog posts, marketing copy, OG images, SEO briefs,
DevRel demos) lives in [`Molecule-AI/docs`](https://git.moleculesai.app/molecule-ai/docs).
DevRel demos) lives in [`molecule-ai/docs`](https://git.moleculesai.app/molecule-ai/docs).
The `Block forbidden paths` CI gate fails any PR that writes to `marketing/`
or other removed paths — open against `Molecule-AI/docs` instead.
or other removed paths — open against `molecule-ai/docs` instead.
| Content type | Target |
|---|---|
| Blog posts | `Molecule-AI/docs` → `content/blog/<YYYY-MM-DD-slug>/` |
| Doc pages | `Molecule-AI/docs` → `content/docs/` |
| Marketing copy / PMM positioning | `Molecule-AI/docs` → `marketing/` |
| OG images, visual assets | `Molecule-AI/docs` → `app/` or `marketing/` |
| SEO briefs | `Molecule-AI/docs` → `marketing/` |
| DevRel demos (runnable code) | Standalone repo under `Molecule-AI/`, OR embedded in `Molecule-AI/docs` |
| Blog posts | `molecule-ai/docs` → `content/blog/<YYYY-MM-DD-slug>/` |
| Doc pages | `molecule-ai/docs` → `content/docs/` |
| Marketing copy / PMM positioning | `molecule-ai/docs` → `marketing/` |
| OG images, visual assets | `molecule-ai/docs` → `app/` or `marketing/` |
| SEO briefs | `molecule-ai/docs` → `marketing/` |
| DevRel demos (runnable code) | Standalone repo under `molecule-ai/`, OR embedded in `molecule-ai/docs` |
| Launch checklists, internal tracking | GitHub Issues — **not** committed files |
| Engineering docs (`docs/adr/`, `docs/architecture/`, `docs/incidents/`) | This repo (internal, not published) |
| Live product pages (e.g. `canvas/src/app/pricing/page.tsx`) | This repo (these are app code, not marketing copy) |
If a PR fails the `Block forbidden paths` check, the contents belong in
`Molecule-AI/docs`. No CI drag, no Canvas E2E, content lands in minutes.
`molecule-ai/docs`. No CI drag, no Canvas E2E, content lands in minutes.
## Development Workflow
@@ -190,9 +190,9 @@ Runs the full regression suite against a fixture HTTP server. No network access
Code in this repo lands in molecule-core. Some related runtime artifacts
live in their own repos:
- [`Molecule-AI/molecule-ai-workspace-runtime`](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime) — Python adapter SDK (`molecule_runtime`) that runs inside containerized Molecule workspaces. Bridges Claude Code SDK / hermes / langgraph / etc. → A2A queue.
- [`Molecule-AI/molecule-sdk-python`](https://git.moleculesai.app/molecule-ai/molecule-sdk-python) — `A2AServer` + `RemoteAgentClient` for external agents that register over the public `/registry/register` flow.
- [`Molecule-AI/molecule-mcp-claude-channel`](https://git.moleculesai.app/molecule-ai/molecule-mcp-claude-channel) — Claude Code channel plugin. Bridges A2A traffic into a running Claude Code session via MCP `notifications/claude/channel`. Polling-based (no tunnel required); install with `claude --channels plugin:molecule@Molecule-AI/molecule-mcp-claude-channel`.
- [`molecule-ai/molecule-ai-workspace-runtime`](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime) — Python adapter SDK (`molecule_runtime`) that runs inside containerized Molecule workspaces. Bridges Claude Code SDK / hermes / langgraph / etc. → A2A queue.
- [`molecule-ai/molecule-sdk-python`](https://git.moleculesai.app/molecule-ai/molecule-sdk-python) — `A2AServer` + `RemoteAgentClient` for external agents that register over the public `/registry/register` flow.
- [`molecule-ai/molecule-mcp-claude-channel`](https://git.moleculesai.app/molecule-ai/molecule-mcp-claude-channel) — Claude Code channel plugin. Bridges A2A traffic into a running Claude Code session via MCP `notifications/claude/channel`. Polling-based (no tunnel required); install inside Claude Code via `/plugin marketplace add https://git.moleculesai.app/molecule-ai/molecule-mcp-claude-channel.git` → `/plugin install molecule@molecule-channel`, then launch with `claude --dangerously-load-development-channels --channels plugin:molecule@molecule-channel`.
When extending the **A2A surface** in molecule-core (`workspace-server/internal/handlers/a2a_proxy.go` etc.), consider whether the change has a downstream impact on the runtime SDK or the channel plugin — they're versioned independently but share the wire shape.
+12 -2
@@ -4,10 +4,10 @@
# use this Makefile; CI calls docker compose / go test directly so the
# Makefile can evolve without breaking the build.
.PHONY: help dev up down logs build test
.PHONY: help dev up down logs build test e2e-peer-visibility
help: ## Show this help.
@grep -E '^[a-zA-Z_-]+:.*?## ' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-12s\033[0m %s\n", $$1, $$2}'
@grep -E '^[a-zA-Z0-9_-]+:.*?## ' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-22s\033[0m %s\n", $$1, $$2}'
dev: ## Start the full stack with air hot-reload for the platform service.
docker compose -f docker-compose.yml -f docker-compose.dev.yml up
@@ -26,3 +26,13 @@ build: ## Force a fresh build of the platform image (no cache).
test: ## Run Go unit tests in workspace-server/.
cd workspace-server && go test -race ./...
# ─── Local prod-mimic E2E gates ────────────────────────────────────────
# Run the LITERAL peer-visibility MCP list_peers gate against the
# already-running local stack (`make up` or `make dev`). Same byte-
# identical assertion as the staging gate — only provisioning differs.
# Skips any runtime whose provider key is absent (partially-keyed env
# is fine). See tests/e2e/test_peer_visibility_mcp_local.sh for the
# env contract (CLAUDE_CODE_OAUTH_TOKEN / E2E_MINIMAX_API_KEY / etc).
e2e-peer-visibility: ## Run the LOCAL peer-visibility MCP gate vs the running stack (needs `make up` first).
bash tests/e2e/test_peer_visibility_mcp_local.sh
+1 -1
@@ -238,7 +238,7 @@ The result is not just “an agent that learns.” It is **an organization that
- subscribe to one or more workspaces; peer messages surface as conversation turns; replies route back through Molecule's A2A
- no tunnel, no public endpoint — the plugin self-registers each watched workspace as `delivery_mode=poll` and long-polls `/activity?since_id=…`
- multi-tenant friendly: one plugin install can watch workspaces across multiple Molecule tenants (`MOLECULE_PLATFORM_URLS` per-workspace)
- install via the standard marketplace flow: `/plugin marketplace add Molecule-AI/molecule-mcp-claude-channel` → `/plugin install molecule-channel@molecule-mcp-claude-channel`
- install via the standard marketplace flow: `/plugin marketplace add https://git.moleculesai.app/molecule-ai/molecule-mcp-claude-channel.git` → `/plugin install molecule@molecule-channel`, then launch with `claude --dangerously-load-development-channels --channels plugin:molecule@molecule-channel`
## Built For Teams That Need More Than A Demo
+1 -1
@@ -237,7 +237,7 @@ Molecule AI is not meant to replace the frameworks below, but to bring them into a more
- subscribe to one or more workspaces; peer messages surface as user turns, and replies are routed out through Molecule A2A
- no public tunnel or public endpoint needed: on startup the plugin automatically registers each watched workspace as `delivery_mode=poll` and long-polls `/activity?since_id=…`
- multi-tenant friendly: a single install can watch workspaces across multiple Molecule tenants (`MOLECULE_PLATFORM_URLS` configured per workspace)
- install via the standard marketplace flow: `/plugin marketplace add Molecule-AI/molecule-mcp-claude-channel` → `/plugin install molecule-channel@molecule-mcp-claude-channel`
- install via the standard marketplace flow: `/plugin marketplace add https://git.moleculesai.app/molecule-ai/molecule-mcp-claude-channel.git` → `/plugin install molecule@molecule-channel`, then launch with `claude --dangerously-load-development-channels --channels plugin:molecule@molecule-channel`
## Which Teams It Fits
+113
@@ -0,0 +1,113 @@
import { describe, it, expect, vi } from "vitest";
// Marketing-launch SEO (mc#1486). These tests pin the public crawler
// contract: anything that flips public marketing routes to disallow,
// drops the sitemap from robots.txt, or removes the OG image
// reference from root metadata should fail loudly here.
// next/font and the rest of the layout's runtime tree are not
// vitest-compatible (next/font relies on the Next.js SWC compiler
// transform). We import layout.tsx only for its exported `metadata`
// constant, so the font module is mocked with plain function stubs
// that return a `{ variable }` object.
vi.mock("next/font/google", () => ({
Inter: () => ({ variable: "--font-inter" }),
JetBrains_Mono: () => ({ variable: "--font-jetbrains" }),
}));
import robots from "../robots";
import sitemap from "../sitemap";
import { metadata } from "../layout";
describe("robots.ts", () => {
it("allows public marketing routes and blocks authed/app routes", () => {
const r = robots();
expect(r.rules).toBeDefined();
const rule = Array.isArray(r.rules) ? r.rules[0] : r.rules!;
expect(rule.userAgent).toBe("*");
const allow = Array.isArray(rule.allow) ? rule.allow : [rule.allow];
expect(allow).toEqual(expect.arrayContaining(["/", "/pricing", "/blog"]));
const disallow = Array.isArray(rule.disallow)
? rule.disallow
: [rule.disallow];
expect(disallow).toEqual(
expect.arrayContaining(["/api/", "/orgs", "/cp/"]),
);
});
it("declares the sitemap URL", () => {
const r = robots();
expect(r.sitemap).toMatch(/\/sitemap\.xml$/);
});
it("declares a canonical host", () => {
const r = robots();
expect(r.host).toMatch(/^https:\/\//);
});
});
describe("sitemap.ts", () => {
it("includes apex, pricing, and the live blog post", () => {
const entries = sitemap();
const urls = entries.map((e) => e.url);
expect(urls.some((u) => u.endsWith("/"))).toBe(true);
expect(urls.some((u) => u.endsWith("/pricing"))).toBe(true);
expect(
urls.some((u) => u.includes("/blog/2026-04-20-chrome-devtools-mcp")),
).toBe(true);
});
it("does NOT include authed/app routes", () => {
const entries = sitemap();
const urls = entries.map((e) => e.url);
expect(urls.some((u) => u.includes("/orgs"))).toBe(false);
expect(urls.some((u) => u.includes("/api/"))).toBe(false);
});
it("sets a non-zero priority and a valid changeFrequency on every entry", () => {
const valid = new Set([
"always",
"hourly",
"daily",
"weekly",
"monthly",
"yearly",
"never",
]);
for (const e of sitemap()) {
expect(e.priority).toBeGreaterThan(0);
expect(valid.has(String(e.changeFrequency))).toBe(true);
}
});
});
describe("root layout metadata", () => {
it("sets a templated title + non-empty description", () => {
const t = metadata.title as { default: string; template: string };
expect(t.default).toMatch(/Molecule AI/);
expect(t.template).toMatch(/%s/);
expect((metadata.description ?? "").length).toBeGreaterThan(50);
});
it("declares OG + Twitter text fields (image comes from opengraph-image.tsx)", () => {
const og = metadata.openGraph;
expect(og).toBeDefined();
expect((og as { title: string }).title).toMatch(/Molecule AI/);
expect((og as { description: string }).description.length).toBeGreaterThan(
50,
);
const tw = metadata.twitter;
expect(tw).toBeDefined();
// Next.js typings narrow twitter.card to a union — assert via cast.
expect((tw as { card: string }).card).toBe("summary_large_image");
});
it("sets a canonical alternate", () => {
expect(metadata.alternates?.canonical).toBe("/");
});
it("enables indexing at the metadata level (robots.ts owns per-route)", () => {
const r = metadata.robots as { index: boolean; follow: boolean };
expect(r.index).toBe(true);
expect(r.follow).toBe(true);
});
});
+140 -2
@@ -27,9 +27,78 @@ import {
themeBootScript,
} from "@/lib/theme-cookie";
// Marketing-launch SEO (mc#1486). Canonical apex is app.moleculesai.app —
// tenant subdomains (<slug>.moleculesai.app) reuse the same Next.js build
// but are gated behind auth (AuthGate redirects anonymous → /cp/auth/login)
// and are de-indexed in robots.ts. The metadata here applies to the
// public marketing surface served from the apex host.
//
// Override per-route by exporting a page-level `metadata`/`generateMetadata`
// — Next.js merges page metadata over layout metadata using
// `title.template` for "<page> | Molecule AI" composition.
const SITE_URL =
process.env.NEXT_PUBLIC_SITE_URL ?? "https://app.moleculesai.app";
export const metadata: Metadata = {
title: "Molecule AI",
description: "AI Org Chart Canvas",
metadataBase: new URL(SITE_URL),
title: {
default: "Molecule AI — the AI org chart canvas",
template: "%s | Molecule AI",
},
description:
"Molecule AI is an org-chart canvas for AI agent teams. Wire Claude Code, Codex, Hermes, and OpenClaw agents into a governed multi-agent workspace with credit metering, audit, and one-click runtime provisioning.",
applicationName: "Molecule AI",
keywords: [
"AI agents",
"multi-agent",
"agent orchestration",
"AI org chart",
"Claude Code",
"Codex",
"MCP",
"agent governance",
"A2A",
"agent runtime",
],
authors: [{ name: "Molecule AI" }],
creator: "Molecule AI",
publisher: "Molecule AI",
alternates: { canonical: "/" },
// OG + Twitter images come from the file-convention sibling
// `opengraph-image.tsx`: Next.js auto-attaches them to og:image
// and twitter:image when present at the segment root. The text
// fields stay here as site-wide defaults for pages that don't
// override them. No `images` field is set on purpose; the real
// URL is injected by Next.js at build time from opengraph-image.tsx.
openGraph: {
type: "website",
siteName: "Molecule AI",
url: SITE_URL,
title: "Molecule AI — the AI org chart canvas",
description:
"Wire Claude Code, Codex, Hermes, and OpenClaw agents into a governed multi-agent workspace. Credit metering, audit, and one-click runtime provisioning.",
locale: "en_US",
},
twitter: {
card: "summary_large_image",
title: "Molecule AI — the AI org chart canvas",
description:
"Wire Claude Code, Codex, Hermes, and OpenClaw agents into a governed multi-agent workspace.",
},
icons: {
icon: "/molecule-icon.png",
apple: "/molecule-icon.png",
},
// robots.ts owns the per-route allow/disallow contract; this emits
// the robots meta tags as a page-level fallback on any page a
// crawler does fetch. Default = index public marketing routes;
// app/auth/api/orgs are disallowed (not crawled) per robots.ts.
robots: {
index: true,
follow: true,
googleBot: { index: true, follow: true, "max-image-preview": "large" },
},
};
export default async function RootLayout({
@@ -94,6 +163,75 @@ export default async function RootLayout({
nonce={nonce}
dangerouslySetInnerHTML={{ __html: themeBootScript }}
/>
{/*
* JSON-LD structured data (mc#1486). Three graph nodes:
*
* - Organization: surfaces the brand to Google Knowledge
* Graph + Bing entity index. URL+logo+sameAs are the
* minimum recommended set for new brands without a
* Wikipedia page.
*
* - WebSite: declares the site name and tells crawlers the
* canonical site URL when the same content is reachable
* via multiple subdomains (apex + tenant).
*
* - SoftwareApplication: describes the product itself
* (category, platform, USD price range) and points back
* at the Organization node as publisher.
*
* A type="application/ld+json" script is a data block: the
* browser parses it but never executes it as JavaScript, so
* 'strict-dynamic' is irrelevant here. We still carry the
* nonce so the element stays consistent with the production
* CSP's nonce policy for <script> elements.
*/}
<script
type="application/ld+json"
nonce={nonce}
dangerouslySetInnerHTML={{
__html: JSON.stringify({
"@context": "https://schema.org",
"@graph": [
{
"@type": "Organization",
"@id": `${SITE_URL}#organization`,
name: "Molecule AI",
url: SITE_URL,
logo: `${SITE_URL}/molecule-icon.png`,
sameAs: [
"https://github.com/molecule-ai",
"https://x.com/moleculeai",
],
},
{
"@type": "WebSite",
"@id": `${SITE_URL}#website`,
url: SITE_URL,
name: "Molecule AI",
publisher: { "@id": `${SITE_URL}#organization` },
inLanguage: "en-US",
},
{
"@type": "SoftwareApplication",
"@id": `${SITE_URL}#software`,
name: "Molecule AI",
applicationCategory: "DeveloperApplication",
operatingSystem: "Web",
description:
"Org-chart canvas for AI agent teams with credit metering, audit, and one-click runtime provisioning.",
url: SITE_URL,
offers: {
"@type": "AggregateOffer",
priceCurrency: "USD",
lowPrice: "0",
highPrice: "99",
offerCount: "3",
url: `${SITE_URL}/pricing`,
},
publisher: { "@id": `${SITE_URL}#organization` },
},
],
}),
}}
/>
</head>
<body className={`bg-surface text-ink ${interFont.variable} ${monoFont.variable}`}>
<ThemeProvider initialTheme={theme}>
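The JSON-LD graph above links nodes by `@id`: both WebSite and SoftwareApplication set `publisher: { "@id": "…#organization" }`. A typo in either string would silently break that link. The sketch below walks a graph and reports any `{ "@id": … }` reference that no node declares. `danglingReferences` and `JsonLdNode` are hypothetical helpers, not part of this change. The sample nodes are trimmed copies of the layout's entries.

```typescript
type JsonLdNode = { "@type": string; "@id": string; [key: string]: unknown };

function danglingReferences(graph: JsonLdNode[]): string[] {
  const declared = new Set(graph.map((node) => node["@id"]));
  const dangling: string[] = [];
  const visit = (value: unknown, isTopLevel: boolean): void => {
    if (Array.isArray(value)) {
      for (const item of value) visit(item, false);
      return;
    }
    if (value === null || typeof value !== "object") return;
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj);
    const ref = obj["@id"];
    // A nested object whose only key is "@id" is a reference, not a node.
    if (!isTopLevel && keys.length === 1 && typeof ref === "string") {
      if (!declared.has(ref)) dangling.push(ref);
      return;
    }
    for (const key of keys) visit(obj[key], false);
  };
  for (const node of graph) visit(node, true);
  return dangling;
}

const SITE_URL = "https://app.moleculesai.app";
const graph: JsonLdNode[] = [
  { "@type": "Organization", "@id": `${SITE_URL}#organization`, name: "Molecule AI" },
  { "@type": "WebSite", "@id": `${SITE_URL}#website`, publisher: { "@id": `${SITE_URL}#organization` } },
  { "@type": "SoftwareApplication", "@id": `${SITE_URL}#software`, publisher: { "@id": `${SITE_URL}#organization` } },
];
console.log(danglingReferences(graph)); // → []
```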
+82
@@ -0,0 +1,82 @@
import { ImageResponse } from "next/og";
// Marketing-launch SEO (mc#1486). Next.js App-Router file-system OG
// convention: served as `/opengraph-image` and auto-attached as
// `og:image` + `twitter:image`. Dynamic (not a static PNG in /public)
// so we can iterate the brand mark + tagline pre-launch without
// churning a binary blob in git history.
export const runtime = "edge";
export const alt = "Molecule AI — the AI org chart canvas";
export const size = { width: 1200, height: 630 };
export const contentType = "image/png";
export default function OG() {
return new ImageResponse(
(
<div
style={{
width: "100%",
height: "100%",
display: "flex",
flexDirection: "column",
alignItems: "flex-start",
justifyContent: "center",
padding: "80px",
background:
"linear-gradient(135deg, #0a0a0a 0%, #1a1a2e 60%, #16213e 100%)",
color: "#ffffff",
fontFamily: "system-ui, -apple-system, sans-serif",
}}
>
<div
style={{
fontSize: 28,
color: "#a3a3c2",
letterSpacing: "0.18em",
textTransform: "uppercase",
marginBottom: 24,
}}
>
Molecule AI
</div>
<div
style={{
fontSize: 76,
fontWeight: 700,
lineHeight: 1.05,
letterSpacing: "-0.02em",
maxWidth: 980,
}}
>
The AI org chart canvas
</div>
<div
style={{
fontSize: 32,
color: "#c8c8d8",
marginTop: 32,
lineHeight: 1.3,
maxWidth: 980,
}}
>
Wire Claude Code, Codex, Hermes, and OpenClaw agents into a governed
multi-agent workspace.
</div>
<div
style={{
position: "absolute",
right: 80,
bottom: 80,
fontSize: 22,
color: "#7a7a96",
display: "flex",
}}
>
moleculesai.app
</div>
</div>
),
{ ...size },
);
}
+45
@@ -0,0 +1,45 @@
import type { MetadataRoute } from "next";
// Marketing-launch SEO (mc#1486). Next.js App-Router robots convention:
// this file is served as `/robots.txt` at build time and is the single
// source of truth for crawler allow/disallow.
//
// Contract:
// - Public marketing routes (/, /pricing, /blog/*) are crawlable.
// - Authed/app routes (/orgs, /api/*) are disallowed. They render
// useful content only after a session round-trip, so a crawler hit
// just wastes our crawl budget and exposes endpoint shapes.
// - Tenant subdomains (<slug>.moleculesai.app) share this build but
// are blocked at the host level by the canvas middleware sending
// an `X-Robots-Tag: noindex` header — robots.txt is per-host and
// this file's `host` field claims the apex as canonical.
//
// Note: `sitemap` is published via the sibling `sitemap.ts` route; we
// reference it explicitly here so crawlers don't have to guess.
const SITE_URL =
process.env.NEXT_PUBLIC_SITE_URL ?? "https://app.moleculesai.app";
export default function robots(): MetadataRoute.Robots {
return {
rules: [
{
userAgent: "*",
allow: ["/", "/pricing", "/blog"],
// Authed app surface + API + transient checkout returns. The
// /orgs route boots the org-selector behind AuthGate; even
// though SSR returns markup, that markup is a login wall when
// hit by an unauthenticated crawler, so indexing it dilutes
// brand searches with a "Please sign in" snippet.
disallow: [
"/orgs",
"/orgs/",
"/api/",
"/cp/",
"/checkout/",
],
},
],
sitemap: `${SITE_URL}/sitemap.xml`,
host: SITE_URL,
};
}
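To make the robots() contract concrete, the sketch below shows roughly how its return value maps onto the text crawlers receive at `/robots.txt`. `RobotsRule` and `RobotsConfig` are simplified local stand-ins for Next.js's `MetadataRoute.Robots`, with a single user-agent string and array paths. `toRobotsTxt` is hypothetical, and Next.js's own serializer may differ in directive casing and ordering.

```typescript
interface RobotsRule {
  userAgent: string;
  allow?: string[];
  disallow?: string[];
}

interface RobotsConfig {
  rules: RobotsRule[];
  sitemap?: string;
  host?: string;
}

function toRobotsTxt(cfg: RobotsConfig): string {
  const lines: string[] = [];
  for (const rule of cfg.rules) {
    // Each group starts with User-Agent and ends with a blank line.
    lines.push(`User-Agent: ${rule.userAgent}`);
    for (const path of rule.allow ?? []) lines.push(`Allow: ${path}`);
    for (const path of rule.disallow ?? []) lines.push(`Disallow: ${path}`);
    lines.push("");
  }
  // Host is non-standard: RFC 9309 does not define it and Google ignores it.
  if (cfg.host) lines.push(`Host: ${cfg.host}`);
  if (cfg.sitemap) lines.push(`Sitemap: ${cfg.sitemap}`);
  return lines.join("\n") + "\n";
}

const SITE_URL = "https://app.moleculesai.app";
const txt = toRobotsTxt({
  rules: [
    {
      userAgent: "*",
      allow: ["/", "/pricing", "/blog"],
      disallow: ["/orgs", "/orgs/", "/api/", "/cp/", "/checkout/"],
    },
  ],
  sitemap: `${SITE_URL}/sitemap.xml`,
  host: SITE_URL,
});
console.log(txt);
```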
+42
@@ -0,0 +1,42 @@
import type { MetadataRoute } from "next";
// Marketing-launch SEO (mc#1486). App-Router sitemap convention: this
// file is served as `/sitemap.xml` and enumerates the public marketing
// surface for search crawlers + AI training pipelines.
//
// Scope deliberately narrow:
// - Apex landing, pricing, and the (currently single) blog post.
// - Authed app routes are excluded: they're disallowed in robots.ts
// and would appear to a crawler as a "Please sign in" wall.
//
// `lastModified` uses a build-time timestamp rather than per-route
// fs.stat so the same value applies regardless of where the build
// runs (Vercel/Railway/local). When we add CMS-backed blog content,
// swap to a per-entry timestamp from the source-of-truth metadata.
const SITE_URL =
process.env.NEXT_PUBLIC_SITE_URL ?? "https://app.moleculesai.app";
const BUILD_DATE = new Date();
export default function sitemap(): MetadataRoute.Sitemap {
return [
{
url: `${SITE_URL}/`,
lastModified: BUILD_DATE,
changeFrequency: "weekly",
priority: 1.0,
},
{
url: `${SITE_URL}/pricing`,
lastModified: BUILD_DATE,
changeFrequency: "weekly",
priority: 0.9,
},
{
url: `${SITE_URL}/blog/2026-04-20-chrome-devtools-mcp`,
lastModified: new Date("2026-04-20"),
changeFrequency: "monthly",
priority: 0.6,
},
];
}
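The sitemap comment relies on a cross-file contract: nothing in the sitemap may be blocked by robots.ts. The sketch below checks that contract using RFC 9309 prefix matching. The longest matching rule wins, and Allow wins a tie. Wildcards (`*`, `$`) are not handled, since robots.ts uses none. The path lists are copied from robots.ts and sitemap.ts above, and `isAllowed` is a hypothetical helper.

```typescript
const ALLOW = ["/", "/pricing", "/blog"];
const DISALLOW = ["/orgs", "/orgs/", "/api/", "/cp/", "/checkout/"];

function isAllowed(path: string, allow: string[], disallow: string[]): boolean {
  // Length of the longest rule that prefixes `path`, or -1 if none does.
  const longest = (rules: string[]): number =>
    rules
      .filter((rule) => path.startsWith(rule))
      .reduce((max, rule) => Math.max(max, rule.length), -1);
  // No match on either side (-1 vs -1) means allowed; a tie goes to Allow.
  return longest(allow) >= longest(disallow);
}

const SITE_URL = "https://app.moleculesai.app";
const sitemapUrls = [
  `${SITE_URL}/`,
  `${SITE_URL}/pricing`,
  `${SITE_URL}/blog/2026-04-20-chrome-devtools-mcp`,
];
const blocked = sitemapUrls.filter((u) => !isAllowed(new URL(u).pathname, ALLOW, DISALLOW));
console.log(blocked); // → []
```

A check like this could sit next to the existing SEO tests, so that adding a sitemap entry under a disallowed prefix fails in CI rather than in Search Console.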
+14 -16
@@ -10,6 +10,7 @@ import { downloadChatFile, isPlatformAttachment } from "./chat/uploads";
import { PendingAttachmentPill } from "./chat/AttachmentViews";
import { AttachmentPreview } from "./chat/AttachmentPreview";
import { AgentCommsPanel } from "./chat/AgentCommsPanel";
import { ChatErrorBanner } from "./chat/ChatErrorBanner";
import { appendActivityLine } from "./chat/activityLog";
import { runtimeDisplayName } from "@/lib/runtime-names";
import { ConfirmDialog } from "@/components/ConfirmDialog";
@@ -592,22 +593,19 @@ function MyChatPanel({ workspaceId, data }: Props) {
<div ref={bottomRef} />
</div>
{/* Error banner */}
{displayError && (
<div className="px-3 py-2 bg-red-900/20 border-t border-red-800/30">
<div className="flex items-center justify-between">
<span className="text-[10px] text-red-300">{displayError}</span>
{!isOnline && (
<button
onClick={() => setConfirmRestart(true)}
className="text-[11px] px-2 py-0.5 bg-red-800 text-red-200 rounded hover:bg-red-700"
>
Restart
</button>
)}
</div>
</div>
)}
{/* Error banner — internal#212: surfaces the secret-safe
actionable failure reason that ws-server places on
ACTIVITY_LOGGED.error_detail (propagated via
useChatSocket → onSendError → setError) and offers a
"View activity log" affordance that navigates the user to
the Activity tab where the full row lives. The previous
inline JSX hardcoded "see workspace logs for details" with
no link — there is no separate Logs tab. */}
<ChatErrorBanner
message={displayError}
isOnline={isOnline}
onRestart={() => setConfirmRestart(true)}
/>
{/* Input */}
<div className="p-3 border-t border-line">
@@ -0,0 +1,99 @@
// @vitest-environment jsdom
//
// Pins internal#212 — the chat error banner must:
//
// 1. Render the secret-safe failure reason (e.g. the provider's own
// "403 oauth_org_not_allowed: ..." string), NOT the opaque
// hardcoded "Agent error (Exception) — see workspace logs for
// details." that points at a workspace-logs tab that doesn't
// exist.
//
// 2. Offer a working "View activity log" affordance that navigates
// the user to the Activity tab where the full row lives.
//
// Tested at the banner-component seam (ChatErrorBanner). The
// hook-level path is pinned separately by
// chat/hooks/__tests__/useChatSocket.test.tsx — together they cover
// wire-payload → callback → render without each test needing to drive
// the full ChatTab send-state machinery.
import { describe, it, expect, vi, afterEach, beforeEach } from "vitest";
import { render, screen, cleanup, fireEvent } from "@testing-library/react";
afterEach(cleanup);
const mocks = vi.hoisted(() => ({
setPanelTabMock: vi.fn(),
}));
vi.mock("@/store/canvas", () => {
const state = {
setPanelTab: mocks.setPanelTabMock,
panelTab: "chat",
};
const hook = (selector?: (s: typeof state) => unknown) =>
selector ? selector(state) : state;
hook.getState = () => state;
return { useCanvasStore: hook };
});
beforeEach(() => {
mocks.setPanelTabMock.mockClear();
});
import { ChatErrorBanner } from "../chat/ChatErrorBanner";
describe("ChatErrorBanner — surfaces actionable reason (internal#212)", () => {
it("renders the secret-safe failure reason verbatim, not a hardcoded opaque message", () => {
const reason =
"Anthropic 403 oauth_org_not_allowed: Your organization has disabled Claude subscription access for Claude Code — use an Anthropic API key or ask your admin to enable access.";
render(<ChatErrorBanner message={reason} isOnline={true} onRestart={() => {}} />);
expect(screen.getByText(/oauth_org_not_allowed/i)).toBeDefined();
expect(screen.getByText(/disabled Claude subscription access/i)).toBeDefined();
// The legacy boilerplate must NOT leak through when a real reason
// is provided.
expect(screen.queryByText(/see workspace logs for details/i)).toBeNull();
});
it("still renders the message when it IS the legacy boilerplate (older ws-server)", () => {
// Graceful degradation: an older ws-server passes through the
// hardcoded text; the banner still renders SOMETHING — never
// silently swallow.
render(
<ChatErrorBanner
message="Agent error (Exception) — see workspace logs for details."
isOnline={true}
onRestart={() => {}}
/>,
);
expect(
screen.getByText(/Agent error \(Exception\) — see workspace logs for details\./),
).toBeDefined();
});
it("offers a 'View activity log' button that calls setPanelTab('activity')", () => {
render(
<ChatErrorBanner message="kimi 401 invalid_api_key" isOnline={true} onRestart={() => {}} />,
);
const btn = screen.getByRole("button", { name: /view activity log/i });
fireEvent.click(btn);
expect(mocks.setPanelTabMock).toHaveBeenCalledWith("activity");
});
it("still shows the Restart button when offline (existing behavior preserved)", () => {
const onRestart = vi.fn();
render(
<ChatErrorBanner message="Agent is offline" isOnline={false} onRestart={onRestart} />,
);
const btn = screen.getByRole("button", { name: /^restart$/i });
fireEvent.click(btn);
expect(onRestart).toHaveBeenCalledTimes(1);
});
it("renders nothing when message is null", () => {
const { container } = render(
<ChatErrorBanner message={null} isOnline={true} onRestart={() => {}} />,
);
expect(container.textContent).toBe("");
});
});
@@ -0,0 +1,85 @@
"use client";
/**
* ChatErrorBanner — error-state banner rendered under the chat
* message list when an agent turn fails or the workspace is offline.
*
* internal#212 closes the "see workspace logs for details" pointer-to-
* nowhere defect:
*
* - The banner now renders the actionable, secret-safe failure
* reason that ws-server places on `ACTIVITY_LOGGED.error_detail`
* (provider HTTP status + error code + provider's own human
* message). The hook (`useChatSocket`) forwards this through
* `onSendError`, which the ChatTab routes into this banner's
* `message` prop. No hardcoded opaque text in this component.
*
* - A "View activity log" button navigates the user to the Activity
* tab where the full row (request body, response body, timing,
* full error_detail) lives. Until internal#212, the banner
* mentioned "workspace logs" with no link — there is no separate
* Logs tab in the side panel; the Activity tab IS the workspace-
* logs surface. Routing through the existing tab makes the
* reference real instead of dangling.
*
* - The existing Restart button (shown only when the workspace is
* offline) is preserved unchanged so the recovery affordance the
* old banner offered does not regress.
*
* Pure presentational — no socket subscription, no state machine. Easy
* to unit-test in isolation and easy to compose into the ChatTab.
*/
import { useCanvasStore } from "@/store/canvas";
export interface ChatErrorBannerProps {
/** The user-visible reason. Pass `null` to render nothing. */
message: string | null;
/** Workspace reachable state — gates the Restart affordance. */
isOnline: boolean;
/** Fires when the user clicks Restart (offline-only). */
onRestart: () => void;
}
export function ChatErrorBanner({ message, isOnline, onRestart }: ChatErrorBannerProps) {
// Pulled from the global store rather than threaded through props so
// the chat tab does not need to know about the side-panel tab state.
// Matches how Toolbar.tsx triggers the audit tab (the existing
// precedent for cross-tab navigation).
const setPanelTab = useCanvasStore((s) => s.setPanelTab);
if (!message) return null;
return (
<div
// role="alert" + aria-live mirrors the project's existing WCAG
// 4.1.3 banner pattern (see fix/canvas-errors-aria-alert) — a
// screen reader announces the failure as soon as it lands.
role="alert"
aria-live="assertive"
className="px-3 py-2 bg-red-900/20 border-t border-red-800/30"
>
<div className="flex items-center justify-between gap-2">
<span className="text-[10px] text-red-300 break-words flex-1">{message}</span>
<div className="flex items-center gap-1.5 shrink-0">
<button
type="button"
onClick={() => setPanelTab("activity")}
className="text-[10px] px-2 py-0.5 bg-red-900/40 hover:bg-red-800/60 border border-red-700/40 text-red-200 rounded transition-colors focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-accent focus-visible:ring-offset-1"
>
View activity log
</button>
{!isOnline && (
<button
type="button"
onClick={onRestart}
className="text-[11px] px-2 py-0.5 bg-red-800 text-red-200 rounded hover:bg-red-700"
>
Restart
</button>
)}
</div>
</div>
</div>
);
}
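ChatErrorBanner's render rules fit in three lines: no message means no banner, "View activity log" always accompanies a message, and Restart appears only when the workspace is offline. The sketch below restates those rules as a pure function, which makes the prop combinations easy to reason about without rendering. `bannerActions` and `BannerAction` are hypothetical; the component does not export them.

```typescript
type BannerAction = "view-activity-log" | "restart";

function bannerActions(message: string | null, isOnline: boolean): BannerAction[] | null {
  // Mirrors `if (!message) return null`: null and "" both render nothing.
  if (!message) return null;
  // "View activity log" is always offered alongside a message.
  const actions: BannerAction[] = ["view-activity-log"];
  // Restart is the offline-only recovery affordance.
  if (!isOnline) actions.push("restart");
  return actions;
}

console.log(bannerActions(null, true));
console.log(bannerActions("kimi 401 invalid_api_key", true));
console.log(bannerActions("Agent is offline", false));
```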
@@ -0,0 +1,140 @@
// @vitest-environment jsdom
import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
import { renderHook, act } from "@testing-library/react";
// Capture the handler so we can drive WS events from tests. useSocketEvent
// stores the latest handler in a ref under the hood, but since we mock
// the hook entirely, just remember the last passed-in handler.
let capturedHandler: ((msg: unknown) => void) | null = null;
vi.mock("@/hooks/useSocketEvent", () => ({
useSocketEvent: (h: (msg: unknown) => void) => {
capturedHandler = h;
},
}));
// Canvas store mock — useChatSocket calls
// useCanvasStore.getState().nodes for peer name resolution and reads
// agentMessages via the selector form. Support both.
vi.mock("@/store/canvas", () => {
const state = {
nodes: [
{ id: "ws-self", data: { name: "Self" } },
{ id: "ws-peer", data: { name: "Peer Agent" } },
],
agentMessages: {} as Record<string, unknown[]>,
consumeAgentMessages: () => [],
};
const hook = (selector?: (s: typeof state) => unknown) =>
selector ? selector(state) : state;
hook.getState = () => state;
return { useCanvasStore: hook };
});
import { useChatSocket } from "../useChatSocket";
beforeEach(() => {
capturedHandler = null;
});
afterEach(() => {
vi.clearAllMocks();
});
// Helper: assemble an ACTIVITY_LOGGED a2a_receive error event the way
// the ws-server emits one when a peer call errors out. Fields mirror
// workspace-server/internal/handlers/activity.go::logActivityExec
// broadcast payload shape.
function makeActivityErrorEvent(opts: { workspaceId: string; targetId?: string; errorDetail?: string | undefined }) {
return {
event: "ACTIVITY_LOGGED",
workspace_id: opts.workspaceId,
payload: {
activity_type: "a2a_receive",
method: "message/send",
status: "error",
target_id: opts.targetId ?? opts.workspaceId,
duration_ms: 1500,
...(opts.errorDetail !== undefined ? { error_detail: opts.errorDetail } : {}),
},
timestamp: "2026-05-18T00:00:00Z",
};
}
describe("useChatSocket — surface error_detail to onSendError (internal#212)", () => {
it("forwards the secret-safe error_detail from the broadcast as the onSendError reason", () => {
const onSendError = vi.fn();
const onSendComplete = vi.fn();
renderHook(() =>
useChatSocket("ws-self", {
onSendError,
onSendComplete,
}),
);
expect(capturedHandler).not.toBeNull();
act(() => {
capturedHandler!(
makeActivityErrorEvent({
workspaceId: "ws-self",
errorDetail:
"Anthropic 403 oauth_org_not_allowed: Your organization has disabled Claude subscription access for Claude Code",
}),
);
});
// The hook must NOT fall back to the opaque hardcoded
// "Agent error (Exception) — see workspace logs for details." —
// that was internal#212. When the broadcast carries an
// error_detail, that string is the user-facing reason.
expect(onSendError).toHaveBeenCalledTimes(1);
const reason = onSendError.mock.calls[0][0] as string;
expect(reason).toContain("403");
expect(reason).toContain("oauth_org_not_allowed");
expect(reason).toContain("disabled Claude subscription");
expect(reason).not.toMatch(/see workspace logs for details/i);
});
it("gracefully degrades to the legacy opaque message when error_detail is absent (older ws-server)", () => {
// An older ws-server doesn't include error_detail in the payload.
// The hook must still fire onSendError with the legacy hardcoded
// text so the chat banner has SOMETHING to show. The fix is
// additive — never depend on the new field's presence.
const onSendError = vi.fn();
renderHook(() =>
useChatSocket("ws-self", {
onSendError,
}),
);
act(() => {
capturedHandler!(makeActivityErrorEvent({ workspaceId: "ws-self" }));
});
expect(onSendError).toHaveBeenCalledTimes(1);
const reason = onSendError.mock.calls[0][0] as string;
// Legacy boilerplate is the floor — never silently swallow.
expect(reason.length).toBeGreaterThan(0);
});
it("ignores errors targeted at a different workspace's peer", () => {
// Defense against the WS hub fanning every event out to all clients:
// each chat panel must react only when target_id matches its own
// workspace.
const onSendError = vi.fn();
renderHook(() =>
useChatSocket("ws-self", {
onSendError,
}),
);
act(() => {
capturedHandler!(
makeActivityErrorEvent({
workspaceId: "ws-self",
targetId: "ws-someone-else",
errorDetail: "irrelevant",
}),
);
});
expect(onSendError).not.toHaveBeenCalled();
});
});
@@ -67,9 +67,23 @@ export function useChatSocket(
const own = (targetId || msg.workspace_id) === workspaceId;
if (own) {
callbacksRef.current.onSendComplete?.();
callbacksRef.current.onSendError?.(
"Agent error (Exception) — see workspace logs for details.",
);
// internal#212 — surface the actionable, secret-safe
// failure reason (provider HTTP status + error code +
// human-readable message) the ws-server now puts on
// ACTIVITY_LOGGED.error_detail. The old hardcoded
// "Agent error (Exception) — see workspace logs for
// details." is the fallback only — it pointed at a
// workspace-logs tab that doesn't exist, telling the
// user nothing they could act on.
//
// Graceful degradation: older ws-server builds don't
// include error_detail, so the legacy boilerplate is
// still the floor (never silently swallow).
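// typeof guard instead of a cast: a non-string error_detail must
// never reach the banner as a React child.
const detail =
typeof p.error_detail === "string" ? p.error_detail : "";
const reason =
detail || "Agent error (Exception) — see workspace logs for details.";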
callbacksRef.current.onSendError?.(reason);
}
}
} else if (type === "a2a_send") {
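The reason-selection rule in this hunk fits in a small standalone function. The sketch below is hypothetical (`resolveSendErrorReason` is not part of the hook). It checks the runtime type of `error_detail` rather than relying on a cast, so a missing, non-string, or whitespace-only value falls back to the legacy text quoted from the hook. Treating whitespace-only as absent is an extra assumption on top of the diff.

```typescript
const LEGACY_REASON = "Agent error (Exception) — see workspace logs for details.";

function resolveSendErrorReason(payload: Record<string, unknown>): string {
  const detail = payload["error_detail"];
  // Only a non-blank string from the ws-server counts as an actionable reason.
  return typeof detail === "string" && detail.trim() !== "" ? detail : LEGACY_REASON;
}

console.log(resolveSendErrorReason({ error_detail: "kimi 401 invalid_api_key" }));
console.log(resolveSendErrorReason({})); // older ws-server: legacy floor
```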
+165
@@ -0,0 +1,165 @@
# shellcheck shell=bash
# Shared peer-visibility assertion core — runtime/backend-AGNOSTIC.
#
# WHY THIS FILE EXISTS
# --------------------
# The peer-visibility gate (PR #1298) was staging-only. Per the standing
# rule that the local prod-mimic stack must run a MANDATORY local-Postgres
# E2E BEFORE staging E2E (memory: feedback_local_must_mimic_production,
# feedback_mandatory_local_e2e_before_ship,
# feedback_local_test_before_staging_e2e), peer-visibility must also run
# against the local stack.
#
# The ASSERTION must be byte-identical between local and staging — only
# provisioning differs. So the literal MCP `list_peers` call + every
# anti-proxy / anti-native-fallback guarantee lives HERE, sourced by both
# tests/e2e/test_peer_visibility_mcp_staging.sh (staging/CP backend) and
# tests/e2e/test_peer_visibility_mcp_local.sh (local docker-compose
# backend). If this assertion ever diverges between the two, that is the
# bug — keep it in one place.
#
# THIS IS NOT A PROXY. pv_assert_runtime issues the byte-for-byte
# JSON-RPC `tools/call name=list_peers` envelope to `POST
# /workspaces/:id/mcp` using the workspace's OWN bearer token, through
# the real WorkspaceAuth + MCPRateLimiter middleware chain — the exact
# call mcp_molecule_list_peers makes from a canvas agent. It does NOT
# read a registry row, /health, the heartbeat table, or
# GET /registry/:id/peers.
#
# Contract:
# pv_assert_runtime <runtime> <ws_id> <ws_bearer> <base_url> \
# <org_id_or_empty> <all_ws_ids_space_separated>
#
# <org_id_or_empty> staging: the X-Molecule-Org-Id header value.
# local: "" (the local single-tenant stack does
# not gate on the org header; the header
# is simply omitted when empty).
# <all_ws_ids> every provisioned workspace id (parent + every
# runtime sibling). The expected peer set for this
# runtime is every id in here EXCEPT <ws_id>.
#
# Sets the global PV_VERDICT to one of:
# OK
# FAIL(http=<code>)
# FAIL(native-fallback)
# FAIL(rpc=<detail>)
# FAIL(peers=<detail>)
# FAIL(unknown)
# Returns 0 when PV_VERDICT=OK, 1 otherwise. Never exits — the caller
# owns aggregation + the gate exit code (10 = regression reproduced).
#
# The literal JSON-RPC envelope. Identical to what
# workspace/platform_tools/registry.py's mcp_molecule_list_peers emits.
PV_RPC_BODY='{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"list_peers","arguments":{}}}'
pv_assert_runtime() {
local rt="$1" wid="$2" wtok="$3" base_url="$4" org_id="$5" all_ws_ids="$6"
# Expected peer set = every OTHER provisioned workspace, excluding the
# caller itself. Byte-identical selection to the original staging script.
local expect_ids
expect_ids=$(echo "$all_ws_ids" | tr ' ' '\n' | grep -v "^${wid}$" | grep -v '^$')
# X-Molecule-Org-Id only when the backend supplies one (staging multi-
# tenant). Local single-tenant omits it — the same WorkspaceAuth +
# MCPRateLimiter chain still runs; only the tenant-routing header differs.
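local org_header=()
if [ -n "$org_id" ]; then
org_header=(-H "X-Molecule-Org-Id: $org_id")
fi
local resp http_code body
set +e
# ${arr[@]+"${arr[@]}"} expands to nothing when the array is empty, so
# callers running `set -u` don't trip "unbound variable" on bash < 4.4
# (e.g. macOS /bin/bash 3.2).
resp=$(curl -sS -X POST "$base_url/workspaces/$wid/mcp" \
-H "Authorization: Bearer $wtok" \
${org_header[@]+"${org_header[@]}"} \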
-H "Content-Type: application/json" \
-d "$PV_RPC_BODY" \
-o /tmp/pv_mcp_body.json -w "%{http_code}" 2>/dev/null)
set -e
http_code="$resp"
body=$(cat /tmp/pv_mcp_body.json 2>/dev/null || echo '')
echo "--- $rt (ws=$wid) ---"
echo " HTTP $http_code"
echo " body: $(echo "$body" | head -c 600)"
# (1) HTTP 200 — a 401 (WorkspaceAuth reject, the Hermes symptom) fails here.
if [ "$http_code" != "200" ]; then
echo "$rt: list_peers MCP call returned HTTP $http_code (expected 200)"
PV_VERDICT="FAIL(http=$http_code)"
return 1
fi
# (2) JSON-RPC result present, not an error object; expected sibling IDs
# present; not a native-sessions fallback. Byte-identical to the
# original staging script's inline python.
local parse
parse=$(echo "$body" | python3 -c "
import sys, json
expect = set(filter(None, '''$expect_ids'''.split()))
try:
d = json.load(sys.stdin)
except Exception as e:
print('PARSE_ERROR:' + str(e)); sys.exit(0)
if isinstance(d, dict) and d.get('error') is not None:
print('RPC_ERROR:' + json.dumps(d['error'])[:200]); sys.exit(0)
res = d.get('result') if isinstance(d, dict) else None
if res is None:
print('NO_RESULT'); sys.exit(0)
# MCP tools/call result shape: {content:[{type:text,text:'<json or prose>'}]}
text = ''
if isinstance(res, dict):
for c in res.get('content', []):
if c.get('type') == 'text':
text += c.get('text', '')
text_l = text.lower()
# Native-sessions fallback signature (the OpenClaw symptom): the agent
# answered from its own runtime session list, not the platform peer set.
if 'sessions_list' in text_l or 'no platform peers' in text_l or 'native session' in text_l:
print('NATIVE_FALLBACK:' + text[:200]); sys.exit(0)
# The expected sibling IDs must literally appear in the returned peer text.
found = sorted(i for i in expect if i in text)
missing = sorted(expect - set(found))
if not expect:
print('NO_EXPECTED_PEERS_CONFIGURED'); sys.exit(0)
if missing:
print('MISSING_PEERS:found=%d/%d missing=%s' % (len(found), len(expect), ','.join(m[:8] for m in missing)))
sys.exit(0)
print('OK:found=%d/%d' % (len(found), len(expect)))
" 2>/dev/null)
case "$parse" in
OK:*)
echo "$rt: list_peers returned 200 and contains all expected peers ($parse)"
PV_VERDICT="OK"
return 0
;;
NATIVE_FALLBACK:*)
echo "$rt: list_peers fell back to NATIVE sessions — sees no platform peers ($parse)"
PV_VERDICT="FAIL(native-fallback)"
return 1
;;
RPC_ERROR:*|NO_RESULT|PARSE_ERROR:*)
echo "$rt: list_peers MCP call did not return a usable result ($parse)"
PV_VERDICT="FAIL(rpc=$parse)"
return 1
;;
MISSING_PEERS:*)
echo "$rt: list_peers returned 200 but peer set is wrong/empty ($parse)"
PV_VERDICT="FAIL(peers=$parse)"
return 1
;;
NO_EXPECTED_PEERS_CONFIGURED)
# Caller bug, not a runtime regression — surface loudly so a
# mis-wired backend can't mint a false green.
echo "$rt: no expected peers were configured for this caller"
PV_VERDICT="FAIL(rpc=NO_EXPECTED_PEERS_CONFIGURED)"
return 1
;;
*)
echo "$rt: unexpected verdict '$parse'"
PV_VERDICT="FAIL(unknown)"
return 1
;;
esac
}
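For readers porting the check out of shell, the inline python's classification can be approximated in Go. This is a sketch, not the shared lib. `classifyListPeers`, `rpcResponse` and `shortID` are hypothetical names. It returns the same verdict prefixes that the `case` statement above switches on. It differs from the python in a few places: it reports missing IDs in input order rather than sorted, it does not truncate RPC error text, and it treats a non-object body as `PARSE_ERROR` rather than `NO_RESULT`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// rpcResponse models only the JSON-RPC fields the check reads.
type rpcResponse struct {
	Result *struct {
		Content []struct {
			Type string `json:"type"`
			Text string `json:"text"`
		} `json:"content"`
	} `json:"result"`
	Error json.RawMessage `json:"error"`
}

// shortID keeps the first 8 bytes of an ID, like the python's m[:8].
func shortID(id string) string {
	if len(id) > 8 {
		return id[:8]
	}
	return id
}

// classifyListPeers approximates the inline python verdict logic.
func classifyListPeers(body []byte, expected []string) string {
	var resp rpcResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "PARSE_ERROR:" + err.Error()
	}
	// "error": null decodes to the literal bytes "null"; treat it as absent.
	if len(resp.Error) > 0 && string(resp.Error) != "null" {
		return "RPC_ERROR:" + string(resp.Error)
	}
	if resp.Result == nil {
		return "NO_RESULT"
	}
	// Concatenate every text content item, as the tools/call shape allows several.
	var sb strings.Builder
	for _, c := range resp.Result.Content {
		if c.Type == "text" {
			sb.WriteString(c.Text)
		}
	}
	text := sb.String()
	lower := strings.ToLower(text)
	for _, marker := range []string{"sessions_list", "no platform peers", "native session"} {
		if strings.Contains(lower, marker) {
			return "NATIVE_FALLBACK"
		}
	}
	if len(expected) == 0 {
		return "NO_EXPECTED_PEERS_CONFIGURED"
	}
	// Each expected sibling ID must literally appear in the returned text.
	var missing []string
	for _, id := range expected {
		if !strings.Contains(text, id) {
			missing = append(missing, shortID(id))
		}
	}
	found := len(expected) - len(missing)
	if len(missing) > 0 {
		return fmt.Sprintf("MISSING_PEERS:found=%d/%d missing=%s", found, len(expected), strings.Join(missing, ","))
	}
	return fmt.Sprintf("OK:found=%d/%d", found, len(expected))
}
```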
+328
@@ -0,0 +1,328 @@
#!/usr/bin/env bash
# LOCAL E2E — fresh-provision peer-visibility gate via the LITERAL MCP path.
#
# WHY THIS EXISTS
# ---------------
# tests/e2e/test_peer_visibility_mcp_staging.sh (PR #1298) codified the
# literal user-facing peer-visibility path — but staging-only. The
# standing rule is that the local prod-mimic stack runs a MANDATORY
# local-Postgres E2E BEFORE staging E2E (memory:
# feedback_local_must_mimic_production, feedback_mandatory_local_e2e_
# before_ship, feedback_local_test_before_staging_e2e,
# feedback_real_subprocess_test_for_boot_path). A staging-only gate means
# regressions are caught late and expensively on EC2. This is the LOCAL
# backend: same byte-identical assertion, local docker-compose stack.
#
# THE ASSERTION IS NOT A PROXY and is BYTE-IDENTICAL to staging — it is
# the SAME tests/e2e/lib/peer_visibility_assert.sh::pv_assert_runtime that
# the staging script calls. It issues the byte-for-byte JSON-RPC
# `tools/call name=list_peers` envelope to `POST /workspaces/:id/mcp`
# using each workspace's OWN bearer token, through the real WorkspaceAuth
# + MCPRateLimiter middleware chain — the exact call
# mcp_molecule_list_peers makes from a canvas agent. It does NOT read a
# registry row, /health, the heartbeat table, or GET /registry/:id/peers.
#
# Only PROVISIONING differs from staging:
# - staging: POST /cp/admin/orgs (cold EC2 tenant) + per-tenant admin
# token + each workspace's auth_token from the POST /workspaces resp.
# - local: POST /workspaces directly against the local stack
# (BASE, default http://localhost:8080), MCP bearer minted via
# GET /admin/workspaces/:id/test-token (e2e_mint_test_token —
# deterministic, gated by MOLECULE_ENV != production). Same model
# every other local E2E (test_priority_runtimes_e2e.sh,
# test_api.sh) already uses; no new credential/provision flow.
#
# It is written to FAIL on today's broken Hermes/OpenClaw behavior and go
# green only when the in-flight root-cause fixes (Hermes-401 #162,
# OpenClaw-never-online/MCP-wiring #165) actually land — same gate
# semantics + exit codes as the staging script. NON-required by design
# until then (flip-to-required tracked at molecule-core#1296), and NOT
# masked with continue-on-error (feedback_fix_root_not_symptom).
#
# Required env: none (local stack only).
# Optional env:
# BASE default http://localhost:8080
# PV_RUNTIMES space list; default "hermes openclaw claude-code"
# E2E_PROVISION_TIMEOUT_SECS per-workspace online budget; default 900
# (hermes cold apt+uv is the slow path locally)
# E2E_KEEP_WS 1 → skip teardown (local debugging only)
# LLM provider keys (a workspace boots only if its provider key is set;
# a runtime whose key is absent is SKIPPED, not failed — a partially
# keyed local env must not false-fail the gate):
# CLAUDE_CODE_OAUTH_TOKEN claude-code
# E2E_MINIMAX_API_KEY hermes/openclaw (MiniMax, preferred)
# E2E_ANTHROPIC_API_KEY hermes/openclaw (direct Anthropic)
# E2E_OPENAI_API_KEY hermes/openclaw (OpenAI)
#
# Exit codes (match the staging script):
# 0 every runtime under test saw its peers via the literal MCP call
# 1 generic failure
# 3 a workspace never reached online within the budget
# 10 peer-visibility regression reproduced (the gate firing as designed)
set -uo pipefail
source "$(dirname "$0")/_lib.sh"
# Byte-identical assertion shared with the staging backend.
# shellcheck source=tests/e2e/lib/peer_visibility_assert.sh
source "$(dirname "$0")/lib/peer_visibility_assert.sh"
PV_RUNTIMES="${PV_RUNTIMES:-hermes openclaw claude-code}"
PROVISION_TIMEOUT_SECS="${E2E_PROVISION_TIMEOUT_SECS:-900}"
NAME_PREFIX="PV-Local-$$-$(date +%H%M%S)"
log() { echo "[$(date +%H:%M:%S)] $*"; }
ok() { echo "[$(date +%H:%M:%S)] ✅ $*"; }
CREATED_WSIDS=()
# ─── Scoped teardown ───────────────────────────────────────────────────
# Deletes ONLY the workspaces THIS run created (tracked in CREATED_WSIDS),
# one DELETE /workspaces/:id?confirm=true each. NEVER e2e_cleanup_all_
# workspaces / any blanket sweep — honors feedback_cleanup_after_each_test
# and feedback_never_run_cluster_cleanup_tests_on_live_platform (a local
# stack can still be shared with other concurrent local E2E).
teardown() {
local rc=$?
set +e
if [ "${E2E_KEEP_WS:-0}" = "1" ]; then
echo ""
log "[teardown] E2E_KEEP_WS=1 — leaving ${#CREATED_WSIDS[@]} ws for debugging (REMEMBER TO DELETE)"
exit $rc
fi
echo ""
log "[teardown] deleting ${#CREATED_WSIDS[@]} workspace(s) this run created (scoped)"
for wid in ${CREATED_WSIDS[@]+"${CREATED_WSIDS[@]}"}; do
[ -n "$wid" ] || continue
curl -s -X DELETE "$BASE/workspaces/$wid?confirm=true" >/dev/null 2>&1 || true
done
exit $rc
}
trap teardown EXIT INT TERM
# Pre-sweep workspaces a prior crashed run of this script left behind
# (PV-Local- name-prefix match only, never a blanket delete). The trap
# fires on normal exit, but a kill -9 / SIGPIPE can bypass it. Caveat:
# the prefix also matches a concurrently running instance of this script.
PRIOR=$(curl -s "$BASE/workspaces" | python3 -c '
import json, sys
try:
print(" ".join(w["id"] for w in json.load(sys.stdin) if w.get("name","").startswith("PV-Local-")))
except Exception:
pass
' 2>/dev/null)
for _wid in $PRIOR; do
log "Pre-sweeping prior PV-Local workspace: $_wid"
curl -s -X DELETE "$BASE/workspaces/$_wid?confirm=true" >/dev/null 2>&1 || true
done
# ─── Local-stack preflight ─────────────────────────────────────────────
log "0/5 local stack preflight: $BASE/health"
if ! curl -fsS "$BASE/health" -m 5 >/dev/null 2>&1; then
echo "::error::Local stack not healthy at $BASE/health — bring it up (make up) before this gate. Infra, not a workspace bug (feedback_fix_root_not_symptom)." >&2
exit 1
fi
# admin/test-token is the local MCP-bearer mint path; it 404s in
# production. If it is off, this gate cannot drive the literal call.
if ! curl -fsS "$BASE/admin/workspaces/preflight-probe/test-token" -m 5 >/dev/null 2>&1; then
# A 404 here is EITHER "no such ws" (fine — endpoint is enabled) OR the
# endpoint is disabled (MOLECULE_ENV=production). Distinguish by body.
PROBE=$(curl -s "$BASE/admin/workspaces/preflight-probe/test-token" -m 5 2>/dev/null)
if echo "$PROBE" | grep -qiE 'production|disabled|not found.*endpoint'; then
echo "::error::GET /admin/workspaces/:id/test-token disabled (MOLECULE_ENV=production?). Cannot mint a local MCP bearer." >&2
exit 1
fi
fi
ok " local stack healthy"
# ─── Resolve per-runtime provisioning secrets ──────────────────────────
# Mirrors test_priority_runtimes_e2e.sh / test_staging_full_saas.sh's
# provider-key chain. A runtime whose key is absent is SKIPPED (not
# failed) so a partially keyed local env doesn't false-fail the gate.
runtime_secrets() {
local rt="$1"
case "$rt" in
claude-code)
[ -n "${CLAUDE_CODE_OAUTH_TOKEN:-}" ] || { echo ""; return 1; }
python3 -c "import json,os;print(json.dumps({'CLAUDE_CODE_OAUTH_TOKEN':os.environ['CLAUDE_CODE_OAUTH_TOKEN']}))"
;;
hermes|openclaw)
if [ -n "${E2E_MINIMAX_API_KEY:-}" ]; then
python3 -c "import json,os;k=os.environ['E2E_MINIMAX_API_KEY'];print(json.dumps({'ANTHROPIC_BASE_URL':'https://api.minimax.io/anthropic','ANTHROPIC_AUTH_TOKEN':k,'MINIMAX_API_KEY':k}))"
elif [ -n "${E2E_ANTHROPIC_API_KEY:-}" ]; then
python3 -c "import json,os;k=os.environ['E2E_ANTHROPIC_API_KEY'];print(json.dumps({'ANTHROPIC_API_KEY':k}))"
elif [ -n "${E2E_OPENAI_API_KEY:-}" ]; then
python3 -c "import json,os;k=os.environ['E2E_OPENAI_API_KEY'];print(json.dumps({'OPENAI_API_KEY':k,'OPENAI_BASE_URL':'https://api.openai.com/v1','MODEL_PROVIDER':'openai:gpt-4o','HERMES_INFERENCE_PROVIDER':'custom','HERMES_CUSTOM_BASE_URL':'https://api.openai.com/v1','HERMES_CUSTOM_API_KEY':k,'HERMES_CUSTOM_API_MODE':'chat_completions'}))"
else
echo ""; return 1
fi
;;
*)
# Unknown runtime: provision with empty secrets and let the stack
# decide (kept permissive so PV_RUNTIMES can be widened later).
echo "{}"
;;
esac
}
# Block until $1 reaches one of $2 (space-separated), or $3 sec elapse.
wait_for_status() {
local wsid="$1" want="$2" budget="$3" start=$SECONDS last=""
while [ $((SECONDS - start)) -lt "$budget" ]; do
local s
s=$(curl -s "$BASE/workspaces/$wsid" | python3 -c 'import json,sys
try:
d=json.load(sys.stdin); w=d.get("workspace") if isinstance(d.get("workspace"),dict) else d; print(w.get("status",""))
except Exception:
print("")' 2>/dev/null || echo "")
# Progress goes to stderr: callers capture stdout as the final status.
[ "$s" != "$last" ] && { log "  $wsid → ${s:-<none>}" >&2; last="$s"; }
for w in $want; do [ "$s" = "$w" ] && { echo "$s"; return 0; }; done
sleep 5
done
echo "$last"
return 1
}
# ─── 1. Provision parent (claude-code) + one sibling per runtime ───────
# Same topology as the staging script: a claude-code parent plus one
# sibling per runtime under test, so each runtime should see all others.
log "1/5 provisioning parent (claude-code) + one sibling per runtime under test..."
PARENT_SECRETS=$(runtime_secrets claude-code) || PARENT_SECRETS=""
if [ -z "$PARENT_SECRETS" ]; then
# Parent still needs to exist as a peer target even without an LLM key;
# it never has to answer list_peers itself (it is excluded from the
# caller set), so an empty-secrets claude-code shell is sufficient.
PARENT_SECRETS="{}"
fi
P_RESP=$(curl -s -X POST "$BASE/workspaces" -H "Content-Type: application/json" \
-d "{\"name\":\"${NAME_PREFIX}-parent\",\"runtime\":\"claude-code\",\"tier\":3,\"secrets\":$PARENT_SECRETS}")
PARENT_ID=$(echo "$P_RESP" | python3 -c 'import json,sys;print(json.load(sys.stdin).get("id",""))' 2>/dev/null)
if [ -z "$PARENT_ID" ]; then
echo "::error::parent create failed: $(echo "$P_RESP" | head -c 300)" >&2
exit 1
fi
CREATED_WSIDS+=("$PARENT_ID")
log " PARENT_ID=$PARENT_ID"
# NOTE: no `declare -A` — this script must also run on a local macOS dev
# box (bash 3.2, no associative arrays) per feedback_local_must_mimic_
# production. WS_IDS / VERDICT are kept as newline-delimited "rt<TAB>val"
# maps with tiny get/set helpers (portable to bash 3.2+ AND ubuntu CI).
WS_IDS_MAP=""
VERDICT_MAP=""
_map_set() { # _map_set <mapvarname> <key> <value>
local __m="$1" __k="$2" __v="$3" __cur
eval "__cur=\$$__m"
# Drop any existing "<key><TAB>" entry; $'\t' is supported by bash 3.2.
__cur=$(printf '%s' "$__cur" | grep -v "^${__k}"$'\t' || true)
if [ -n "$__cur" ]; then
eval "$__m=\$(printf '%s\n%s\t%s' \"\$__cur\" \"\$__k\" \"\$__v\")"
else
eval "$__m=\$(printf '%s\t%s' \"\$__k\" \"\$__v\")"
fi
}
_map_get() { # _map_get <mapvarname> <key> -> stdout value (empty if absent)
local __m="$1" __k="$2" __cur
eval "__cur=\$$__m"
printf '%s\n' "$__cur" | awk -F'\t' -v k="$__k" '$1==k {print $2; exit}'
}
ALL_WS_IDS="$PARENT_ID"
ACTIVE_RUNTIMES=""
for rt in $PV_RUNTIMES; do
SEC=$(runtime_secrets "$rt") || SEC=""
if [ -z "$SEC" ]; then
log " SKIP $rt — no provider key in env (partially-keyed local env; not a failure)"
continue
fi
R=$(curl -s -X POST "$BASE/workspaces" -H "Content-Type: application/json" \
-d "{\"name\":\"${NAME_PREFIX}-$rt\",\"runtime\":\"$rt\",\"tier\":2,\"parent_id\":\"$PARENT_ID\",\"secrets\":$SEC}")
WID=$(echo "$R" | python3 -c 'import json,sys;print(json.load(sys.stdin).get("id",""))' 2>/dev/null)
if [ -z "$WID" ]; then
echo "::error::$rt workspace create failed: $(echo "$R" | head -c 300)" >&2
exit 1
fi
_map_set WS_IDS_MAP "$rt" "$WID"
CREATED_WSIDS+=("$WID")
ALL_WS_IDS="$ALL_WS_IDS $WID"
ACTIVE_RUNTIMES="$ACTIVE_RUNTIMES $rt"
log "  $rt → $WID"
done
ACTIVE_RUNTIMES="$(echo "$ACTIVE_RUNTIMES" | xargs)"
if [ -z "$ACTIVE_RUNTIMES" ]; then
echo "::error::No runtime had a provider key set — cannot run the local peer-visibility gate. Set CLAUDE_CODE_OAUTH_TOKEN and/or E2E_MINIMAX_API_KEY (or ANTHROPIC/OPENAI)." >&2
exit 1
fi
# ─── 2. Wait for the parent online (it is a peer target) ───────────────
log "2/5 waiting for parent online (peer target)..."
PF=$(wait_for_status "$PARENT_ID" "online" "$PROVISION_TIMEOUT_SECS") || true
if [ "$PF" != "online" ]; then
echo "::error::parent ($PARENT_ID) never reached online (last=$PF) within ${PROVISION_TIMEOUT_SECS}s" >&2
exit 3
fi
ok " parent online"
# ─── 3. Wait for every sibling online ──────────────────────────────────
# A runtime that never comes online locally is itself a finding: it
# reproduces the openclaw-never-online class (#165) on the local stack.
log "3/5 waiting for all siblings online (up to ${PROVISION_TIMEOUT_SECS}s each — cold boot)..."
REGRESSED=0
ONLINE_RUNTIMES=""
for rt in $ACTIVE_RUNTIMES; do
wid="$(_map_get WS_IDS_MAP "$rt")"
S=$(wait_for_status "$wid" "online" "$PROVISION_TIMEOUT_SECS") || true
if [ "$S" != "online" ]; then
echo "$rt ($wid): never reached online (last=$S) — reproduces the never-online class locally"
_map_set VERDICT_MAP "$rt" "FAIL(never-online:last=$S)"
REGRESSED=1
continue
fi
ok " $rt online"
ONLINE_RUNTIMES="$ONLINE_RUNTIMES $rt"
done
# ─── 4. THE GATE — literal mcp_molecule_list_peers via POST /:id/mcp ────
# Shared, byte-identical assertion. Local passes "" for the org id (the
# single-tenant local stack does not gate on X-Molecule-Org-Id); the
# literal MCP call + every anti-proxy / anti-native-fallback guarantee is
# the SAME code the staging backend runs.
log "4/5 driving the LITERAL list_peers MCP call per online runtime..."
echo ""
for rt in $ONLINE_RUNTIMES; do
wid="$(_map_get WS_IDS_MAP "$rt")"
WTOK=$(e2e_mint_test_token "$wid" 2>/dev/null || true)
if [ -z "$WTOK" ]; then
echo "--- $rt (ws=$wid) ---"
echo "$rt: could not mint a local MCP bearer (admin/test-token) — cannot drive the literal call"
_map_set VERDICT_MAP "$rt" "FAIL(no-bearer)"
REGRESSED=1
echo ""
continue
fi
PV_VERDICT=""
pv_assert_runtime "$rt" "$wid" "$WTOK" "$BASE" "" "$ALL_WS_IDS" || REGRESSED=1
_map_set VERDICT_MAP "$rt" "$PV_VERDICT"
echo ""
done
# ─── 5. Summary + honest gate exit ─────────────────────────────────────
echo "=== SUMMARY — LOCAL fresh-provision peer-visibility (literal MCP list_peers) ==="
for rt in $ACTIVE_RUNTIMES; do
_v="$(_map_get VERDICT_MAP "$rt")"
printf ' %-14s %s\n' "$rt" "${_v:-NO_RUN}"
done
echo ""
if [ "$REGRESSED" -ne 0 ]; then
echo "✗ GATE FAILED (LOCAL) — at least one runtime cannot see its peers via"
echo " the literal mcp_molecule_list_peers call on the local prod-mimic"
echo " stack. This is the SAME user-facing failure the proxy signals were"
echo " hiding, reproduced locally (far faster than EC2). Expected RED until"
echo " the Hermes-401 (#162) + OpenClaw-never-online/MCP-wiring (#165)"
echo " root-cause fixes land; goes green only when they actually do."
exit 10
fi
ok "GATE PASSED (LOCAL) — every runtime under test sees its platform peers via the literal MCP call."
exit 0
+14 -89
@@ -64,6 +64,13 @@
set -uo pipefail
# The literal MCP list_peers assertion lives in the shared, backend-
# agnostic lib so it is BYTE-IDENTICAL between this staging backend and
# the local docker-compose backend (tests/e2e/test_peer_visibility_mcp_
# local.sh). Only provisioning/teardown differs per backend.
# shellcheck source=tests/e2e/lib/peer_visibility_assert.sh
source "$(dirname "${BASH_SOURCE[0]}")/lib/peer_visibility_assert.sh"
CP_URL="${MOLECULE_CP_URL:-https://staging-api.moleculesai.app}"
ADMIN_TOKEN="${MOLECULE_ADMIN_TOKEN:?MOLECULE_ADMIN_TOKEN required — Railway staging CP_ADMIN_API_TOKEN}"
RUN_ID_SUFFIX="${E2E_RUN_ID:-$(date +%H%M%S)-$$}"
@@ -259,101 +266,19 @@ done
# through WorkspaceAuth + MCPRateLimiter.
log "6/6 driving the LITERAL list_peers MCP call per runtime..."
echo ""
RPC_BODY='{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"list_peers","arguments":{}}}'
REGRESSED=0
declare -A VERDICT
for rt in $PV_RUNTIMES; do
wid="${WS_IDS[$rt]}"
wtok="${WS_TOKENS[$rt]}"
# The expected peer set = every OTHER provisioned workspace (parent +
# the sibling runtimes), excluding the caller itself.
EXPECT_IDS=$(echo "$ALL_WS_IDS" | tr ' ' '\n' | grep -v "^${wid}$" | grep -v '^$')
set +e
RESP=$(curl -sS -X POST "$TENANT_URL/workspaces/$wid/mcp" \
-H "Authorization: Bearer $wtok" \
-H "X-Molecule-Org-Id: $ORG_ID" \
-H "Content-Type: application/json" \
-d "$RPC_BODY" \
-o /tmp/pv_mcp_body.json -w "%{http_code}" 2>/dev/null)
set -e
HTTP_CODE="$RESP"
BODY=$(cat /tmp/pv_mcp_body.json 2>/dev/null || echo '')
echo "--- $rt (ws=$wid) ---"
echo " HTTP $HTTP_CODE"
echo " body: $(echo "$BODY" | head -c 600)"
# (1) HTTP 200 — a 401 (WorkspaceAuth reject, the Hermes symptom) fails here.
if [ "$HTTP_CODE" != "200" ]; then
echo "$rt: list_peers MCP call returned HTTP $HTTP_CODE (expected 200)"
VERDICT[$rt]="FAIL(http=$HTTP_CODE)"
REGRESSED=1
continue
fi
# (2) JSON-RPC result present, not an error object.
PARSE=$(echo "$BODY" | python3 -c "
import sys, json
expect = set(filter(None, '''$EXPECT_IDS'''.split()))
try:
d = json.load(sys.stdin)
except Exception as e:
print('PARSE_ERROR:' + str(e)); sys.exit(0)
if isinstance(d, dict) and d.get('error') is not None:
print('RPC_ERROR:' + json.dumps(d['error'])[:200]); sys.exit(0)
res = d.get('result') if isinstance(d, dict) else None
if res is None:
print('NO_RESULT'); sys.exit(0)
# MCP tools/call result shape: {content:[{type:text,text:'<json or prose>'}]}
text = ''
if isinstance(res, dict):
for c in res.get('content', []):
if c.get('type') == 'text':
text += c.get('text', '')
text_l = text.lower()
# Native-sessions fallback signature (the OpenClaw symptom): the agent
# answered from its own runtime session list, not the platform peer set.
if 'sessions_list' in text_l or 'no platform peers' in text_l or 'native session' in text_l:
print('NATIVE_FALLBACK:' + text[:200]); sys.exit(0)
# The expected sibling IDs must literally appear in the returned peer text.
found = sorted(i for i in expect if i in text)
missing = sorted(expect - set(found))
if not expect:
print('NO_EXPECTED_PEERS_CONFIGURED'); sys.exit(0)
if missing:
print('MISSING_PEERS:found=%d/%d missing=%s' % (len(found), len(expect), ','.join(m[:8] for m in missing)))
sys.exit(0)
print('OK:found=%d/%d' % (len(found), len(expect)))
" 2>/dev/null)
case "$PARSE" in
OK:*)
echo "$rt: list_peers returned 200 and contains all expected peers ($PARSE)"
VERDICT[$rt]="OK"
;;
NATIVE_FALLBACK:*)
echo "$rt: list_peers fell back to NATIVE sessions — sees no platform peers ($PARSE)"
VERDICT[$rt]="FAIL(native-fallback)"
REGRESSED=1
;;
RPC_ERROR:*|NO_RESULT|PARSE_ERROR:*)
echo "$rt: list_peers MCP call did not return a usable result ($PARSE)"
VERDICT[$rt]="FAIL(rpc=$PARSE)"
REGRESSED=1
;;
MISSING_PEERS:*)
echo "$rt: list_peers returned 200 but peer set is wrong/empty ($PARSE)"
VERDICT[$rt]="FAIL(peers=$PARSE)"
REGRESSED=1
;;
*)
echo "$rt: unexpected verdict '$PARSE'"
VERDICT[$rt]="FAIL(unknown)"
REGRESSED=1
;;
esac
# Byte-identical assertion via the shared lib. Staging passes ORG_ID as
# the X-Molecule-Org-Id header value; the literal MCP call + every
# anti-proxy / anti-native-fallback guarantee is the SAME code the
# local backend runs.
PV_VERDICT=""
pv_assert_runtime "$rt" "$wid" "$wtok" "$TENANT_URL" "$ORG_ID" "$ALL_WS_IDS" || REGRESSED=1
VERDICT[$rt]="$PV_VERDICT"
echo ""
done
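The literal call itself is small. Below is a Go sketch of building it, along with the expected-peer set (every other provisioned workspace except the caller). `newListPeersRequest` and `expectedPeers` are hypothetical helpers. The envelope string is copied from `RPC_BODY` in the hunk above. Omitting `X-Molecule-Org-Id` when the org id is empty is an assumption, because the shared lib's curl invocation is not shown here.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// listPeersRPCBody is the envelope copied verbatim from the staging script.
const listPeersRPCBody = `{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"list_peers","arguments":{}}}`

// expectedPeers returns every provisioned workspace except the caller.
func expectedPeers(allIDs []string, self string) []string {
	var out []string
	for _, id := range allIDs {
		if id != "" && id != self {
			out = append(out, id)
		}
	}
	return out
}

// newListPeersRequest builds (but does not send) the literal list_peers call
// authenticated with the workspace's own bearer token.
func newListPeersRequest(baseURL, workspaceID, bearer, orgID string) (*http.Request, error) {
	url := fmt.Sprintf("%s/workspaces/%s/mcp", strings.TrimRight(baseURL, "/"), workspaceID)
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(listPeersRPCBody))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+bearer)
	req.Header.Set("Content-Type", "application/json")
	// Assumption: the single-tenant local backend passes "" and sends no org header.
	if orgID != "" {
		req.Header.Set("X-Molecule-Org-Id", orgID)
	}
	return req, nil
}
```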
@@ -168,6 +168,21 @@ func (h *WorkspaceHandler) maybeMarkContainerDead(ctx context.Context, workspace
if !h.HasProvisioner() {
return false
}
// Restart-aware short-circuit: during the 20-30s EC2-pending window of
// an in-flight restart, the workspace's url='' and IsRunning() returns
// false → looks indistinguishable from a dead container. Pre-fix this
// fired a fresh RestartByID for the just-launched instance, which
// coalesceRestart's pending-flag drained by running ANOTHER full
// stop+provision cycle (= ec2_stopped of the still-pending instance
// → re-provision). That's the 4x reprov thrash class. Skip the
// container-dead path while a restart is in flight; the in-flight
// restart's own provisionWorkspaceAutoSync will surface a real failure
// (markProvisionFailed) if the new container never comes up. Issue
// internal#544.
if isRestarting(workspaceID) {
log.Printf("ProxyA2A: maybeMarkContainerDead skipped for %s — restart already in flight (self-fire guard)", workspaceID)
return false
}
var running bool
var inspectErr error
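The source does not show how `isRestarting` is implemented. As a purely hypothetical sketch, assume an in-memory, per-process set guarded by a mutex. Under that assumption, the self-fire guard described in the comment above reduces to one check. `restartTracker`, `begin`, `end` and `shouldMarkDead` are illustrative names, not the handler's API.

```go
package main

import "sync"

// restartTracker is a hypothetical per-process record of workspaces with a
// restart cycle in flight.
type restartTracker struct {
	mu       sync.Mutex
	inFlight map[string]bool
}

func newRestartTracker() *restartTracker {
	return &restartTracker{inFlight: make(map[string]bool)}
}

// begin marks workspaceID as restarting; it reports false when a restart was
// already in flight, so a caller can skip launching a second cycle.
func (r *restartTracker) begin(workspaceID string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.inFlight[workspaceID] {
		return false
	}
	r.inFlight[workspaceID] = true
	return true
}

// end clears the in-flight mark once the cycle finishes (success or failure).
func (r *restartTracker) end(workspaceID string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.inFlight, workspaceID)
}

func (r *restartTracker) isRestarting(workspaceID string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.inFlight[workspaceID]
}

// shouldMarkDead applies the self-fire guard: while a restart is in flight,
// "not running" is the expected EC2-pending state, not evidence of death.
func shouldMarkDead(r *restartTracker, workspaceID string, running bool) bool {
	if r.isRestarting(workspaceID) {
		return false
	}
	return !running
}
```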
@@ -223,6 +238,18 @@ func (h *WorkspaceHandler) maybeMarkContainerDead(ctx context.Context, workspace
// shape post-EC2-replace (see molecule-controlplane#20 incident
// 2026-05-07) where the reconciler hasn't respawned the agent yet.
func (h *WorkspaceHandler) preflightContainerHealth(ctx context.Context, workspaceID string) *proxyA2AError {
// Restart-aware short-circuit (mirror of maybeMarkContainerDead): if a
// restart cycle is in flight for this workspace, do not run the
// IsRunning probe — it would observe the EC2-pending state as "not
// running" and trigger RestartByID for an already-restarting workspace,
// closing the self-fire loop. Returning nil lets the optimistic
// forward proceed; the upstream Do() call will fail with a connection
// error or 502, and the *post-restart* reactive path can decide what
// to do once the cycle has actually completed. Issue internal#544.
if isRestarting(workspaceID) {
log.Printf("ProxyA2A preflight: %s — skipped, restart already in flight (self-fire guard)", workspaceID)
return nil
}
running, err := h.provisioner.IsRunning(ctx, workspaceID)
if err != nil {
// Transient daemon error. Provisioner.IsRunning returns (true, err)
@@ -8,6 +8,7 @@ import (
"fmt"
"log"
"net/http"
"regexp"
"strconv"
"strings"
"time"
@@ -18,6 +19,46 @@ import (
"github.com/google/uuid"
)
// internal#212 — secret-safe scrubber applied to error_detail strings
// before they cross the canvas WebSocket. Defense in depth: the
// workspace runtime already runs `_sanitize_for_external` on its side
// (workspace/executor_helpers.py), but the broadcast layer is the last
// stop before the string reaches the user's browser, so we re-scrub
// here in case any caller path forgot.
//
// The scrubber is intentionally surgical — it MUST preserve the
// actionable parts (HTTP status codes, error codes like
// `oauth_org_not_allowed`, human-readable provider messages) and
// remove only what looks credential-ish. Over-redacting defeats the
// whole point of internal#212 (giving the user a reason they can act on).
// Capture (auth-key prefix) (value) so the prefix can be preserved in
// the output. The keyword anchor prevents false positives on regular
// text that happens to contain a long alphanumeric run.
var errorDetailSecretRE = regexp.MustCompile(`(?i)((?:bearer|token|api[_-]?key|sk-proj-|sk-)[ :=]*)[A-Za-z0-9_/.-]{20,}`)
// Bare JWT shape: three dot-separated base64url segments, an eyJ-prefixed
// header of at least 11 chars followed by payload and signature segments of
// at least 16 chars each. Catches JWTs that the keyword-anchored rule above
// would miss when they appear without a bearer/token prefix.
var errorDetailJWTRE = regexp.MustCompile(`eyJ[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{16,}\.[A-Za-z0-9_-]{16,}`)
const errorDetailBroadcastCap = 4096
func sanitizeErrorDetailForBroadcast(s string) string {
if s == "" {
return s
}
// Cap first: a huge error body shouldn't tax every websocket
// client's buffer. The 4096-byte cap (bytes, not runes) sits above the
// workspace-side _MAX_STDERR budget, so the runtime's own cap normally
// dominates and this one only bounds callers that skipped it.
if len(s) > errorDetailBroadcastCap {
s = s[:errorDetailBroadcastCap] + "…[truncated]"
}
s = errorDetailSecretRE.ReplaceAllString(s, "${1}[REDACTED]")
s = errorDetailJWTRE.ReplaceAllString(s, "[REDACTED]")
return s
}
type ActivityHandler struct {
broadcaster *events.Broadcaster
}
@@ -691,6 +732,16 @@ func logActivityExec(ctx context.Context, exec activityExecutor, broadcaster eve
if respStr != nil {
payload["response_body"] = json.RawMessage(respJSON)
}
// internal#212 — surface the secret-safe failure reason on the
// live broadcast so the canvas chat-tab error banner can show
// the user *why* (provider HTTP status, error code, the
// provider's own human message) instead of the opaque
// "Agent error (Exception) — see workspace logs for details."
// hardcoded fallback. Omitted when nil so the canvas's "has
// actionable reason" guard doesn't trip on empty-string keys.
if params.ErrorDetail != nil && *params.ErrorDetail != "" {
payload["error_detail"] = sanitizeErrorDetailForBroadcast(*params.ErrorDetail)
}
}
return func() {
@@ -934,6 +934,184 @@ func TestLogActivity_Broadcast_IncludesRequestAndResponseBodies(t *testing.T) {
}
}
// TestLogActivity_Broadcast_IncludesErrorDetail pins the internal#212
// UX fix: when an a2a_receive row is logged with status="error" and a
// non-empty error_detail, the live broadcast MUST carry that detail so
// the canvas chat-tab error bubble can render the actionable reason
// (e.g. the provider's own 403 message) instead of the opaque
// "Agent error (Exception) — see workspace logs for details." string.
// Without this, the canvas falls back to the hardcoded boilerplate;
// the row's error_detail is in the DB but never reaches the user
// without a manual refresh of the Activity tab.
func TestLogActivity_Broadcast_IncludesErrorDetail(t *testing.T) {
mock := setupTestDB(t)
defer func() {
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet sqlmock expectations: %v", err)
}
}()
mock.ExpectExec("INSERT INTO activity_logs").
WillReturnResult(sqlmock.NewResult(1, 1))
cb := &recordingBroadcaster{}
srcID := "ws-source"
tgtID := "ws-target"
method := "message/send"
// Realistic actionable reason: provider HTTP status + provider's
// own message. Secret-safe (no token, no api key, just the cause).
detail := "Anthropic 403 oauth_org_not_allowed: Your organization has disabled Claude subscription access for Claude Code — use an Anthropic API key or ask your admin to enable access."
LogActivity(context.Background(), cb, ActivityParams{
WorkspaceID: "ws-source",
ActivityType: "a2a_receive",
SourceID: &srcID,
TargetID: &tgtID,
Method: &method,
Status: "error",
ErrorDetail: &detail,
})
if len(cb.calls) != 1 {
t.Fatalf("expected 1 broadcast, got %d", len(cb.calls))
}
payload := cb.calls[0].payload
got, ok := payload["error_detail"].(string)
if !ok {
t.Fatalf("error_detail missing from broadcast payload: got %#v", payload["error_detail"])
}
if got != detail {
t.Errorf("error_detail = %q, want %q", got, detail)
}
}
// TestLogActivity_Broadcast_OmitsErrorDetailWhenNil pins the inverse:
// rows logged without an error_detail (the common ok-path) must not
// have an empty "error_detail":"" key in the broadcast, which would
// false-positive the canvas's "has actionable reason" guard and render
// an empty Underlying-Error block. The omission rule matches how
// request_body/response_body are handled.
func TestLogActivity_Broadcast_OmitsErrorDetailWhenNil(t *testing.T) {
mock := setupTestDB(t)
defer func() {
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet sqlmock expectations: %v", err)
}
}()
mock.ExpectExec("INSERT INTO activity_logs").
WillReturnResult(sqlmock.NewResult(1, 1))
cb := &recordingBroadcaster{}
srcID := "ws-source"
LogActivity(context.Background(), cb, ActivityParams{
WorkspaceID: "ws-source",
ActivityType: "a2a_send",
SourceID: &srcID,
Status: "ok",
ErrorDetail: nil,
})
if len(cb.calls) != 1 {
t.Fatalf("expected 1 broadcast, got %d", len(cb.calls))
}
if _, present := cb.calls[0].payload["error_detail"]; present {
t.Errorf("error_detail should be omitted when nil, got %v", cb.calls[0].payload["error_detail"])
}
}
// TestSanitizeErrorDetail_StripsSecretShapes pins the secret-safe
// scrubber's contract: the broadcast layer is the last defense before
// a string crosses the canvas WebSocket and lands in the user's
// browser, so anything that *looks* like an API key / bearer token /
// JWT must be replaced with [REDACTED] even if upstream (the runtime,
// the provider) failed to scrub it. The non-secret parts of the
// message — provider status, error code, human-readable cause — MUST
// survive intact, otherwise the whole point of internal#212 (giving
// the user an actionable reason) is defeated.
func TestSanitizeErrorDetail_StripsSecretShapes(t *testing.T) {
cases := []struct {
name string
in string
mustHave []string // substrings that must survive — the actionable parts
mustMiss []string // substrings that must NOT survive — the secret shapes
}{
{
name: "preserves actionable provider reason",
in: "Anthropic 403 oauth_org_not_allowed: Your organization has disabled Claude subscription access for Claude Code",
mustHave: []string{"403", "oauth_org_not_allowed", "disabled Claude subscription"},
mustMiss: []string{"[REDACTED]"},
},
{
name: "redacts sk- API key embedded in error",
in: "openai 401 invalid_api_key: Incorrect API key provided: sk-proj-abcdefghijklmnop1234567890abcdef. Check your key.",
mustHave: []string{"401", "invalid_api_key", "Incorrect API key provided"},
mustMiss: []string{"sk-proj-abcdefghijklmnop1234567890abcdef"},
},
{
name: "redacts Bearer token in stringified header dump",
in: "auth failed; headers: Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.aaaaaaaaaaaaaaaaaaaa.bbbbbbbbbbbbbbbbbbbb",
mustHave: []string{"auth failed"},
mustMiss: []string{"eyJhbGciOiJIUzI1NiJ9.aaaaaaaaaaaaaaaaaaaa.bbbbbbbbbbbbbbbbbbbb"},
},
{
name: "truncates absurdly long detail to bound payload size",
in: "kimi 500 internal_error: " + strings.Repeat("x", 8000),
mustHave: []string{"kimi 500 internal_error"},
mustMiss: []string{strings.Repeat("x", 5000)}, // 5000 in a row must NOT survive — cap is 4096
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
got := sanitizeErrorDetailForBroadcast(tc.in)
for _, s := range tc.mustHave {
if !strings.Contains(got, s) {
t.Errorf("expected %q to survive scrub, got: %q", s, got)
}
}
for _, s := range tc.mustMiss {
if strings.Contains(got, s) {
t.Errorf("expected %q to be scrubbed, got: %q", s, got)
}
}
})
}
}
// TestLogActivity_Broadcast_ErrorDetailIsSanitized pins the integration
// of the scrubber into the broadcast path: if an upstream caller
// somehow passes through an error_detail with a secret-shaped token,
// the wire payload (what reaches the canvas WebSocket) must already
// be scrubbed. Defense in depth — the runtime should never let this
// happen, but the canvas is the trust boundary, not the runtime.
func TestLogActivity_Broadcast_ErrorDetailIsSanitized(t *testing.T) {
mock := setupTestDB(t)
defer func() {
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet sqlmock expectations: %v", err)
}
}()
mock.ExpectExec("INSERT INTO activity_logs").
WillReturnResult(sqlmock.NewResult(1, 1))
cb := &recordingBroadcaster{}
srcID := "ws-source"
// Upstream leaked a token into the detail string. The DB still
// stores the unscrubbed copy (workspace logs are an internal
// audit surface), but the broadcast that reaches the canvas
// must already be sanitized.
detail := "anthropic 401 invalid_api_key: provided key sk-proj-leakedsecretvalueabcdefghij is wrong"
LogActivity(context.Background(), cb, ActivityParams{
WorkspaceID: "ws-source",
ActivityType: "a2a_receive",
SourceID: &srcID,
Status: "error",
ErrorDetail: &detail,
})
if len(cb.calls) != 1 {
t.Fatalf("expected 1 broadcast, got %d", len(cb.calls))
}
got, _ := cb.calls[0].payload["error_detail"].(string)
if strings.Contains(got, "sk-proj-leakedsecretvalueabcdefghij") {
t.Errorf("broadcast leaked secret-shaped token: %q", got)
}
if !strings.Contains(got, "invalid_api_key") {
t.Errorf("scrubber over-redacted: lost the actionable code from %q", got)
}
}
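The omit-when-empty rule pinned by `TestLogActivity_Broadcast_OmitsErrorDetailWhenNil` and its sibling tests can be isolated into a pure function. In that form it is testable without sqlmock. This is a hypothetical refactor sketch. `buildBroadcastPayload` and the `"status"` key are illustrative, and the scrubber is injected so that any sanitizer, such as `sanitizeErrorDetailForBroadcast`, can be passed in.

```go
package main

// buildBroadcastPayload is a hypothetical helper isolating the rule that
// logActivityExec applies to error_detail: omit it when nil or empty,
// otherwise scrub it before it reaches the canvas WebSocket.
func buildBroadcastPayload(status string, errorDetail *string, scrub func(string) string) map[string]any {
	payload := map[string]any{"status": status} // "status" key is illustrative
	if errorDetail != nil && *errorDetail != "" {
		payload["error_detail"] = scrub(*errorDetail)
	}
	return payload
}
```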
// TestLogActivityTx_DefersBroadcastUntilCommitHook pins the #149
// contract: LogActivityTx returns a commitHook that the caller MUST
// invoke after tx.Commit(); the broadcast MUST NOT fire from inside
@@ -1,6 +1,9 @@
package handlers
import (
"log"
"os"
"path/filepath"
"regexp"
"strings"
)
@@ -91,6 +94,97 @@ func applyGitAskpass(envVars map[string]string) {
setIfEmpty(envVars, "GIT_ASKPASS", gitAskpassHelperPath)
}
// applyAgentGitHTTPCreds reads the persona's HTTPS git credential from
// the operator-host bootstrap dir and injects it as GIT_HTTP_USERNAME /
// GIT_HTTP_PASSWORD so the in-container askpass helper can emit it on
// git's auth challenge.
//
// Why a dedicated env-var pair instead of reusing GITEA_USER / GITEA_TOKEN:
// the provisioner's forensic #145 denylist (provisioner.scmWriteTokenKeys)
// strips any env var named GITEA_TOKEN / GITHUB_TOKEN / GH_TOKEN /
// GITLAB_TOKEN / GL_TOKEN / BITBUCKET_TOKEN from tenant container env
// before docker run. That denylist is by exact key name, so the same
// token survives transport when shipped under the generic
// GIT_HTTP_USERNAME / GIT_HTTP_PASSWORD names that the askpass helper
// reads first (scripts/git-askpass.sh in each template-*). The username
// half stays an identifier (the persona's Gitea login), the password
// half carries the bytes from the persona token file.
//
// The fallback pair GITEA_USER / GITEA_TOKEN is ALSO set — GITEA_USER
// survives the denylist (it's an identity, not a credential) and
// GITEA_TOKEN is the no-op write that buildContainerEnv will drop.
// Setting both pairs in lockstep means the askpass helper's
// GIT_HTTP_*-first / GITEA_*-fallback chain works regardless of which
// lane lands first in the container env after any future provisioner
// refactor.
//
// Idempotent: keys already present in envVars (either pair) are
// preserved. Operator-supplied workspace_secrets therefore win over the
// persona token file, because loadWorkspaceSecrets runs BEFORE this
// helper in prepareProvisionContext.
//
// Silent no-op when:
// - personaKey is empty (no role → no persona dir to consult)
// - personaKey fails the safe-segment check (defense-in-depth against
// a crafted role escaping the persona dir)
// - the persona token file does not exist or is empty (legitimate
// case for personas that don't ship a git-write credential — e.g.
// read-only PM/Reviewer/Researcher identities or a partially-
// provisioned bootstrap)
//
// No vendor-specific behaviour: this function reads bytes from a path
// and emits them as the standard askpass env-var pair. The host the
// credential applies to is determined by the deployer choosing which
// remote to push to — the askpass helper has no hardcoded hostnames.
func applyAgentGitHTTPCreds(envVars map[string]string, personaKey string) {
if envVars == nil {
return
}
personaKey = strings.TrimSpace(personaKey)
if !isSafeRoleName(personaKey) {
// Silent no-op for empty / unsafe keys — same shape as
// loadPersonaTokenFile. Descriptive-role payloads (multi-word
// "Frontend Engineer" etc.) take this branch and pick up
// creds via workspace_secrets / org-import persona-env merge,
// not the direct persona-token file path.
return
}
root := os.Getenv("MOLECULE_PERSONA_ROOT")
if root == "" {
root = "/etc/molecule-bootstrap/personas"
}
tokenPath := filepath.Join(root, personaKey, "token")
data, err := os.ReadFile(tokenPath)
if err != nil {
// Persona dir / file absent: legitimate for the host shapes
// that don't ship the bootstrap kit (dev laptops, CI nodes)
// or for personas that intentionally carry no git-write
// credential. Caller decides whether the resulting
// "Authentication failed" at first push is a configuration
// error or expected behaviour.
return
}
token := strings.TrimSpace(string(data))
if token == "" {
return
}
// Primary lane — survives forensic #145 by virtue of the generic
// GIT_HTTP_* names not being on the SCM-write denylist.
setIfEmpty(envVars, "GIT_HTTP_USERNAME", personaKey)
setIfEmpty(envVars, "GIT_HTTP_PASSWORD", token)
// Fallback lane — askpass reads GITEA_USER / GITEA_TOKEN second.
// GITEA_USER survives the denylist; GITEA_TOKEN will be stripped
// by buildContainerEnv but is set here for completeness so the
// (envVars map[string]string) contract is consistent for callers
// inspecting it before the provisioner-level filter runs (e.g.
// the env-mutator plugin chain).
setIfEmpty(envVars, "GITEA_USER", personaKey)
setIfEmpty(envVars, "GITEA_TOKEN", token)
log.Printf("applyAgentGitHTTPCreds: injected GIT_HTTP_USERNAME/PASSWORD for persona %q (token %d bytes)", personaKey, len(token))
}
// slugifyForEmail collapses a workspace name to a safe email localpart:
// lowercase, non-alphanumeric runs → single hyphen, stripped at edges.
// "Frontend Engineer" → "frontend-engineer".
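The GIT_HTTP_*-first / GITEA_*-fallback chain described in the applyAgentGitHTTPCreds comment can be sketched in Go as below. This is a hypothetical analogue, not scripts/git-askpass.sh itself (that helper is shell). It assumes git passes a prompt beginning with "Username" or "Password" to the helper, and `stripDenylisted` is an illustrative stand-in for the forensic #145 exact-name strip, listing only the six names quoted in that comment. It shows why the generic lane still yields a password after the strip while the GITEA_TOKEN fallback does not.

```go
package main

import (
	"fmt"
	"strings"
)

// scmWriteTokenKeys is a hypothetical stand-in for the provisioner's
// exact-name denylist (forensic #145), limited to the six names quoted
// in the applyAgentGitHTTPCreds comment.
var scmWriteTokenKeys = map[string]bool{
	"GITEA_TOKEN": true, "GITHUB_TOKEN": true, "GH_TOKEN": true,
	"GITLAB_TOKEN": true, "GL_TOKEN": true, "BITBUCKET_TOKEN": true,
}

// stripDenylisted drops denylisted keys by exact name, the way a
// container-env-build step would before `docker run`.
func stripDenylisted(env map[string]string) map[string]string {
	out := make(map[string]string, len(env))
	for k, v := range env {
		if !scmWriteTokenKeys[k] {
			out[k] = v
		}
	}
	return out
}

// firstNonEmpty returns the first non-empty value among keys, in order.
func firstNonEmpty(env map[string]string, keys ...string) string {
	for _, k := range keys {
		if v := env[k]; v != "" {
			return v
		}
	}
	return ""
}

// askpassAnswer sketches the GIT_HTTP_*-first / GITEA_*-fallback lookup.
// Assumption: the prompt git passes starts with "Username" or "Password".
func askpassAnswer(prompt string, env map[string]string) string {
	switch {
	case strings.HasPrefix(prompt, "Username"):
		return firstNonEmpty(env, "GIT_HTTP_USERNAME", "GITEA_USER")
	case strings.HasPrefix(prompt, "Password"):
		return firstNonEmpty(env, "GIT_HTTP_PASSWORD", "GITEA_TOKEN")
	default:
		return ""
	}
}

func main() {
	// Env as applyAgentGitHTTPCreds would emit it: both lanes in lockstep.
	env := map[string]string{
		"GIT_HTTP_USERNAME": "agent-dev-a",
		"GIT_HTTP_PASSWORD": "token-bytes",
		"GITEA_USER":        "agent-dev-a",
		"GITEA_TOKEN":       "token-bytes",
	}
	container := stripDenylisted(env) // only GITEA_TOKEN is removed
	fmt.Println(askpassAnswer("Username for 'https://git.example.com': ", container))
	fmt.Println(askpassAnswer("Password for 'https://agent-dev-a@git.example.com': ", container))
}
```

Because the strip is by exact name, only GITEA_TOKEN disappears; GITEA_USER and the GIT_HTTP_* pair reach the helper unchanged.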
@@ -1,6 +1,8 @@
package handlers
import (
"os"
"path/filepath"
"testing"
)
@@ -122,6 +124,214 @@ func TestApplyGitAskpass_NilMapIsSafe(t *testing.T) {
applyGitAskpass(nil)
}
// TestApplyAgentGitHTTPCreds_HappyPath: the prod-team shape — a persona
// dir at /etc/molecule-bootstrap/personas/<role>/token ships a write
// token. applyAgentGitHTTPCreds reads it and emits both the
// askpass-preferred GIT_HTTP_* pair and the GITEA_* fallback.
func TestApplyAgentGitHTTPCreds_HappyPath(t *testing.T) {
root := t.TempDir()
roleDir := filepath.Join(root, "agent-dev-a")
if err := os.MkdirAll(roleDir, 0o755); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(roleDir, "token"),
[]byte("token-bytes-redacted\n"), 0o600); err != nil {
t.Fatal(err)
}
t.Setenv("MOLECULE_PERSONA_ROOT", root)
env := map[string]string{}
applyAgentGitHTTPCreds(env, "agent-dev-a")
cases := map[string]string{
"GIT_HTTP_USERNAME": "agent-dev-a",
"GIT_HTTP_PASSWORD": "token-bytes-redacted",
"GITEA_USER": "agent-dev-a",
"GITEA_TOKEN": "token-bytes-redacted",
}
for k, want := range cases {
if got := env[k]; got != want {
t.Errorf("%s: got %q, want %q", k, got, want)
}
}
}
// TestApplyAgentGitHTTPCreds_TrimsWhitespace: bootstrap-kit-written
// token files canonically end in \n. Must trim like loadPersonaTokenFile
// does — Gitea PAT validator rejects embedded whitespace.
func TestApplyAgentGitHTTPCreds_TrimsWhitespace(t *testing.T) {
root := t.TempDir()
roleDir := filepath.Join(root, "agent-dev-b")
if err := os.MkdirAll(roleDir, 0o755); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(roleDir, "token"),
[]byte("\n raw-token-bytes \n\n"), 0o600); err != nil {
t.Fatal(err)
}
t.Setenv("MOLECULE_PERSONA_ROOT", root)
env := map[string]string{}
applyAgentGitHTTPCreds(env, "agent-dev-b")
if env["GIT_HTTP_PASSWORD"] != "raw-token-bytes" {
t.Errorf("GIT_HTTP_PASSWORD: token whitespace not trimmed; got %q", env["GIT_HTTP_PASSWORD"])
}
}
// TestApplyAgentGitHTTPCreds_RespectsOperatorOverride: if a workspace
// secret (loaded earlier by loadWorkspaceSecrets) already set the
// askpass pair, those values must win — operator intent ranks above
// persona-file defaults. Symmetric with applyAgentGitIdentity's
// GIT_AUTHOR_* override semantics.
func TestApplyAgentGitHTTPCreds_RespectsOperatorOverride(t *testing.T) {
root := t.TempDir()
roleDir := filepath.Join(root, "agent-dev-a")
if err := os.MkdirAll(roleDir, 0o755); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(roleDir, "token"),
[]byte("file-token\n"), 0o600); err != nil {
t.Fatal(err)
}
t.Setenv("MOLECULE_PERSONA_ROOT", root)
env := map[string]string{
"GIT_HTTP_USERNAME": "operator-user",
"GIT_HTTP_PASSWORD": "operator-secret",
}
applyAgentGitHTTPCreds(env, "agent-dev-a")
if env["GIT_HTTP_USERNAME"] != "operator-user" {
t.Errorf("GIT_HTTP_USERNAME should not be overwritten, got %q", env["GIT_HTTP_USERNAME"])
}
if env["GIT_HTTP_PASSWORD"] != "operator-secret" {
t.Errorf("GIT_HTTP_PASSWORD should not be overwritten, got %q", env["GIT_HTTP_PASSWORD"])
}
// The fallback pair was not pre-set, so the persona file fills it in.
if env["GITEA_TOKEN"] != "file-token" {
t.Errorf("GITEA_TOKEN fallback should be filled, got %q", env["GITEA_TOKEN"])
}
}
// TestApplyAgentGitHTTPCreds_EmptyKeyIsNoop: a workspace whose
// payload.Role is empty, whitespace-only, or a descriptive multi-word
// role must take the silent-no-op branch: no FS read, no env keys touched.
func TestApplyAgentGitHTTPCreds_EmptyKeyIsNoop(t *testing.T) {
root := t.TempDir()
t.Setenv("MOLECULE_PERSONA_ROOT", root)
env := map[string]string{}
applyAgentGitHTTPCreds(env, "")
if len(env) != 0 {
t.Errorf("empty persona key should leave env untouched, got %v", env)
}
applyAgentGitHTTPCreds(env, " ")
if len(env) != 0 {
t.Errorf("whitespace persona key should leave env untouched, got %v", env)
}
applyAgentGitHTTPCreds(env, "Frontend Engineer")
if len(env) != 0 {
t.Errorf("multi-word descriptive role should leave env untouched (silent no-op via isSafeRoleName), got %v", env)
}
}
// TestApplyAgentGitHTTPCreds_MissingTokenFile: persona dir exists but
// ships no token (legitimate for read-only personas like agent-pm pre-
// CTO-cred or partially-provisioned bootstrap). Silent no-op — no env
// keys set so first push surfaces "Authentication failed" cleanly
// instead of half-configured creds.
func TestApplyAgentGitHTTPCreds_MissingTokenFile(t *testing.T) {
root := t.TempDir()
if err := os.MkdirAll(filepath.Join(root, "agent-pm"), 0o755); err != nil {
t.Fatal(err)
}
t.Setenv("MOLECULE_PERSONA_ROOT", root)
env := map[string]string{}
applyAgentGitHTTPCreds(env, "agent-pm")
if len(env) != 0 {
t.Errorf("missing token file should leave env untouched, got %v", env)
}
}
// TestApplyAgentGitHTTPCreds_EmptyTokenIsNoop: a whitespace-only token
// file (botched bootstrap) must be treated as absent — never emit
// GIT_HTTP_PASSWORD="" because the askpass helper would then return
// empty on the password prompt and git would surface a confusing 401
// rather than a clean "no credentials" state.
func TestApplyAgentGitHTTPCreds_EmptyTokenIsNoop(t *testing.T) {
root := t.TempDir()
roleDir := filepath.Join(root, "agent-dev-a")
if err := os.MkdirAll(roleDir, 0o755); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(roleDir, "token"),
[]byte(" \t\n \n"), 0o600); err != nil {
t.Fatal(err)
}
t.Setenv("MOLECULE_PERSONA_ROOT", root)
env := map[string]string{}
applyAgentGitHTTPCreds(env, "agent-dev-a")
if len(env) != 0 {
t.Errorf("whitespace-only token should leave env untouched, got %v", env)
}
}
// TestApplyAgentGitHTTPCreds_RejectsUnsafeRole: defense-in-depth — a
// crafted role with path separators / "../" must NOT touch the FS,
// even if a token file exists at the traversed location.
func TestApplyAgentGitHTTPCreds_RejectsUnsafeRole(t *testing.T) {
root := t.TempDir()
// Plant a token at <root>/token so a successful traversal would land here.
if err := os.WriteFile(filepath.Join(root, "token"),
[]byte("stolen-token\n"), 0o600); err != nil {
t.Fatal(err)
}
t.Setenv("MOLECULE_PERSONA_ROOT", filepath.Join(root, "personas"))
for _, bad := range []string{"..", "../personas", "/abs", "with/slash", "."} {
env := map[string]string{}
applyAgentGitHTTPCreds(env, bad)
if len(env) != 0 {
t.Errorf("unsafe role %q must leave env untouched, got %v", bad, env)
}
}
}
// TestApplyAgentGitHTTPCreds_NilMapIsSafe: defensive — never panic
// on a nil map. Symmetric with applyAgentGitIdentity's nil-map test.
func TestApplyAgentGitHTTPCreds_NilMapIsSafe(t *testing.T) {
defer func() {
if r := recover(); r != nil {
t.Errorf("applyAgentGitHTTPCreds panicked on nil map: %v", r)
}
}()
applyAgentGitHTTPCreds(nil, "agent-dev-a")
}
// TestApplyAgentGitHTTPCreds_DefaultPersonaRoot: when
// MOLECULE_PERSONA_ROOT is unset, the helper falls back to
// /etc/molecule-bootstrap/personas — the canonical operator-host path
// per the bootstrap kit shape. We can't write into /etc in a test,
// but we CAN assert the helper takes the silent-no-op branch when
// that real path is absent (the prod-default case on a dev laptop).
func TestApplyAgentGitHTTPCreds_DefaultPersonaRoot(t *testing.T) {
t.Setenv("MOLECULE_PERSONA_ROOT", "")
env := map[string]string{}
applyAgentGitHTTPCreds(env, "agent-dev-a")
// The /etc/molecule-bootstrap/personas/agent-dev-a/token path
// almost certainly does not exist on a dev/CI host. The contract
// here is "silent no-op when token unreadable", not "exact env
// state" — so we only assert no panic + no half-state pair.
if _, ok := env["GIT_HTTP_USERNAME"]; ok {
if _, ok2 := env["GIT_HTTP_PASSWORD"]; !ok2 {
t.Errorf("USERNAME set without PASSWORD — half-state; got %v", env)
}
}
}
func TestSlugifyForEmail(t *testing.T) {
cases := []struct {
in, want string
@@ -180,6 +180,42 @@ func waitForWorkspaceOnline(ctx context.Context, workspaceID string, timeout tim
return false
}
// waitForFreshHeartbeat polls until the workspace has BOTH a non-empty
// url AND a last_heartbeat_at strictly after restartStartTs (i.e. the
// heartbeat we observe is NEW, not the stale pre-restart one carried
// across through the row update). Returns false on timeout or context
// cancellation; a DB read error counts as "not ready yet" and is
// retried until the deadline.
//
// This is the Layer 2 gate for the 2026-05-19 ws-server self-fire restart
// loop fix. status='online' can flip while url='' is still in place: the
// status update happens in /registry/register, and although url is set
// at the same time, the read here may see a transient interleaving.
// Pre-fix, the trailing restart-context probe could therefore fire
// against a half-registered row and trigger the upstream-502 →
// maybeMarkContainerDead → self-fire chain we're closing. The url +
// heartbeat-freshness check is the strict, correlated end-state
// assertion that says "the new container is actually addressable",
// not just "some heartbeat happened".
func waitForFreshHeartbeat(ctx context.Context, workspaceID string, restartStartTs time.Time, timeout time.Duration) bool {
deadline := time.Now().Add(timeout)
for time.Now().Before(deadline) {
var url sql.NullString
var lastHB sql.NullTime
err := db.DB.QueryRowContext(ctx,
`SELECT url, last_heartbeat_at FROM workspaces WHERE id = $1`, workspaceID,
).Scan(&url, &lastHB)
if err == nil &&
url.Valid && url.String != "" &&
lastHB.Valid && lastHB.Time.After(restartStartTs) {
return true
}
select {
case <-ctx.Done():
return false
case <-time.After(restartContextOnlinePollInterval):
}
}
return false
}
// buildRestartA2APayload wraps the rendered context string in the
// JSON-RPC 2.0 / A2A message/send shape that the proxy already knows
// how to normalize. Returns the marshalled body ready for ProxyA2ARequest.
@@ -220,6 +256,22 @@ func (h *WorkspaceHandler) sendRestartContext(workspaceID string, data restartCo
log.Printf("restart-context: workspace %s did not come online within %s — dropping context message", workspaceID, restartContextOnlineTimeout)
return
}
// Self-fire guard (Layer 2 of the 2026-05-19 ws-server self-fire fix):
// status='online' alone is not enough to safely fire the trailing
// ProxyA2ARequest. The workspace must also have:
// - url != '' (the new container's URL has been registered)
// - last_heartbeat_at > data.RestartAt (the heartbeat we're seeing is NEW, not stale)
// Without those, ProxyA2ARequest can fail with a connect error or
// upstream 502, hit handleA2ADispatchError → maybeMarkContainerDead →
// RestartByID → self-fire. The Layer 1 isRestarting gate already
// covers that path; this is a belt-and-suspenders check so the probe
// never even tries until the new container is actually addressable.
// Fail-closed: a DB read error is retried like a stale row, and if the
// gate is still unsatisfied when restartContextOnlineTimeout expires,
// the context message is dropped rather than sent.
if !waitForFreshHeartbeat(ctx, workspaceID, data.RestartAt, restartContextOnlineTimeout) {
log.Printf("restart-context: workspace %s online but no fresh heartbeat or empty url — dropping context message (self-fire guard)", workspaceID)
return
}
text := buildRestartContextMessage(data)
body, err := buildRestartA2APayload(text)
@@ -0,0 +1,176 @@
package handlers
// workspace_provision_forbidden_env.go — Layer 1 of the RFC#523
// tenant-workspace forbidden-env guardrail (task #146).
//
// Threat model: tenant workspaces (per-customer EC2 / container)
// run untrusted agent-controlled code and MUST NEVER receive
// operator-fleet-scope credentials. Leaking such a credential into
// even one tenant workspace would escalate "compromise of one
// agent" into "compromise of the whole platform."
//
// The existing forensic #145 guard (provisioner.scmWriteTokenKeys
// in buildContainerEnv / CPProvisioner.Start) strips SCM-write
// tokens at the FINAL container-env-build step — silent strip,
// no signal back to the caller. RFC#523 adds a FAIL-CLOSED layer
// EARLIER in the provision pipeline: when the resolved env-set
// at prepareProvisionContext-time contains any forbidden var
// name, the provision is aborted with a structured error so the
// operator sees the leak immediately instead of running with a
// silently-stripped env.
//
// Layer placement (3-layer defense-in-depth, RFC#523 §"Proposed guardrail"):
// - L1 (this file): provisioner-side abort BEFORE container start
// - L2 (workspace/entrypoint.sh + template-* start.sh): in-container
// env-grep + exit 1 — defense-in-depth if L1 is bypassed
// - L3 (.gitea/workflows/lint-forbidden-env-keys.yml): CI lint that
// scans Go code under workspace-server/ for new writers that
// would inject a forbidden key
//
// Open-source-template compatibility (memory
// `feedback_open_source_templates_no_hardcoded_org_internals`):
// the forbidden-key set is GENERIC (no molecule-AI-specific
// hostnames or org names). A third-party fork can replace this
// set with its own operator-scope key names without editing any
// template.
import (
"fmt"
"sort"
"strings"
)
// forbiddenTenantEnvKeys is the set of environment variable names
// that MUST NOT reach a tenant workspace container. The check is
// by exact key name. Value-shape leaks (40-byte hex strings, etc.)
// are out of scope here; the separate secret-scan workflow covers
// that class.
//
// Categories (RFC#523):
// - SCM-write tokens: same as provisioner.scmWriteTokenKeys, kept
// in sync. Listed again here so that any future split of the two
// denylists shows up as an auditable diff.
// - Control-plane admin tokens: any token that grants control-plane
// admin API access.
// - Secret-store operator tokens: bootstrap-scope tokens for the
// central secret store.
// - Infra-platform tokens: deploy / fleet-management creds.
// - Operator-host pointers: hostnames / addresses that identify
// the operator host. Per the open-source-template rule these
// are matched as a MOLECULE_OPERATOR_ key-name prefix family
// (e.g. MOLECULE_OPERATOR_HOST) via the forbiddenTenantEnvPrefixes
// table below, rather than by a constant hardcoded into the deny
// rule itself.
//
// Per-agent persona PATs (e.g. AGENT_DEV_A_TOKEN style names —
// not operator-fleet scope) are NOT on this list. The guard
// checks the env VAR NAME, not the token VALUE, so a per-agent
// scoped token under a per-agent var name passes through.
var forbiddenTenantEnvKeys = map[string]struct{}{
// SCM-write — kept in sync with provisioner.scmWriteTokenKeys.
"GITEA_TOKEN": {},
"GITEA_PAT": {},
"GITHUB_TOKEN": {},
"GITHUB_PAT": {},
"GH_TOKEN": {},
"GITLAB_TOKEN": {},
"GL_TOKEN": {},
"BITBUCKET_TOKEN": {},
// Control-plane admin tokens.
"CP_ADMIN_API_TOKEN": {},
"CP_ADMIN_TOKEN": {},
// Secret-store operator tokens (Infisical SSOT — operator scope only).
"INFISICAL_OPERATOR_TOKEN": {},
"INFISICAL_BOOTSTRAP_TOKEN": {},
// Infra-platform tokens.
"RAILWAY_TOKEN": {},
"RAILWAY_PERSONAL_API_TOKEN": {},
"HETZNER_TOKEN": {},
"HETZNER_API_TOKEN": {},
}
// forbiddenTenantEnvPrefixes are key-name PREFIXES that match
// operator-scope env vars. Matched in addition to the exact-key
// set above. Useful for "MOLECULE_OPERATOR_*" style families
// where new members get added without re-editing the deny set.
//
// Kept as a tiny set on purpose — over-broad prefix matching is
// the failure mode this layer's exact-key set is designed to
// avoid. Add a prefix here only when the family is closed
// (every member is operator-scope; no legitimate tenant-scope
// member exists or will).
var forbiddenTenantEnvPrefixes = []string{
"MOLECULE_OPERATOR_",
}
// isForbiddenTenantEnvKey reports whether an env var name is on
// the forbidden-for-tenant-workspaces list (either by exact match
// in forbiddenTenantEnvKeys or by prefix in
// forbiddenTenantEnvPrefixes).
//
// Deliberately package-private: the deny set is internal to the
// workspace-server handlers package, so external callers must go
// through the provision pipeline, which means the abort path fires
// for them too.
func isForbiddenTenantEnvKey(key string) bool {
if _, ok := forbiddenTenantEnvKeys[key]; ok {
return true
}
for _, prefix := range forbiddenTenantEnvPrefixes {
if strings.HasPrefix(key, prefix) {
return true
}
}
return false
}
// findForbiddenTenantEnvKeys scans the resolved env-set and
// returns the sorted list of forbidden keys present. Empty slice
// (not nil — easier for callers to JSON-encode) when none match.
//
// Deterministic order: the result feeds the user-facing error
// message and the structured-extra payload that goes to the
// canvas Events tab. Sorting makes the message stable across
// Go's randomized map iteration.
func findForbiddenTenantEnvKeys(envVars map[string]string) []string {
if len(envVars) == 0 {
return []string{}
}
found := make([]string, 0)
for k := range envVars {
if isForbiddenTenantEnvKey(k) {
found = append(found, k)
}
}
sort.Strings(found)
return found
}
// formatForbiddenTenantEnvError builds the canned, safe-to-display
// user-facing message for a provision aborted because forbidden env keys are
// present in the resolved env-set. The message names the
// offending keys (key names are not secret — the values would be,
// but only names are surfaced) and points at the RFC.
//
// Same shape as formatMissingEnvError so the canvas Events tab
// renders both classes consistently.
func formatForbiddenTenantEnvError(keys []string) string {
if len(keys) == 0 {
// Defensive: caller should not invoke with empty input,
// but keep the function total.
return "provision aborted: forbidden operator-scope env vars present (RFC#523)"
}
if len(keys) == 1 {
return fmt.Sprintf(
"provision aborted: env var %q is operator-scope and must not reach tenant workspaces (RFC#523) — remove it from workspace_secrets / global_secrets and retry",
keys[0],
)
}
return fmt.Sprintf(
"provision aborted: env vars %s are operator-scope and must not reach tenant workspaces (RFC#523) — remove them from workspace_secrets / global_secrets and retry",
strings.Join(keys, ", "),
)
}
@@ -0,0 +1,182 @@
package handlers
// workspace_provision_forbidden_env_test.go — Layer 1 tests for the
// RFC#523 tenant-workspace forbidden-env guardrail (task #146).
//
// Behaviour pinned (per RFC#523 §"Acceptance criteria" Layer 1):
// - exact-match keys (GITEA_TOKEN, CP_ADMIN_API_TOKEN, RAILWAY_TOKEN,
// INFISICAL_OPERATOR_TOKEN, …) are flagged
// - MOLECULE_OPERATOR_* prefix family is flagged
// - per-agent-scope vars (GIT_HTTP_USERNAME, ANTHROPIC_API_KEY,
// AGENT_DEV_A_TOKEN, …) are NOT flagged — guard checks key NAME
// not value
// - findForbiddenTenantEnvKeys returns a deterministically-sorted
// slice (canvas Events tab needs stable rendering)
// - formatForbiddenTenantEnvError uses singular vs plural phrasing
// so the message reads naturally for both 1-key and N-key cases
//
// Companion: provisioner.buildContainerEnv has the older silent-
// strip guard (forensic #145). The two layers are intentionally
// redundant — this one fails closed early; that one strips late.
import (
"strings"
"testing"
)
func TestIsForbiddenTenantEnvKey_ExactMatches(t *testing.T) {
cases := []struct {
key string
want bool
}{
// SCM-write tokens — kept in sync with provisioner.scmWriteTokenKeys.
{"GITEA_TOKEN", true},
{"GITEA_PAT", true},
{"GITHUB_TOKEN", true},
{"GITHUB_PAT", true},
{"GH_TOKEN", true},
{"GITLAB_TOKEN", true},
{"GL_TOKEN", true},
{"BITBUCKET_TOKEN", true},
// Control-plane admin tokens.
{"CP_ADMIN_API_TOKEN", true},
{"CP_ADMIN_TOKEN", true},
// Secret-store operator tokens.
{"INFISICAL_OPERATOR_TOKEN", true},
{"INFISICAL_BOOTSTRAP_TOKEN", true},
// Infra-platform tokens.
{"RAILWAY_TOKEN", true},
{"RAILWAY_PERSONAL_API_TOKEN", true},
{"HETZNER_TOKEN", true},
{"HETZNER_API_TOKEN", true},
// Per-agent scoped — must NOT be flagged.
{"GIT_HTTP_USERNAME", false},
{"GIT_HTTP_PASSWORD", false},
{"ANTHROPIC_API_KEY", false},
{"ANTHROPIC_AUTH_TOKEN", false},
{"OPENAI_API_KEY", false},
{"KIMI_API_KEY", false},
{"MINIMAX_API_KEY", false},
{"AGENT_DEV_A_TOKEN", false}, // hypothetical per-agent name
{"MOLECULE_AGENT_ROLE", false},
{"PARENT_ID", false},
{"WORKSPACE_ID", false},
{"PLATFORM_URL", false},
{"", false},
}
for _, c := range cases {
got := isForbiddenTenantEnvKey(c.key)
if got != c.want {
t.Errorf("isForbiddenTenantEnvKey(%q) = %v; want %v", c.key, got, c.want)
}
}
}
func TestIsForbiddenTenantEnvKey_PrefixMatches(t *testing.T) {
cases := []struct {
key string
want bool
}{
{"MOLECULE_OPERATOR_HOST", true},
{"MOLECULE_OPERATOR_SSH_KEY", true},
{"MOLECULE_OPERATOR_BACKUP_BUCKET", true},
{"MOLECULE_OPERATOR_", true}, // prefix itself
// Adjacent but NOT in prefix family.
{"MOLECULE_AGENT_ROLE", false},
{"MOLECULE_URL", false},
{"MOLECULE_PERSONA_ROOT", false}, // path on operator host, not tenant
{"MOLECULE_GITEA_TOKEN", false}, // local-build-time only; not a tenant env
}
for _, c := range cases {
got := isForbiddenTenantEnvKey(c.key)
if got != c.want {
t.Errorf("isForbiddenTenantEnvKey(%q) = %v; want %v", c.key, got, c.want)
}
}
}
func TestFindForbiddenTenantEnvKeys_NoneAndEmpty(t *testing.T) {
if got := findForbiddenTenantEnvKeys(nil); len(got) != 0 {
t.Errorf("nil envVars: got %v; want empty", got)
}
if got := findForbiddenTenantEnvKeys(map[string]string{}); len(got) != 0 {
t.Errorf("empty envVars: got %v; want empty", got)
}
clean := map[string]string{
"ANTHROPIC_API_KEY": "sk-keep",
"GIT_HTTP_USERNAME": "agent-dev-a",
"GIT_HTTP_PASSWORD": "scoped-pat",
"MOLECULE_AGENT_ROLE": "agent-dev-a",
"WORKSPACE_ID": "ws-123",
}
if got := findForbiddenTenantEnvKeys(clean); len(got) != 0 {
t.Errorf("clean envVars: got %v; want empty", got)
}
}
func TestFindForbiddenTenantEnvKeys_SingleAndMultipleSorted(t *testing.T) {
// Single key.
single := map[string]string{
"ANTHROPIC_API_KEY": "sk-keep",
"GITEA_TOKEN": "operator-scope-leak",
}
got := findForbiddenTenantEnvKeys(single)
if len(got) != 1 || got[0] != "GITEA_TOKEN" {
t.Errorf("single forbidden: got %v; want [GITEA_TOKEN]", got)
}
// Multiple keys — must be sorted (canvas Events tab needs stability).
multi := map[string]string{
"RAILWAY_TOKEN": "z",
"GITEA_TOKEN": "a",
"MOLECULE_OPERATOR_HOST": "m",
"CP_ADMIN_API_TOKEN": "c",
"ANTHROPIC_API_KEY": "ok",
}
got = findForbiddenTenantEnvKeys(multi)
want := []string{"CP_ADMIN_API_TOKEN", "GITEA_TOKEN", "MOLECULE_OPERATOR_HOST", "RAILWAY_TOKEN"}
if len(got) != len(want) {
t.Fatalf("multi forbidden length: got %v; want %v", got, want)
}
for i := range want {
if got[i] != want[i] {
t.Errorf("multi forbidden[%d] = %q; want %q (full got=%v want=%v)", i, got[i], want[i], got, want)
}
}
}
func TestFormatForbiddenTenantEnvError_Phrasing(t *testing.T) {
// Empty input — defensive total function.
if msg := formatForbiddenTenantEnvError(nil); !strings.Contains(msg, "RFC#523") {
t.Errorf("empty input: missing RFC#523 ref: %q", msg)
}
// Singular phrasing.
single := formatForbiddenTenantEnvError([]string{"GITEA_TOKEN"})
if !strings.Contains(single, `"GITEA_TOKEN"`) {
t.Errorf("single: missing quoted key: %q", single)
}
if !strings.Contains(single, "operator-scope") {
t.Errorf("single: missing operator-scope phrase: %q", single)
}
if !strings.Contains(single, "RFC#523") {
t.Errorf("single: missing RFC#523 ref: %q", single)
}
if strings.Contains(single, "env vars ") { // plural form
t.Errorf("single: leaked plural phrasing: %q", single)
}
// Plural phrasing.
multi := formatForbiddenTenantEnvError([]string{"CP_ADMIN_API_TOKEN", "GITEA_TOKEN"})
if !strings.Contains(multi, "CP_ADMIN_API_TOKEN, GITEA_TOKEN") {
t.Errorf("plural: missing joined list: %q", multi)
}
if !strings.Contains(multi, "env vars ") {
t.Errorf("plural: missing plural phrase: %q", multi)
}
}
@@ -125,12 +125,62 @@ func (h *WorkspaceHandler) prepareProvisionContext(
return nil, &provisionAbort{Msg: decryptErr}
}
// RFC#523 Layer 1 (task #146): refuse to start a tenant workspace
// when any forbidden operator-scope env var is present in the
// resolved secret-load env-set. Runs IMMEDIATELY after
// loadWorkspaceSecrets and BEFORE applyAgentGitHTTPCreds — the
// per-agent persona injection sets a fallback GITEA_USER/GITEA_TOKEN
// pair that the buildContainerEnv forensic #145 guard will strip
// later. We want THIS layer to catch leaks from the operator-
// controlled stores (global_secrets, workspace_secrets) only, not
// the deliberate per-agent platform injection that lives downstream.
//
// Threat model is "an upstream secret-writer accidentally widened
// the propagation set" — e.g. an operator pastes GITEA_TOKEN into
// a workspace_secrets row. Caught here, surfaced loudly to the
// canvas Events tab, fail-closed. The existing forensic #145 guard
// in provisioner.buildContainerEnv / CPProvisioner.Start stays as
// defense-in-depth: it silently strips at container-env-build time.
//
// Key names (not values) are echoed in the user-facing error so
// the operator can locate and remove the offending row. Per memory
// `feedback_passwords_in_chat_are_burned`, key names are not
// secret; values would be.
if forbidden := findForbiddenTenantEnvKeys(envVars); len(forbidden) > 0 {
msg := formatForbiddenTenantEnvError(forbidden)
log.Printf("Provisioner: ABORT workspace=%s — forbidden operator-scope env keys present: %v (RFC#523)", workspaceID, forbidden)
return nil, &provisionAbort{
Msg: msg,
Extra: map[string]interface{}{"error": msg, "forbidden_env_keys": forbidden, "rfc": "523"},
}
}
pluginsPath, _ := filepath.Abs(filepath.Join(h.configsDir, "..", "plugins"))
awarenessNamespace := h.loadAwarenessNamespace(ctx, workspaceID)
// Per-agent git identity (#1957) — must run after secret loads so
// a workspace_secret named GIT_AUTHOR_NAME can override.
applyAgentGitIdentity(envVars, payload.Name)
// Per-agent git HTTP credential injection — bridges the gap that
// PR template-claude-code#30 + mc#1525 left open: the askpass binary
// + GIT_ASKPASS env are wired in-image, but until now no code path
// in workspace-server actually read the persona's git token from
// the operator-host bootstrap dir and exported it as
// GIT_HTTP_USERNAME / GIT_HTTP_PASSWORD. Without this, the askpass
// helper invokes with an empty password env and git fails the
// auth challenge in ~500ms (live-verified for Dev-A/Dev-B
// 2026-05-18 ~23:55Z).
//
// Runs after the secret loads (and after applyAgentGitIdentity) and
// only fills keys that are still empty, so workspace_secrets named
// GIT_HTTP_USERNAME / GIT_HTTP_PASSWORD (operator-supplied, loaded
// earlier by loadWorkspaceSecrets) win over the persona-file
// default. Uses payload.Role as the persona key, which matches the
// slug-form convention agent-dev-a / agent-dev-b / agent-pm.
// Descriptive multi-word roles ("Frontend Engineer") take the
// silent-no-op branch and continue to rely on workspace_secrets /
// org-import persona-env merge for their git auth.
applyAgentGitHTTPCreds(envVars, payload.Role)
applyRuntimeModelEnv(envVars, payload.Runtime, payload.Model)
if payload.Role != "" {
envVars["MOLECULE_AGENT_ROLE"] = payload.Role
@@ -297,6 +297,203 @@ func TestPrepareProvisionContext_ParentIDInjection(t *testing.T) {
func ptrStr(s string) *string { return &s }
// TestPrepareProvisionContext_InjectsGitHTTPCredsFromPersonaToken pins
// the end-to-end wiring of the durable-git-auth fix: when a workspace
// is provisioned with a slug-form role matching a persona dir at
// $MOLECULE_PERSONA_ROOT/<role>/token, the prepared envVars MUST
// carry GIT_HTTP_USERNAME / GIT_HTTP_PASSWORD (+ GITEA_USER / GITEA_TOKEN
// fallback) so the in-container askpass helper has something to emit
// on git's auth challenge.
//
// Pre-fix shape (Dev-A/Dev-B live-verified 2026-05-18 ~23:55Z): the
// askpass binary + GIT_ASKPASS env were already wired
// (template-claude-code#30 + mc#1525), but GIT_HTTP_USERNAME and
// GIT_HTTP_PASSWORD were absent from the container env → askpass
// returned empty → git rc=128 "Authentication failed" in <500ms.
// This test fails without applyAgentGitHTTPCreds wired into
// prepareProvisionContext and proves the prod-team path is closed.
func TestPrepareProvisionContext_InjectsGitHTTPCredsFromPersonaToken(t *testing.T) {
// Stage a persona dir matching the prod-team shape per
// reference_prod_team_infisical_identities — a flat dir per role
// with a single mode-600 `token` file.
root := t.TempDir()
for _, role := range []string{"agent-dev-a", "agent-dev-b"} {
roleDir := filepath.Join(root, role)
if err := os.MkdirAll(roleDir, 0o755); err != nil {
t.Fatal(err)
}
// Token value pinned to a recognizable string so we can
// assert exact propagation. Real bootstrap-kit files end in
// \n; the helper must trim that.
if err := os.WriteFile(filepath.Join(roleDir, "token"),
[]byte("token-for-"+role+"\n"), 0o600); err != nil {
t.Fatal(err)
}
}
t.Setenv("MOLECULE_PERSONA_ROOT", root)
cases := []struct {
name string
role string
expectInject bool
expectUser string
expectPass string
}{
{
name: "Dev-A slug role → persona token injected as GIT_HTTP_USERNAME/PASSWORD",
role: "agent-dev-a",
expectInject: true,
expectUser: "agent-dev-a",
expectPass: "token-for-agent-dev-a",
},
{
name: "Dev-B slug role → persona token injected",
role: "agent-dev-b",
expectInject: true,
expectUser: "agent-dev-b",
expectPass: "token-for-agent-dev-b",
},
{
name: "descriptive multi-word role → silent no-op (no persona dir lookup)",
role: "Frontend Engineer",
expectInject: false,
},
{
name: "unknown slug role with no persona dir → silent no-op",
role: "agent-nonexistent",
expectInject: false,
},
{
name: "empty role → silent no-op",
role: "",
expectInject: false,
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
mock := setupTestDB(t)
mock.ExpectQuery(`SELECT key, encrypted_value, encryption_version FROM global_secrets`).
WillReturnRows(sqlmock.NewRows([]string{"key", "encrypted_value", "encryption_version"}))
mock.ExpectQuery(`SELECT key, encrypted_value, encryption_version FROM workspace_secrets`).
WithArgs("ws-prod-team").
WillReturnRows(sqlmock.NewRows([]string{"key", "encrypted_value", "encryption_version"}))
handler := NewWorkspaceHandler(&captureBroadcaster{}, nil, "http://localhost:8080", t.TempDir())
payload := models.CreateWorkspacePayload{
Name: "Dev-A",
Role: tc.role,
Tier: 1,
}
prepared, abort := handler.prepareProvisionContext(
context.Background(), "ws-prod-team", "/nonexistent", nil, payload, false)
if abort != nil {
t.Fatalf("unexpected abort: %s", abort.Msg)
}
gotUser, hasUser := prepared.EnvVars["GIT_HTTP_USERNAME"]
gotPass, hasPass := prepared.EnvVars["GIT_HTTP_PASSWORD"]
if tc.expectInject {
if !hasUser || gotUser != tc.expectUser {
t.Errorf("GIT_HTTP_USERNAME: got %q (present=%v), want %q",
gotUser, hasUser, tc.expectUser)
}
if !hasPass || gotPass != tc.expectPass {
t.Errorf("GIT_HTTP_PASSWORD: got %q (present=%v), want %q",
gotPass, hasPass, tc.expectPass)
}
// Fallback pair should ALSO be set so askpass's
// GITEA_USER/GITEA_TOKEN fallback chain works
// (GITEA_TOKEN will then be stripped at
// buildContainerEnv per forensic #145, but
// GITEA_USER survives — see provisioner_test.go
// "persona-file path" subtest).
if prepared.EnvVars["GITEA_USER"] != tc.expectUser {
t.Errorf("GITEA_USER fallback: got %q, want %q",
prepared.EnvVars["GITEA_USER"], tc.expectUser)
}
if prepared.EnvVars["GITEA_TOKEN"] != tc.expectPass {
t.Errorf("GITEA_TOKEN fallback: got %q, want %q",
prepared.EnvVars["GITEA_TOKEN"], tc.expectPass)
}
} else {
if hasUser {
t.Errorf("GIT_HTTP_USERNAME should NOT be set for role %q; got %q",
tc.role, gotUser)
}
if hasPass {
t.Errorf("GIT_HTTP_PASSWORD should NOT be set for role %q; got %q",
tc.role, gotPass)
}
}
// applyAgentGitIdentity always wires GIT_ASKPASS when
// payload.Name is non-empty. Sanity-check that the new
// credential wiring didn't accidentally bypass that existing
// env-set (the askpass helper is useless without credentials
// in env, and the credentials are useless without the helper).
if prepared.EnvVars["GIT_ASKPASS"] != "/usr/local/bin/molecule-askpass" {
t.Errorf("GIT_ASKPASS should remain wired by applyAgentGitIdentity; got %q",
prepared.EnvVars["GIT_ASKPASS"])
}
})
}
}
// TestPrepareProvisionContext_WorkspaceSecretWinsOverPersonaToken pins
// the precedence contract: an operator-supplied workspace_secret named
// GIT_HTTP_USERNAME / GIT_HTTP_PASSWORD (loaded by loadWorkspaceSecrets
// BEFORE applyAgentGitHTTPCreds runs) must beat the persona-file
// default. This is the standard escape hatch — if an operator needs a
// per-workspace override (e.g. a workspace-scoped Gitea token with
// narrower repo access than the persona's), the secrets API still
// works.
func TestPrepareProvisionContext_WorkspaceSecretWinsOverPersonaToken(t *testing.T) {
root := t.TempDir()
roleDir := filepath.Join(root, "agent-dev-a")
if err := os.MkdirAll(roleDir, 0o755); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(roleDir, "token"),
[]byte("persona-file-token\n"), 0o600); err != nil {
t.Fatal(err)
}
t.Setenv("MOLECULE_PERSONA_ROOT", root)
mock := setupTestDB(t)
mock.ExpectQuery(`SELECT key, encrypted_value, encryption_version FROM global_secrets`).
WillReturnRows(sqlmock.NewRows([]string{"key", "encrypted_value", "encryption_version"}))
// Workspace secret pre-populates GIT_HTTP_USERNAME / GIT_HTTP_PASSWORD —
// these come from loadWorkspaceSecrets which runs before applyAgentGitHTTPCreds.
// encryption_version=0 means raw bytes (crypto disabled in test).
mock.ExpectQuery(`SELECT key, encrypted_value, encryption_version FROM workspace_secrets`).
WithArgs("ws-prod-team").
WillReturnRows(sqlmock.NewRows([]string{"key", "encrypted_value", "encryption_version"}).
AddRow("GIT_HTTP_USERNAME", []byte("operator-override-user"), 0).
AddRow("GIT_HTTP_PASSWORD", []byte("operator-override-pass"), 0))
handler := NewWorkspaceHandler(&captureBroadcaster{}, nil, "http://localhost:8080", t.TempDir())
payload := models.CreateWorkspacePayload{
Name: "Dev-A",
Role: "agent-dev-a",
Tier: 1,
}
prepared, abort := handler.prepareProvisionContext(
context.Background(), "ws-prod-team", "/nonexistent", nil, payload, false)
if abort != nil {
t.Fatalf("unexpected abort: %s", abort.Msg)
}
if prepared.EnvVars["GIT_HTTP_USERNAME"] != "operator-override-user" {
t.Errorf("operator override lost — GIT_HTTP_USERNAME: got %q, want %q",
prepared.EnvVars["GIT_HTTP_USERNAME"], "operator-override-user")
}
if prepared.EnvVars["GIT_HTTP_PASSWORD"] != "operator-override-pass" {
t.Errorf("operator override lost — GIT_HTTP_PASSWORD: got %q, want %q",
prepared.EnvVars["GIT_HTTP_PASSWORD"], "operator-override-pass")
}
}
// TestReadOrLazyHealInboundSecret pins the four branches of the
// shared lazy-heal helper directly. Each call site (chat_files,
// registry) has its own integration test, but those go through the
@@ -8,6 +8,7 @@ import (
"runtime/debug"
"strings"
"sync"
"sync/atomic"
"time"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
@@ -39,12 +40,57 @@ type restartState struct {
mu sync.Mutex
running bool // true while a restart cycle is in flight
pending bool // set by any caller that arrived during the in-flight cycle
// restartStartedAt records the wall-clock when the most recent cycle
// flipped running=true. Used by the self-fire debounce (internal#544,
// the ws-server self-fire restart feedback loop seen in prod-Reviewer/
// Researcher 2026-05-19 ~00:05Z 4x reprov thrash): any RestartByID
// arriving within restartDebounceWindow of this timestamp is silently
// dropped so a probe firing during the EC2-pending window can't
// re-trigger a fresh full cycle on the just-launched instance.
restartStartedAt time.Time
}
// restartStates is a per-workspace map of *restartState. Each workspace gets
// its own entry so unrelated workspaces don't serialize on each other.
var restartStates sync.Map // map[workspaceID]*restartState
// restartDebounceWindow is the silent-drop window for successive RestartByID
// calls. Sized to cover the typical EC2 pending → online interval (20-30s)
// with a margin so a probe firing during the just-after-online but still-
// flaky heartbeat window also gets dropped. Bigger than that would block
// legitimate "Restart failed, retry" recoveries; smaller would let the
// 4x thrash class through. Package-level so tests can shrink it.
var restartDebounceWindow = 60 * time.Second
// restartByIDDropCounter is incremented every time RestartByID drops a call
// inside the debounce window. It is a package-level atomic counter so that
// (a) tests can assert the drop fired, and (b) ops can grep logs for the
// drop line today and read the counter from a future /admin/metrics
// endpoint. It is not a Prometheus metric because the platform doesn't pull
// metrics from workspace-server yet; that's a separate RFC.
var restartByIDDropCounter atomic.Uint64
// isRestarting reports whether a restart cycle is currently in flight for
// the workspace. Callers that have their own "container looks dead" probe
// MUST consult this before triggering a restart, because during the
// 20-30s EC2-pending window the workspace's url='' and IsRunning()=false
// looks identical to a dead container — and any restart-triggering probe
// (maybeMarkContainerDead from the canvas /delegations poll, or the trailing
// restart-context probe at the end of runRestartCycle) will set
// pending=true, and the outer coalesceRestart loop will drain it by running
// ANOTHER full cycle: ec2_stopped on the just-booted instance, then
// re-provision. This gate closes that self-fire loop.
func isRestarting(workspaceID string) bool {
sv, ok := restartStates.Load(workspaceID)
if !ok {
return false
}
state := sv.(*restartState)
state.mu.Lock()
defer state.mu.Unlock()
return state.running
}
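The isRestarting comment above explains how an ungated probe turns one restart into several: the probe sets `pending`, and the drain loop runs another full cycle. A single-goroutine model of that pending-flag drain makes the arithmetic concrete. `state`, `coalesce` and `countCycles` are hypothetical simplifications of `restartState`/`coalesceRestart` (one workspace, no panic recovery, no debounce). The model assumes that each of the first few cycles ends with one probe that sees the still-pending instance as dead.

```go
package main

import "sync"

// state mirrors the running/pending pair of restartState (simplified).
type state struct {
	mu      sync.Mutex
	running bool
	pending bool
}

// coalesce runs cycle; callers arriving mid-cycle only set pending, and
// the drain loop runs one more full cycle per pending flag it observes.
func (s *state) coalesce(cycle func()) {
	s.mu.Lock()
	if s.running {
		s.pending = true
		s.mu.Unlock()
		return
	}
	s.running = true
	s.mu.Unlock()
	for {
		cycle()
		s.mu.Lock()
		if !s.pending {
			s.running = false
			s.mu.Unlock()
			return
		}
		s.pending = false
		s.mu.Unlock()
	}
}

// isRunning plays the role of isRestarting.
func (s *state) isRunning() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.running
}

// countCycles returns how many cycles one trigger produces when the first
// selfFiringCycles cycles each end with a dead-looking probe. With gate=true
// the probe consults isRunning first (the Layer 1 pattern).
func countCycles(selfFiringCycles int, gate bool) int {
	s := &state{}
	cycles := 0
	var cycle func()
	cycle = func() {
		cycles++
		if cycles > selfFiringCycles {
			return // later cycles come up healthy, so no probe fires
		}
		if gate && s.isRunning() {
			return // a restart is already in flight: back off
		}
		s.coalesce(cycle) // ungated probe re-triggers a restart
	}
	s.coalesce(cycle)
	return cycles
}
```

Under those assumptions, `countCycles(3, false)` yields 4 cycles, the 4x thrash shape, while `countCycles(3, true)` yields 1 because the probe sees the in-flight cycle and backs off.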
// isParentPaused checks if any ancestor of the workspace is paused.
func isParentPaused(ctx context.Context, workspaceID string) (bool, string) {
var parentID *string
@@ -376,9 +422,45 @@ func (h *WorkspaceHandler) RestartByID(workspaceID string) {
if !h.HasProvisioner() {
return
}
// Self-fire debounce: drop (not coalesce) successive RestartByID calls
// within restartDebounceWindow of the most recent cycle's start. This
// is the load-bearing protection against the 4x reprov thrash class —
// coalesceRestart's pending-flag would otherwise drain by running
// ANOTHER full cycle of stop+provision on the just-launched EC2 (still
// in the pending state), which is the self-fire we're closing.
//
// Only applies to RestartByID (programmatic — secrets handler,
// maybeMarkContainerDead, preflightContainerHealth). The HTTP Restart
// handler in workspace_restart.go's Restart() bypasses this path and
// calls RestartWorkspaceAutoOpts directly, so user-initiated restart
// clicks are unaffected.
if shouldDebounceRestart(workspaceID) {
restartByIDDropCounter.Add(1)
log.Printf("RestartByID: %s — dropped (within %s self-fire debounce window; total dropped=%d)",
workspaceID, restartDebounceWindow, restartByIDDropCounter.Load())
return
}
coalesceRestart(workspaceID, func() { h.runRestartCycle(workspaceID) })
}
// shouldDebounceRestart reports whether the most recent cycle for this
// workspace started within restartDebounceWindow. Read-only on
// restartState; the actual restartStartedAt stamp is written in
// coalesceRestart when running flips false→true.
func shouldDebounceRestart(workspaceID string) bool {
sv, ok := restartStates.Load(workspaceID)
if !ok {
return false
}
state := sv.(*restartState)
state.mu.Lock()
defer state.mu.Unlock()
if state.restartStartedAt.IsZero() {
return false
}
return time.Since(state.restartStartedAt) < restartDebounceWindow
}
// coalesceRestart implements the pending-flag gate around an arbitrary cycle
// function. Extracted from RestartByID for direct unit testing — the cycle
// function in production is `runRestartCycle`, but tests pass a counter to
@@ -398,6 +480,12 @@ func coalesceRestart(workspaceID string, cycle func()) {
return
}
state.running = true
// Stamp the start time so the RestartByID debounce can drop any
// self-fire probe that hits within restartDebounceWindow. Only the
// false→true edge stamps; the drain-loop's inner cycles re-use the
// same start (they're effectively one "restart event" from the
// debounce's POV).
state.restartStartedAt = time.Now()
state.mu.Unlock()
// Always clear running on exit — including panic — so a panicking
@@ -0,0 +1,297 @@
package handlers
// Tests for the 2026-05-19 ws-server self-fire restart feedback loop fix.
//
// Empirical chain reproduced (prod-Reviewer/Researcher 4x reprov thrash
// 2026-05-19 ~00:05-00:09Z, root-caused via Loki):
//
// 1. POST /secrets → go h.restartFunc(workspaceID) (secrets.go:264).
// 2. runRestartCycle sets url='' synchronously, then async provisions EC2
// (workspace_restart.go).
// 3. During 20-30s window while EC2 is `pending` (codex first heartbeat
// not yet landed): workspaces.url='' AND IsRunning=false.
// 4. Any ProxyA2A (canvas /delegations poll OR the restart-context probe
// at the end of runRestartCycle) → maybeMarkContainerDead sees the
// container-dead state → calls RestartByID → loop.
// 5. coalesceRestart sets pending=true, drains by running ANOTHER full
// cycle → provision.ec2_stopped of the just-booted instance →
// re-provision.
//
// Fix: three interdependent layers.
//
// L1) isRestarting() gate in maybeMarkContainerDead +
// preflightContainerHealth — early-return false/nil so the probe
// can't trigger a fresh RestartByID while a restart is in flight.
// L2) sendRestartContext requires url != '' AND last_heartbeat_at >
// restart_start_ts before firing the trailing ProxyA2A probe.
// L3) RestartByID silently drops successive calls within
// restartDebounceWindow of restartStartedAt, with a counter for
// observability.
import (
"context"
"sync/atomic"
"testing"
"time"
"github.com/DATA-DOG/go-sqlmock"
)
// resetSelfFireState wipes all the per-workspace mutation state these
// tests touch, plus the package-level drop counter, so the test is
// hermetic regardless of ordering.
func resetSelfFireState(workspaceID string) {
restartStates.Delete(workspaceID)
restartByIDDropCounter.Store(0)
}
// markRestarting forces restartStates into "cycle in flight" without
// running an actual cycle, so the tests can isolate the gate behaviour
// without the full provision pipeline. Returns a finish() that flips
// running=false (mimicking coalesceRestart's deferred state-clear).
func markRestarting(workspaceID string) (finish func()) {
sv, _ := restartStates.LoadOrStore(workspaceID, &restartState{})
state := sv.(*restartState)
state.mu.Lock()
state.running = true
state.restartStartedAt = time.Now()
state.mu.Unlock()
return func() {
state.mu.Lock()
state.running = false
state.mu.Unlock()
}
}
// TestIsRestarting_FalseWhenNoStateEntry — baseline: a workspace that
// has never been restarted reports !isRestarting. Pinning this so a
// future LoadOrStore refactor can't silently start returning true for
// unknown workspaces.
func TestIsRestarting_FalseWhenNoStateEntry(t *testing.T) {
const wsID = "self-fire-ws-never"
resetSelfFireState(wsID)
if isRestarting(wsID) {
t.Fatal("isRestarting must return false for a workspace with no state entry")
}
}
// TestIsRestarting_TrueWhileCycleRunning — the load-bearing invariant
// that Layer 1 depends on. While running=true, isRestarting must report
// true; the moment it flips to false, isRestarting must report false.
func TestIsRestarting_TrueWhileCycleRunning(t *testing.T) {
const wsID = "self-fire-ws-in-flight"
resetSelfFireState(wsID)
finish := markRestarting(wsID)
if !isRestarting(wsID) {
t.Fatal("isRestarting must return true while running=true")
}
finish()
if isRestarting(wsID) {
t.Fatal("isRestarting must return false after running flips back to false")
}
}
// TestMaybeMarkContainerDead_SkippedWhileRestarting — Layer 1 for the
// reactive path. With isRestarting=true the function must early-return
// false WITHOUT invoking IsRunning, hitting the DB UPDATE, or kicking
// a RestartByID goroutine. If any of those side-effects fire we'd
// re-arm the self-fire loop the gate exists to close.
func TestMaybeMarkContainerDead_SkippedWhileRestarting(t *testing.T) {
const wsID = "self-fire-ws-mmcd"
resetSelfFireState(wsID)
mock := setupTestDB(t) // sqlmock with strict expectation matching
// Workspace row read inside maybeMarkContainerDead — this happens
// BEFORE the isRestarting gate in the current implementation, so
// allow exactly one SELECT runtime row.
mock.ExpectQuery(`SELECT COALESCE\(runtime, 'langgraph'\) FROM workspaces WHERE id =`).
WithArgs(wsID).
WillReturnRows(sqlmock.NewRows([]string{"runtime"}).AddRow("claude-code"))
// Gate flipped: must early-return without doing anything else.
finish := markRestarting(wsID)
defer finish()
stub := &preflightLocalProv{running: false, err: nil}
h := newSelfFireHandler(t)
h.provisioner = stub
if got := h.maybeMarkContainerDead(context.Background(), wsID); got != false {
t.Errorf("maybeMarkContainerDead must return false while restarting, got %v", got)
}
if stub.calls != 0 {
t.Errorf("IsRunning must not be called while restarting (Layer 1 gate broken); got %d calls", stub.calls)
}
}
// TestPreflightContainerHealth_SkippedWhileRestarting — Layer 1 for the
// proactive path. Same shape as above: with restart in flight, return
// nil (let the optimistic forward proceed) and DO NOT call IsRunning.
// The forward will fail with a connect error; the post-restart reactive
// path can decide what to do then, by which point the EC2 has either
// come up (no more failures) or markProvisionFailed has fired.
func TestPreflightContainerHealth_SkippedWhileRestarting(t *testing.T) {
const wsID = "self-fire-ws-preflight"
resetSelfFireState(wsID)
_ = setupTestDB(t)
finish := markRestarting(wsID)
defer finish()
stub := &preflightLocalProv{running: false, err: nil}
h := newSelfFireHandler(t)
h.provisioner = stub
if err := h.preflightContainerHealth(context.Background(), wsID); err != nil {
t.Errorf("preflightContainerHealth must return nil while restarting, got %+v", err)
}
if stub.calls != 0 {
t.Errorf("IsRunning must not be called while restarting (Layer 1 gate broken); got %d calls", stub.calls)
}
}
// TestRestartByID_DebounceSilentDrop — Layer 3. After a cycle starts,
// any RestartByID arriving within restartDebounceWindow MUST be dropped
// silently — not coalesced (which would still drain to another cycle).
// The drop counter must increment by exactly one per dropped call so
// ops can see how often the self-fire would have fired pre-fix.
func TestRestartByID_DebounceSilentDrop(t *testing.T) {
const wsID = "self-fire-ws-debounce"
resetSelfFireState(wsID)
// Stamp restartStartedAt = now, running=false (simulates the "just
// finished" window where the loop would re-fire pre-fix).
sv, _ := restartStates.LoadOrStore(wsID, &restartState{})
state := sv.(*restartState)
state.mu.Lock()
state.restartStartedAt = time.Now()
state.running = false
state.mu.Unlock()
// Counter baseline.
if got := restartByIDDropCounter.Load(); got != 0 {
t.Fatalf("expected drop counter 0 at start, got %d", got)
}
// Five rapid-fire RestartByID calls should all drop (the maximum
// observed pre-fix was 4x — pinning >=4 here keeps the regression
// shape true to the prod incident).
h := newSelfFireHandler(t)
stub := &preflightLocalProv{running: true, err: nil}
h.provisioner = stub
for i := 0; i < 5; i++ {
h.RestartByID(wsID)
}
if got := restartByIDDropCounter.Load(); got != 5 {
t.Errorf("expected 5 drops within debounce window, got %d", got)
}
// shouldDebounceRestart itself must report true for the same window.
if !shouldDebounceRestart(wsID) {
t.Error("shouldDebounceRestart must return true within window")
}
}
// TestRestartByID_DebounceExpiresAfterWindow: outside the window, the
// debounce must release, so a legitimate later RestartByID (e.g. a secrets
// update that arrives after the window) can reach coalesceRestart. The
// test asserts this at the shouldDebounceRestart level. We shrink
// restartDebounceWindow to 5ms for the duration of this test so we don't
// sleep a full 60s in CI.
func TestRestartByID_DebounceExpiresAfterWindow(t *testing.T) {
const wsID = "self-fire-ws-debounce-release"
resetSelfFireState(wsID)
orig := restartDebounceWindow
restartDebounceWindow = 5 * time.Millisecond
defer func() { restartDebounceWindow = orig }()
// Stamp inside the window.
sv, _ := restartStates.LoadOrStore(wsID, &restartState{})
state := sv.(*restartState)
state.mu.Lock()
state.restartStartedAt = time.Now()
state.running = false
state.mu.Unlock()
if !shouldDebounceRestart(wsID) {
t.Fatal("within 5ms window must debounce")
}
// Sleep past the window. Use a small margin to avoid clock-skew
// flakes on slow CI hosts.
time.Sleep(20 * time.Millisecond)
if shouldDebounceRestart(wsID) {
t.Fatal("after 20ms (4x window) must no longer debounce")
}
}
// TestRestartByID_SingleProvisionPerRestart — the regression test for
// the prod incident: a SINGLE secrets PUT (which is the trigger shape)
// must produce exactly ONE coalesceRestart cycle, not four. Models the
// full chain: secrets handler → RestartByID → coalesceRestart → cycle
// runs → during the cycle window, simulated probes call RestartByID
// again. With all three layers in place, the probes are dropped and the
// total cycle count stays at 1.
func TestRestartByID_SingleProvisionPerRestart(t *testing.T) {
const wsID = "self-fire-ws-single-provision"
resetSelfFireState(wsID)
// In-flight gate that mimics the EC2-pending window. The cycle
// blocks on cycleProceed so we can fire the simulated probes while
// running=true.
var cycleCount atomic.Int32
cycleStarted := make(chan struct{}, 1)
cycleProceed := make(chan struct{})
cycle := func() {
n := cycleCount.Add(1)
if n == 1 {
cycleStarted <- struct{}{}
<-cycleProceed
}
}
// Kick the first cycle via coalesceRestart (this is what RestartByID
// would do post-debounce-check).
done := make(chan struct{})
go func() {
coalesceRestart(wsID, cycle)
close(done)
}()
<-cycleStarted
// Simulate the 4 probe-driven RestartByID calls observed in prod.
// Each must drop because we're within the debounce window AND a
// cycle is in flight.
h := newSelfFireHandler(t)
stub := &preflightLocalProv{running: true, err: nil}
h.provisioner = stub
for i := 0; i < 4; i++ {
h.RestartByID(wsID)
}
// Release the cycle.
close(cycleProceed)
<-done
if got := cycleCount.Load(); got != 1 {
t.Errorf("expected exactly 1 provision cycle for a single trigger "+
"(self-fire fix), got %d — regression of the prod 4x reprov thrash class",
got)
}
if got := restartByIDDropCounter.Load(); got != 4 {
t.Errorf("expected 4 self-fire probes dropped, got %d "+
"(observability counter must record the saved cycles)", got)
}
}
// newSelfFireHandler constructs a minimal *WorkspaceHandler suitable for
// the Layer-1 gate tests. Wraps the boilerplate so the per-test setup
// stays focused on the assertion.
func newSelfFireHandler(t *testing.T) *WorkspaceHandler {
t.Helper()
return NewWorkspaceHandler(newTestBroadcaster(), nil, "http://localhost:8080", t.TempDir())
}
@@ -34,6 +34,28 @@ async def list_peers() -> list[dict]:
async def delegate_task(workspace_id: str, task: str) -> str:
"""Send a task to a peer workspace via A2A and return the response text."""
# Task #190 / #193 — Self-delegation guard. Without this, a workspace
# delegating to its own UUID round-trips through the platform proxy back
# into the sender; the synchronous handler waits on the same lock the
# caller holds, the request times out, and the platform writes an
# a2a_receive activity row with source_id=our own workspace UUID. The
# inbox poller then surfaces that row as kind="peer_agent" and the agent
# sees the timeout echoed back as a peer instructing it (#190).
#
# The sibling guards live in:
# - workspace-server/internal/handlers/delegation.go (Go API gate)
# - workspace/a2a_tools_delegation.py (MCP path guard)
# This module is the framework-agnostic adapter surface used by adapters
# that don't go through a2a_tools_delegation.py — it needs its own guard.
if WORKSPACE_ID and workspace_id == WORKSPACE_ID:
return (
"Error: self-delegation rejected (cannot delegate_task to your own "
"workspace). There is no peer who is also you — the platform proxy "
"would deadlock and the timeout would echo back as a peer_agent "
"message from yourself (#190). Do the work directly, or use "
"commit_memory / send_message_to_user instead."
)
async with httpx.AsyncClient(timeout=120.0) as client:
# Discover target URL
try:
@@ -412,6 +412,28 @@ async def delegate_task_async(
"""
task_id = str(uuid.uuid4())
# Task #190 / #193 — Self-delegation guard (async path). Even on the
# async path that returns a task_id immediately, _execute_delegation
# eventually fires the A2A POST back to our own URL, which times out
# against our own held run lock, gets recorded with source_id=our
# workspace UUID, and surfaces in the inbox as a peer_agent message
# from ourselves (#190). Reject before scheduling the background task
# so no peer_agent echo can be generated. Sibling guards:
# - workspace-server/internal/handlers/delegation.go (Go API gate)
# - workspace/a2a_tools_delegation.py (MCP sync + async paths)
# - workspace/builtin_tools/a2a_tools.py (framework-agnostic sync)
if WORKSPACE_ID and workspace_id == WORKSPACE_ID:
log_event(event_type="delegation", action="delegate", resource=workspace_id,
outcome="rejected_self_delegation", trace_id=task_id)
return {
"success": False,
"error": (
"self-delegation rejected: cannot delegate_task_async to your "
"own workspace (would time out and echo back as a peer_agent "
"message from yourself — #190)"
),
}
# RBAC check
roles, custom_perms = get_workspace_roles()
if not check_permission("delegate", roles, custom_perms):
@@ -9,6 +9,59 @@
# Pattern matches the legacy monorepo workspace/entrypoint.sh:
# fix volume ownership as root, then re-exec via gosu as agent (uid 1000).
# --- RFC#523 Layer 2: tenant-workspace forbidden-env guard (task #146) ---
# Defense-in-depth. The provisioner (workspace-server) has a fail-closed
# abort at provision time (Layer 1, prepareProvisionContext), and the
# in-container env-build has a silent strip (forensic #145,
# provisioner.buildContainerEnv). This guard fires if either upstream
# layer is bypassed — e.g. someone runs this image standalone with
# `docker run -e GITEA_TOKEN=...`. Exit 1 with a clear message instead
# of running with an operator-scope credential in tenant scope.
#
# Key names are generic. The MOLECULE_OPERATOR_ prefix is the only
# molecule-AI-specific literal, which is acceptable because this
# entrypoint lives in the internal-only claude-code template (memory
# `feedback_open_source_templates_no_hardcoded_org_internals`: separately
# published templates must NOT carry org-specific literals). A fork can
# edit FORBIDDEN_KEYS / FORBIDDEN_PREFIXES for its own operator-scope
# names without touching the rest of the entrypoint.
#
# Skipped when MOLECULE_TENANT_GUARD_DISABLE=1 — for local-dev where the
# operator host IS the tenant host (e.g. running molecule-runtime on the
# operator box for debugging). NEVER set this in tenant containers.
if [ "${MOLECULE_TENANT_GUARD_DISABLE:-0}" != "1" ]; then
FORBIDDEN_KEYS="GITEA_TOKEN GITEA_PAT GITHUB_TOKEN GITHUB_PAT GH_TOKEN GITLAB_TOKEN GL_TOKEN BITBUCKET_TOKEN CP_ADMIN_API_TOKEN CP_ADMIN_TOKEN INFISICAL_OPERATOR_TOKEN INFISICAL_BOOTSTRAP_TOKEN RAILWAY_TOKEN RAILWAY_PERSONAL_API_TOKEN HETZNER_TOKEN HETZNER_API_TOKEN"
FORBIDDEN_PREFIXES="MOLECULE_OPERATOR_"
FOUND=""
for k in $FORBIDDEN_KEYS; do
# eval is safe here — $k is from a static whitespace-separated
# literal list above (no user input). POSIX sh has no
# associative arrays, hence the indirect-expansion via eval to
# test "is this var set" without caring about its value.
eval "v=\${$k+set}"
if [ "$v" = "set" ]; then
FOUND="$FOUND $k"
fi
done
for prefix in $FORBIDDEN_PREFIXES; do
# env | awk is the portable POSIX way to enumerate by prefix. index()
# is POSIX awk, so busybox awk (alpine), mawk/gawk (debian), and BSD awk
# (macOS tests) all support it. No dependency on bash arrays / [[ =~ ]].
prefix_hits=$(env | awk -F= -v p="$prefix" 'index($1, p)==1 {print $1}')
if [ -n "$prefix_hits" ]; then
FOUND="$FOUND $prefix_hits"
fi
done
if [ -n "$FOUND" ]; then
echo "RFC#523 Layer 2: refusing to start tenant workspace — forbidden operator-scope env var(s) present:$FOUND" >&2
echo "These vars are operator-fleet scope and must not reach tenant workspaces." >&2
echo "Remove them from workspace_secrets / global_secrets / docker -e and retry." >&2
echo "If running this image standalone for local dev with intentional operator scope, set MOLECULE_TENANT_GUARD_DISABLE=1." >&2
exit 1
fi
fi
if [ "$(id -u)" = "0" ]; then
# Configs volume is created by Docker as root; agent needs write access
# for plugin installs, memory writes, .auth_token rotation, etc.
@@ -102,11 +102,34 @@ class InboxMessage:
arrival_workspace_id: str = ""
def to_dict(self) -> dict[str, Any]:
# Task #190 / #193 — Distinguish delegation-result rows from peer-agent
# messages. The platform's pushDelegationResultToInbox (RFC #2829 PR-2)
# writes activity_type='a2a_receive' with method='delegate_result' and
# source_id=our own workspace UUID, so the caller's inbox poller can
# surface delegation completions/failures via wait_for_message. But
# the default to_dict derives kind="peer_agent" purely from peer_id
# being non-empty — which makes a synchronous-delegation timeout, or
# a cross-workspace ProxyA2A failure, appear to the agent as a NEW
# peer_agent message from our own workspace UUID (#190 self-echo).
#
# Explicitly classify rows with method='delegate_result' as
# kind='delegation_result' regardless of peer_id, so:
# 1. wait_for_message gives the original caller a structured
# delegation result (not a fake peer instruction).
# 2. Agents reading the envelope don't mistake the row for a
# peer instructing them — preventing the #190 reply-via-
# delegate_task-to-self loop.
if self.method == "delegate_result":
kind = "delegation_result"
elif self.peer_id:
kind = "peer_agent"
else:
kind = "canvas_user"
d = {
"activity_id": self.activity_id,
"text": self.text,
"peer_id": self.peer_id,
"kind": "peer_agent" if self.peer_id else "canvas_user",
"kind": kind,
"method": self.method,
"created_at": self.created_at,
}
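The order of the checks in `to_dict` is what prevents the #190 self-echo: `method` is tested before `peer_id`, so a delegate_result row that carries our own workspace UUID as its source never falls into the `peer_agent` branch. The same decision table, transliterated into Go purely for illustration (`inboxKind` is a hypothetical name, not part of the runtime):

```go
package main

// inboxKind classifies an inbox row. The method check must come first:
// delegate_result rows have a non-empty peer ID (our own workspace UUID).
func inboxKind(method, peerID string) string {
	switch {
	case method == "delegate_result":
		return "delegation_result"
	case peerID != "":
		return "peer_agent"
	default:
		return "canvas_user"
	}
}
```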
@@ -325,3 +325,58 @@ class TestGetPeersSummary:
result = await mod.get_peers_summary()
assert result == "No peers available."
# ---------------------------------------------------------------------------
# Self-delegation guard (Task #190 / #193)
# ---------------------------------------------------------------------------
class TestSelfDelegationGuard:
"""delegate_task to your own workspace UUID must be rejected BEFORE any
discovery / proxy hop. Otherwise the request round-trips back to us,
deadlocks on the run lock, times out, and surfaces in the inbox as a
peer_agent message from our own workspace (the documented #190 self-echo
bug)."""
async def test_delegate_task_rejects_self(self, monkeypatch):
mod = _load_a2a_tools(monkeypatch, workspace_id="ws-self-abc")
calls = []
class TrappingClient:
def __init__(self, timeout): pass
async def __aenter__(self): return self
async def __aexit__(self, *a): pass
async def get(self, *a, **kw):
calls.append(("get", a, kw))
raise AssertionError("guard must reject before discover")
async def post(self, *a, **kw):
calls.append(("post", a, kw))
raise AssertionError("guard must reject before proxy POST")
monkeypatch.setattr(mod.httpx, "AsyncClient", TrappingClient)
result = await mod.delegate_task("ws-self-abc", "do a thing")
assert "self-delegation" in result.lower()
assert not calls, "no HTTP call should be made for self-delegation"
async def test_delegate_task_allows_real_peer(self, monkeypatch):
"""Guard is strictly equality on WORKSPACE_ID — a different target
passes through to the normal discover/proxy path."""
mod = _load_a2a_tools(monkeypatch, workspace_id="ws-self-abc")
class FakeClient:
def __init__(self, timeout): pass
async def __aenter__(self): return self
async def __aexit__(self, *a): pass
async def get(self, url, headers=None):
return _FakeResponse(200, {"url": "http://target.test/a2a"})
async def post(self, url, json=None, headers=None):
return _FakeResponse(200, {
"result": {"parts": [{"kind": "text", "text": "ok"}]}
})
monkeypatch.setattr(mod.httpx, "AsyncClient", FakeClient)
result = await mod.delegate_task("ws-DIFFERENT-xyz", "do a thing")
assert "self-delegation" not in result.lower()
@@ -148,6 +148,41 @@ class TestRBAC:
        assert "RBAC" in result["error"]


class TestSelfDelegationGuard:
    """Task #190 / #193: delegate_task_async must reject delegation to the
    caller's own workspace BEFORE scheduling the background task. Otherwise
    the platform A2A round-trip times out against our own held run lock, the
    failure is logged with source_id=our workspace UUID, and the inbox
    poller surfaces the row as a peer_agent message from ourselves."""

    @pytest.mark.asyncio
    async def test_async_path_rejects_self_workspace(self, delegation_mocks):
        mod, *_ = delegation_mocks
        # The fixture's monkeypatch set WORKSPACE_ID to "ws-self", but the
        # module reads it at import time, so assign the module attribute
        # directly (equivalent to reloading the module).
        mod.WORKSPACE_ID = "ws-self"
        result = await _invoke(mod, workspace_id="ws-self")
        assert result["success"] is False
        assert "self-delegation" in result["error"].lower()
        # No background task should have been scheduled.
        assert len(mod._background_tasks) == 0

    @pytest.mark.asyncio
    async def test_async_path_allows_different_workspace(self, delegation_mocks):
        """Guard does NOT short-circuit a real peer target."""
        mod, *_ = delegation_mocks
        mod.WORKSPACE_ID = "ws-self"
        _, mock_cls = _make_mock_client()
        with patch("httpx.AsyncClient", mock_cls):
            result = await _invoke(mod, workspace_id="ws-peer")
        assert result["success"] is True
        assert result["status"] == "delegated"


class TestAsyncDelegation:
    @pytest.mark.asyncio
@@ -0,0 +1,122 @@
#!/usr/bin/env bash
# Smoke-test for RFC#523 Layer 2 (task #146): the workspace/entrypoint.sh
# top-of-file forbidden-env guard.
#
# Strategy: extract the guard block from entrypoint.sh (from its opening
# `if` through the matching closing `fi`) and eval it inside a function in
# a sub-shell with the env we want to test. We rewrite `exit 1` to
# `return 1` so the guard signals failure through the function's return
# status without killing the test harness.
#
# Why not `docker run` the actual image: this test is unit-scoped (does
# the guard logic correctly distinguish forbidden from allowed env?).
# Image integration is covered by the E2E provision test described in
# RFC#523 §"Acceptance criteria" Layer 2 (run on staging, not here).
#
# Pairs with: workspace_provision_forbidden_env_test.go (Layer 1
# Go-side unit tests).
set -euo pipefail
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ENTRYPOINT="$HERE/../entrypoint.sh"
if [[ ! -f "$ENTRYPOINT" ]]; then
    echo "FAIL: entrypoint not found: $ENTRYPOINT" >&2
    exit 1
fi
# Extract just the guard block (from the first `if [ "${MOLECULE_TENANT_GUARD_DISABLE`
# line through the matching `fi`) and rewrite `exit 1` to `return 1` so the
# guard can be invoked inside a function in a sub-shell.
GUARD_SNIPPET=$(awk '
    /^if \[ "\${MOLECULE_TENANT_GUARD_DISABLE/ { inblock=1 }
    inblock { print }
    inblock && /^fi$/ { exit }
' "$ENTRYPOINT" | sed 's/exit 1/return 1/')
if [[ -z "$GUARD_SNIPPET" ]]; then
    echo "FAIL: could not extract guard block from $ENTRYPOINT" >&2
    exit 1
fi
# Helper: run the guard with the env we set and echo its exit code. The
# sub-shell approximates `env -i` for the keys that matter by unsetting
# every var the guard checks, so prior shell state doesn't contaminate
# the result.
run_guard() {
    # Pass extra-env assignments as args; e.g. run_guard GITEA_TOKEN=x.
    (
        set +e
        # Defensive unset of all keys the guard inspects, so the
        # caller's args are the ONLY positive cases.
        unset GITEA_TOKEN GITEA_PAT GITHUB_TOKEN GITHUB_PAT GH_TOKEN GITLAB_TOKEN GL_TOKEN BITBUCKET_TOKEN
        unset CP_ADMIN_API_TOKEN CP_ADMIN_TOKEN
        unset INFISICAL_OPERATOR_TOKEN INFISICAL_BOOTSTRAP_TOKEN
        unset RAILWAY_TOKEN RAILWAY_PERSONAL_API_TOKEN HETZNER_TOKEN HETZNER_API_TOKEN
        unset MOLECULE_OPERATOR_HOST MOLECULE_OPERATOR_SSH_KEY
        unset MOLECULE_TENANT_GUARD_DISABLE
        for kv in "$@"; do
            export "$kv"
        done
        guard_fn() {
            eval "$GUARD_SNIPPET"
        }
        guard_fn
        echo $?
    )
}
PASS=0
FAIL=0
assert_exit() {
    local label="$1"
    local want="$2"
    shift 2
    local got
    got=$(run_guard "$@" | tail -n 1)
    if [[ "$got" == "$want" ]]; then
        echo "PASS: $label"
        PASS=$((PASS + 1))
    else
        echo "FAIL: $label (want exit=$want got=$got, env: $*)" >&2
        FAIL=$((FAIL + 1))
    fi
}
# --- Case 1: clean env passes (exit 0) ---
assert_exit "clean_env_passes" 0
# --- Case 2: per-agent-scope vars pass (exit 0) ---
assert_exit "per_agent_vars_pass" 0 \
    GIT_HTTP_USERNAME=agent-dev-a \
    GIT_HTTP_PASSWORD=scoped-pat \
    ANTHROPIC_API_KEY=sk-keep \
    MOLECULE_AGENT_ROLE=agent-dev-a
# --- Case 3: forbidden exact-match keys fail (exit 1) ---
assert_exit "gitea_token_blocks" 1 GITEA_TOKEN=leak
assert_exit "github_token_blocks" 1 GITHUB_TOKEN=leak
assert_exit "cp_admin_api_token_blocks" 1 CP_ADMIN_API_TOKEN=leak
assert_exit "infisical_operator_blocks" 1 INFISICAL_OPERATOR_TOKEN=leak
assert_exit "railway_token_blocks" 1 RAILWAY_TOKEN=leak
# --- Case 4: MOLECULE_OPERATOR_ prefix family blocks ---
assert_exit "molecule_operator_host_blocks" 1 MOLECULE_OPERATOR_HOST=op.example.com
assert_exit "molecule_operator_ssh_blocks" 1 MOLECULE_OPERATOR_SSH_KEY=ssh-ed25519...
# --- Case 5: adjacent-but-allowed MOLECULE_* names pass ---
assert_exit "molecule_agent_role_passes" 0 MOLECULE_AGENT_ROLE=agent-dev-a
assert_exit "molecule_url_passes" 0 MOLECULE_URL=https://platform.example.com
# --- Case 6: MOLECULE_TENANT_GUARD_DISABLE=1 bypasses the guard ---
assert_exit "disable_flag_bypasses" 0 \
    MOLECULE_TENANT_GUARD_DISABLE=1 \
    GITEA_TOKEN=leak \
    CP_ADMIN_API_TOKEN=leak
echo
echo "=== L2 entrypoint guard: $PASS passed, $FAIL failed ==="
if [[ "$FAIL" -gt 0 ]]; then
    exit 1
fi
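For reference, the allow/deny decision that the cases above pin down can be expressed as a small pure function. This Python sketch is not the entrypoint's implementation: the exact-match list is copied from the keys `run_guard` unsets, the `MOLECULE_OPERATOR_` prefix rule comes from Case 4, and treating only the value `1` as the bypass is an assumption based on Case 6. The names `forbidden_env_keys` and `guard_exit_code` are hypothetical.

```python
from typing import Mapping

# Assumed exact-match denylist, copied from the keys run_guard unsets;
# the authoritative list lives in workspace/entrypoint.sh.
FORBIDDEN_KEYS = frozenset({
    "GITEA_TOKEN", "GITEA_PAT", "GITHUB_TOKEN", "GITHUB_PAT", "GH_TOKEN",
    "GITLAB_TOKEN", "GL_TOKEN", "BITBUCKET_TOKEN",
    "CP_ADMIN_API_TOKEN", "CP_ADMIN_TOKEN",
    "INFISICAL_OPERATOR_TOKEN", "INFISICAL_BOOTSTRAP_TOKEN",
    "RAILWAY_TOKEN", "RAILWAY_PERSONAL_API_TOKEN",
    "HETZNER_TOKEN", "HETZNER_API_TOKEN",
})
# Prefix family exercised by Case 4; adjacent names like MOLECULE_URL pass.
FORBIDDEN_PREFIXES = ("MOLECULE_OPERATOR_",)


def forbidden_env_keys(env: Mapping[str, str]) -> list:
    """Return the sorted forbidden keys present in env (empty list = pass)."""
    # Assumed bypass semantics: only the literal value "1" disables the guard.
    if env.get("MOLECULE_TENANT_GUARD_DISABLE") == "1":
        return []
    return sorted(
        key for key in env
        if key in FORBIDDEN_KEYS or key.startswith(FORBIDDEN_PREFIXES)
    )


def guard_exit_code(env: Mapping[str, str]) -> int:
    # Mirrors the script's contract: 0 = allowed, 1 = blocked.
    return 1 if forbidden_env_keys(env) else 0


if __name__ == "__main__":
    print(guard_exit_code({"GITEA_TOKEN": "leak"}))
    print(guard_exit_code({"MOLECULE_AGENT_ROLE": "agent-dev-a"}))
```

Each `assert_exit` case maps onto one call of `guard_exit_code` with the same env, which makes it easy to cross-check the bash harness against the intended policy.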
@@ -131,6 +131,36 @@ def test_message_from_activity_peer_agent():
    assert msg.to_dict()["kind"] == "peer_agent"


def test_message_from_activity_delegate_result_distinct_kind():
    """Task #190 / #193: pushDelegationResultToInbox (RFC #2829 PR-2) writes
    rows with method='delegate_result' and source_id=our own workspace UUID
    so the caller's wait_for_message can surface delegation completions or
    failures. Without an explicit kind override, to_dict() would classify
    those rows as kind='peer_agent' (peer_id non-empty), and the agent would
    treat its OWN delegation timeout as a peer instructing it (the #190
    self-echo bug). Classify these rows as kind='delegation_result' so they
    are recognizable as structured delegation outcomes."""
    row = {
        "id": "act-90",
        "source_id": "ws-self-abc",  # same as our workspace
        "method": "delegate_result",
        "summary": "Delegation failed",
        "response_body": {"text": "polling timeout", "delegation_id": "d-1"},
        "created_at": "2026-05-18T00:00:00Z",
    }
    msg = inbox.message_from_activity(row)
    payload = msg.to_dict()
    assert payload["kind"] == "delegation_result", (
        f"delegate_result rows must surface as kind='delegation_result', "
        f"not peer_agent (got {payload['kind']!r})"
    )
    # Method preserved for downstream consumers that key off it.
    assert payload["method"] == "delegate_result"
    # peer_id is still set on the dataclass for back-compat dispatch; the
    # distinguishing signal is the kind field.
    assert msg.peer_id == "ws-self-abc"


def test_message_from_activity_handles_string_request_body():
    row = {
        "id": "act-3",
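The classification order this test enforces can be sketched as follows: check `method` before falling back to `peer_id`. The `InboxMessage` dataclass and this `message_from_activity` are hypothetical, trimmed-down stand-ins for the `inbox` module's real types; only the `delegate_result` to `delegation_result` mapping, the `peer_agent` fallback, and the retained `peer_id` come from the tests. Returning `None` for every other row is a placeholder, since the module's remaining kinds are not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InboxMessage:
    # Hypothetical stand-in for the inbox message dataclass.
    activity_id: str
    method: str
    peer_id: Optional[str]

    def kind(self) -> Optional[str]:
        # Check the method first: delegate_result rows carry our own
        # workspace UUID as source_id, so peer_id alone would misclassify
        # them as peer_agent (the #190 self-echo bug).
        if self.method == "delegate_result":
            return "delegation_result"
        if self.peer_id:
            return "peer_agent"
        return None  # other kinds are out of scope for this sketch

    def to_dict(self) -> dict:
        return {"id": self.activity_id, "method": self.method, "kind": self.kind()}


def message_from_activity(row: dict) -> InboxMessage:
    # peer_id stays populated for back-compat; kind is the distinguishing signal.
    return InboxMessage(
        activity_id=row["id"],
        method=row.get("method", ""),
        peer_id=row.get("source_id") or None,
    )
```

Keying off `method` rather than comparing `source_id` to our own workspace ID keeps the classification independent of which workspace reads the row.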