fix(handlers): replace time.Sleep with explicit async drain in 1 test (issue #1264)

Issue #1264: CI/Platform(Go) tests flake under parallel CI load.

TestProxyA2A_AllowedSelf_SkipsAccessCheck uses time.Sleep(50ms) to wait for goroutines launched by goAsync(), the same pattern as the 4 tests fixed in PR #1282. Replacing it with handler.waitAsyncForTest() ensures deterministic async completion regardless of runner speed or pressure.

Also fixes the sop-checklist test file (parse_directives tuple return type mismatch) that was committed in a broken state to PR #1284.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix(sop-checklist): skip blank lines when scanning for section content
2026-05-16 04:25:22 +00:00 · 2026-05-16 04:15:41 +00:00 · 2026-05-15 23:37:06 +00:00
8 changed files with 277 additions and 639 deletions
@@ -284,8 +284,7 @@ def list_queued_issues() -> list[dict]:
        query={
            "state": "open",
            "type": "pulls",
-            # NOTE: Gitea 1.22.6 uses `label` (singular), not `labels` (plural).
-            "label": QUEUE_LABEL,
+            "labels": QUEUE_LABEL,
            "limit": "50",
        },
    )
@@ -214,7 +214,10 @@ fi
 # Endpoint: GET /api/v1/teams/{id}/members/{username}
 #   200/204 → is member
 #   403     → token owner is not in this team (Gitea 1.22.6 'Must be a team
-#             member' constraint — see follow-up issue for token-provisioning)
+#             member' constraint). The evaluator skips this candidate and
+#             continues to check others. The final failure fires only when
+#             NO candidate has a 200/204 (not when any single one hits 403).
+#             See RFC#324 token-scope follow-up issue for long-term fix.
 #   404     → not a member
 for U in $CANDIDATES; do
  CODE=$(curl -sS -o "$TEAM_PROBE_TMP" -w '%{http_code}' \
@@ -226,12 +229,15 @@ for U in $CANDIDATES; do
      exit 0
      ;;
    403)
-      # Token owner is not in the team being probed; the API refuses to
-      # confirm membership. This is the RFC#324 follow-up token-scope gap.
-      # Fail closed — never grant approval on a 403; surface clearly.
-      echo "::error::team-probe for ${U} in ${TEAM} returned 403 (token owner not in ${TEAM} team — RFC#324 token-scope follow-up). Cannot confirm membership; failing closed."
+      # Token owner is not in the team being probed; Gitea 1.22.6 refuses
+      # to confirm membership in this case. Do NOT hard-fail the gate on a
+      # 403 — doing so would fail the entire gate if ANY candidate triggers
+      # a 403, even when other valid team-members exist. Instead skip this
+      # candidate and continue checking others. If all candidates produce
+      # 403 (token owner can't query any of them) the final exit fires.
+      echo "::warning::team-probe for ${U} in ${TEAM} returned 403 (token owner not in ${TEAM} team — skipping; cannot confirm membership)"
      cat "$TEAM_PROBE_TMP" >&2
-      exit 1
+      continue
      ;;
    404)
      debug "${U} not a member of ${TEAM}"
@@ -243,5 +249,5 @@ for U in $CANDIDATES; do
  esac
 done

-echo "::error::${TEAM}-review awaiting non-author APPROVE from ${TEAM} team (candidates: $(echo "$CANDIDATES" | tr '\n' ',' | sed 's/,$//') — none are in team)"
+echo "::error::${TEAM}-review awaiting non-author APPROVE from ${TEAM} team (candidates: $(echo "$CANDIDATES" | tr '\n' ',' | sed 's/,$//') — no valid team-member approval found; check that reviewer is in ${TEAM} team or token owner is a ${TEAM} team member)"
 exit 1
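The 403 handling in the hunk above can be restated as a small Python sketch. This is not the gate script itself: `evaluate_candidates` and the `probe` callable are hypothetical names introduced for illustration, and treating every code other than 200/204/403 like a 404 is an assumption of the sketch.

```python
from __future__ import annotations

from typing import Callable, Iterable


def evaluate_candidates(
    candidates: Iterable[str],
    probe: Callable[[str], int],
) -> tuple[bool, list[str]]:
    """Return (approved, skipped) for a list of candidate reviewers.

    probe is a hypothetical stand-in for the team-membership HTTP call and
    returns the status code for one candidate.
    """
    skipped: list[str] = []
    for user in candidates:
        code = probe(user)
        if code in (200, 204):
            # Confirmed team member: the gate passes immediately.
            return True, skipped
        if code == 403:
            # Membership cannot be confirmed; skip instead of hard-failing.
            skipped.append(user)
            continue
        # 404 (and, by assumption here, any other code): not confirmed.
    # No candidate was confirmed, so the gate fails closed.
    return False, skipped


codes = {"alice": 403, "bob": 404, "carol": 200}
print(evaluate_candidates(["alice", "bob", "carol"], codes.__getitem__))
print(evaluate_candidates(["alice", "bob"], codes.__getitem__))
```

A single 403 no longer decides the outcome; the result is a failure only when the loop finishes without any confirmed member.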
@@ -102,7 +102,7 @@ def normalize_slug(raw: str, numeric_aliases: dict[int, str] | None = None) -> s


 # ---------------------------------------------------------------------------
-# Comment parsing — /sop-ack and /sop-revoke
+# Comment parsing — /sop-ack, /sop-revoke, and /sop-n/a
 # ---------------------------------------------------------------------------

 # A directive must be on its own line. Permits leading whitespace.
@@ -114,21 +114,33 @@ _DIRECTIVE_RE = re.compile(
    re.MULTILINE,
 )

+# /sop-n/a <gate> [reason] — declare a qa/sec gate N/A.
+# Gate names: qa-review, security-review (match review-check.sh context names).
+_NA_DIRECTIVE_RE = re.compile(
+    r"^[ \t]*/sop-n/a[ \t]+([A-Za-z0-9_\-]+)(?:[ \t]+(.*))?[ \t]*$",
+    re.MULTILINE,
+)
+

 def parse_directives(
    comment_body: str,
    numeric_aliases: dict[int, str],
-) -> list[tuple[str, str, str]]:
-    """Extract /sop-ack and /sop-revoke directives from a comment body.
+) -> tuple[list[tuple[str, str, str]], list[tuple[str, str]]]:
+    """Extract /sop-ack, /sop-revoke, and /sop-n/a directives from a comment body.

-    Returns a list of (kind, canonical_slug, note) tuples where:
-      kind is "sop-ack" or "sop-revoke"
-      canonical_slug is the normalized form (or "" if unparseable)
-      note is the trailing free-text (may be "")
+    Returns (directives, na_directives) where:
+      directives is a list of (kind, canonical_slug, note) tuples
+        kind is "sop-ack" or "sop-revoke"
+        canonical_slug is the normalized form (or "" if unparseable)
+        note is the trailing free-text (may be "")
+      na_directives is a list of (gate_name, reason) tuples
+        gate_name is the lowercased gate name from the comment,
+          e.g. "qa-review" or "security-review"
+        reason is the free-text after the gate name (may be "")
    """
    out: list[tuple[str, str, str]] = []
+    na_out: list[tuple[str, str]] = []
    if not comment_body:
-        return out
+        return out, na_out
    for m in _DIRECTIVE_RE.finditer(comment_body):
        kind = m.group(1)
        raw_slug = (m.group(2) or "").strip()
@@ -159,7 +171,11 @@ def parse_directives(
        # If we collapsed multi-word slug into kebab and there's a
        # trailing-text group too, append it.
        out.append((kind, canonical, note_from_group))
-    return out
+    for m in _NA_DIRECTIVE_RE.finditer(comment_body):
+        gate_raw = (m.group(1) or "").strip()
+        reason = (m.group(2) or "").strip()
+        na_out.append((gate_raw.lower(), reason))
+    return out, na_out


 # ---------------------------------------------------------------------------
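As a quick check of what the new `/sop-n/a` pattern accepts, here is a standalone sketch that reuses the regex from the hunk above. The helper name `parse_na` and the sample comment are illustrative only and are not part of the change.

```python
from __future__ import annotations

import re

# Same pattern as _NA_DIRECTIVE_RE in the hunk above.
NA_DIRECTIVE_RE = re.compile(
    r"^[ \t]*/sop-n/a[ \t]+([A-Za-z0-9_\-]+)(?:[ \t]+(.*))?[ \t]*$",
    re.MULTILINE,
)


def parse_na(comment_body: str) -> list[tuple[str, str]]:
    """Return (gate_name, reason) pairs, lowercasing the gate name."""
    pairs: list[tuple[str, str]] = []
    for m in NA_DIRECTIVE_RE.finditer(comment_body or ""):
        gate = (m.group(1) or "").strip().lower()
        reason = (m.group(2) or "").strip()
        pairs.append((gate, reason))
    return pairs


comment = (
    "Looks fine to me.\n"
    "/sop-n/a QA-Review docs-only change\n"
    "  /sop-n/a security-review\n"
    "please do not /sop-n/a qa-review mid-sentence\n"
)
print(parse_na(comment))
```

The directive must start its own line (leading spaces or tabs are allowed), so the mid-sentence mention on the last line is ignored.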
@@ -172,8 +188,8 @@ def section_marker_present(body: str, marker: str) -> bool:
    on a non-empty line (i.e. the author actually filled it in).

    We require the marker substring AND non-whitespace content on the
-    same line OR within the next line — this prevents trivially-empty
-    checklists like:
+    same line OR within the next non-blank line — this prevents
+    trivially-empty checklists like:

        ## SOP-Checklist
        - [ ] **Comprehensive testing performed**:
@@ -182,6 +198,10 @@ def section_marker_present(body: str, marker: str) -> bool:
    from auto-passing the section-present check. The peer-ack is still
    required, but answering with empty content is captured as a soft
    finding via the section-present test alone.
+
+    NOTE: we scan forward through blank lines (the markdown-header pattern
+    is ## Header\\n\\ncontent) so that a header + blank-line + content
+    structure still satisfies the check.
    """
    if not body or not marker:
        return False
@@ -200,13 +220,27 @@ def section_marker_present(body: str, marker: str) -> bool:
    stripped = re.sub(r"[\s\*:\-\[\]]+", "", line)
    if stripped:
        return True
-    # Fall through: check the NEXT line (multi-line answers).
-    next_line_end = body.find("\n", line_end + 1)
-    if next_line_end < 0:
-        next_line_end = len(body)
-    next_line = body[line_end + 1:next_line_end]
-    stripped_next = re.sub(r"[\s\*:\-\[\]]+", "", next_line)
-    return bool(stripped_next)
+    # Fall through: scan forward, skipping blank-only lines, until we find
+    # non-empty content or run out of body.  Handles:
+    #   ## Header          ← marker line (empty after marker)
+    #                      ← blank line (skipped)
+    #   - actual content   ← found
+    pos = line_end
+    while True:
+        # Skip the current newline and any additional newlines (blank lines).
+        while pos < len(body) and body[pos] == "\n":
+            pos += 1
+        if pos >= len(body):
+            break
+        line_end = body.find("\n", pos)
+        if line_end < 0:
+            line_end = len(body)
+        line = body[pos:line_end]
+        stripped = re.sub(r"[\s\*:\-\[\]]+", "", line)
+        if stripped:
+            return True
+        pos = line_end
+    return False


 # ---------------------------------------------------------------------------
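The blank-line-skipping scan can be sketched on its own as follows. This is a simplified stand-in, not the function from the diff: `has_content_after_marker` is a hypothetical name, and locating the marker with a case-insensitive search is an assumption, since that part of `section_marker_present` is not shown in the hunk.

```python
from __future__ import annotations

import re
from typing import Optional

# Characters treated as "no real content": whitespace plus checklist and
# markdown punctuation, matching the character class used in the hunk above.
_FILLER_RE = re.compile(r"[\s\*:\-\[\]]+")


def has_content_after_marker(body: Optional[str], marker: str) -> bool:
    """True if anything other than filler follows the marker."""
    if not body or not marker:
        return False
    m = re.search(re.escape(marker), body, re.IGNORECASE)
    if m is None:
        return False
    # First element is the rest of the marker line; the remaining elements
    # are the following lines. Filler-only lines are skipped.
    for line in body[m.end():].split("\n"):
        if _FILLER_RE.sub("", line):
            return True
    return False


print(has_content_after_marker(
    "## Comprehensive testing performed\n\n- all 8 tests pass.\n",
    "Comprehensive testing performed",
))
print(has_content_after_marker(
    "## Root-cause not symptom\n\n    \n\n",
    "Root-cause not symptom",
))
```

Like the loop in the diff, this keeps scanning until the end of the body, so content from a later section also satisfies the check.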
@@ -249,7 +283,8 @@ def compute_ack_state(
        user = (c.get("user") or {}).get("login", "")
        if not user:
            continue
-        for kind, slug, _note in parse_directives(body, numeric_aliases):
+        directives, _na = parse_directives(body, numeric_aliases)
+        for kind, slug, _note in directives:
            if not slug:
                unparseable_per_user[user] = unparseable_per_user.get(user, 0) + 1
                continue
@@ -301,6 +336,82 @@ def compute_ack_state(
    }


+def compute_na_state(
+    comments: list[dict[str, Any]],
+    pr_author: str,
+    na_gates: dict[str, dict[str, Any]],
+    team_membership_probe: "callable[[str, list[str]], list[str]]",
+) -> dict[str, dict[str, Any]]:
+    """Compute per-gate N/A declaration state.
+
+    Each comment is processed in chronological order. The most-recent
+    N/A directive per (commenter, gate) wins.
+
+    Returns a dict keyed by gate name:
+       {
+         "qa-review": {
+           "declared": True,
+           "declared_by": "core-qa-agent",
+           "reason": "CI/non-security-touching",
+           "valid": True,   # non-author + in required team
+           "error": None,   # error string if invalid
+         },
+         ...
+       }
+    Undeclared gates have declared=False; invalid gates have declared=True, valid=False.
+    """
+    # Step 1: collapse N/A directives per (commenter, gate) — most recent wins.
+    latest_na: dict[tuple[str, str], tuple[str, str]] = {}
+    for c in comments:
+        body = c.get("body", "") or ""
+        user = (c.get("user") or {}).get("login", "")
+        if not user:
+            continue
+        _, na_directives = parse_directives(body, {})
+        for gate, reason in na_directives:
+            if gate not in na_gates:
+                continue
+            latest_na[(user, gate)] = (gate, reason)
+
+    # Step 2: initialise all gates as undeclared.
+    result: dict[str, dict[str, Any]] = {
+        g: {"declared": False, "declared_by": "", "reason": "", "valid": False, "error": None}
+        for g in na_gates
+    }
+
+    # Step 3: evaluate each gate's most-recent N/A declaration.
+    for (user, gate), (gate_name, reason) in latest_na.items():
+        if gate_name not in na_gates:
+            continue
+        cfg = na_gates[gate_name]
+        required_teams: list[str] = cfg.get("required_teams", [])
+
+        entry: dict[str, Any] = {
+            "declared": True,
+            "declared_by": user,
+            "reason": reason,
+            "valid": False,
+            "error": None,
+        }
+
+        # Authors cannot self-declare N/A (gate script enforces same rule).
+        if user == pr_author:
+            entry["error"] = "self-declare N/A rejected"
+        else:
+            # Probe team membership: is the declarer in any required team?
+            approved = team_membership_probe(f"na:{gate_name}", [user])
+            if user in approved:
+                entry["valid"] = True
+            else:
+                # 403 from team API means token owner not in that team.
+                # Fail-closed: treat unknown membership as invalid.
+                entry["error"] = f"{user} not in required team {required_teams}"
+
+        result[gate_name] = entry
+
+    return result
+
+
 # ---------------------------------------------------------------------------
 # Gitea API client
 # ---------------------------------------------------------------------------
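The "most recent declaration wins, then validate the declarer" flow can be illustrated with a reduced sketch. It assumes a plain set of team members in place of the real membership probe, and the names `latest_declarations` and `validate` are hypothetical; only the rules (author cannot self-declare, unknown membership is invalid) come from the hunk above.

```python
from __future__ import annotations


def latest_declarations(
    events: list[tuple[str, str, str]],
) -> dict[tuple[str, str], str]:
    """Collapse (user, gate, reason) events, given oldest first.

    A later event from the same user for the same gate replaces the earlier one.
    """
    latest: dict[tuple[str, str], str] = {}
    for user, gate, reason in events:
        latest[(user, gate)] = reason
    return latest


def validate(
    latest: dict[tuple[str, str], str],
    pr_author: str,
    team_members: set[str],
) -> dict[str, dict[str, object]]:
    """Mark a declaration valid only for a non-author who is in the team."""
    result: dict[str, dict[str, object]] = {}
    for (user, gate), reason in latest.items():
        if user == pr_author:
            error = "self-declare N/A rejected"
        elif user not in team_members:
            error = f"{user} not in required team"
        else:
            error = None
        result[gate] = {"declared_by": user, "reason": reason,
                        "valid": error is None, "error": error}
    return result


events = [
    ("qa-bot", "qa-review", "docs only"),
    ("qa-bot", "qa-review", "CI-only change"),
    ("author", "security-review", "trust me"),
]
state = validate(latest_declarations(events), "author", {"qa-bot"})
print(state)
```

Here the second `qa-review` reason replaces the first, and the author's own `security-review` declaration is recorded but marked invalid.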
@@ -460,10 +571,29 @@ def _load_config_minimal(path: str) -> dict[str, Any]:
    tier_failure_mode), top-level list of maps (items:), and within an
    item map: scalars + lists of scalars. Does NOT support nested lists,
    YAML anchors, multi-doc, or flow style.
+
+    Key names containing '/' (e.g. n/a_gates) are handled by using
+    rpartition(':') — splitting at the LAST colon so embedded colons
+    in the key are preserved.
    """
    with open(path) as f:
        lines = f.readlines()
-    return _parse_minimal_yaml(lines)
+    # Preprocess: for lines at indent 0 that contain '/' before ':',
+    # use rpartition so the key keeps the '/'. e.g.
+    #   "n/a_gates:"  → key="n/a_gates", val=""
+    #   "n/a_gates: value" → key="n/a_gates", val="value"
+    processed: list[str] = []
+    for raw in lines:
+        stripped = raw.rstrip("\n")
+        indent = len(stripped) - len(stripped.lstrip(" "))
+        content = stripped.lstrip(" ")
+        if indent == 0 and "/" in content and ":" in content:
+            # Use rpartition so the last ':' is the key-value separator.
+            key, _, val = content.rpartition(":")
+            processed.append(" " * indent + key.strip() + ": " + val.strip())
+        else:
+            processed.append(stripped)
+    return _parse_minimal_yaml(processed)


 def _parse_minimal_yaml(lines: list[str]) -> dict[str, Any]:  # noqa: C901
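The effect of splitting at the last colon is easy to see in isolation. The helper below is a hypothetical illustration of the `rpartition(":")` step only; it is not the preprocessing loop from the diff.

```python
from __future__ import annotations


def split_top_level_key(line: str) -> tuple[str, str]:
    """Split "key: value" at the LAST colon and trim both halves."""
    key, _, val = line.rpartition(":")
    return key.strip(), val.strip()


print(split_top_level_key("n/a_gates:"))
print(split_top_level_key("n/a_gates: value"))
# A colon inside the value moves the split point into the value.
print(split_top_level_key("n/a_gates: a:b"))
```

The last call shows a limitation worth keeping in mind: if a top-level value ever contains a colon, the split lands inside the value rather than after the key.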
@@ -800,6 +930,90 @@ def main(argv: list[str] | None = None) -> int:
            extra = " (" + "; ".join(extras) + ")" if extras else ""
            print(f"::notice::  [WAIT] {slug} — no valid peer-ack yet{extra}")

+    # ----- N/A gate declarations (RFC#324 §N/A follow-up) -----
+    # sop-checklist.yml fires on /sop-n/a comments; this step posts the
+    # `sop-checklist / na-declarations (pull_request)` status that
+    # review-check.sh reads to waive the Gitea-APPROVE requirement.
+    na_gates: dict[str, Any] = cfg.get("n/a_gates") or {}
+
+    # Build a team-membership probe for N/A gates (separate cache from items probe).
+    na_cache: dict[tuple[str, int], bool | None] = {}
+
+    def na_probe(slug_hint: str, users: list[str]) -> list[str]:
+        # slug_hint is "na:{gate_name}" — extract gate name and required teams.
+        gate_name = slug_hint.removeprefix("na:")
+        gate_cfg = na_gates.get(gate_name, {})
+        team_names: list[str] = gate_cfg.get("required_teams", [])
+        # Resolve team names → ids.
+        team_ids: list[int] = []
+        for tn in team_names:
+            tid = client.resolve_team_id(args.owner, tn)  # noqa: SLF001
+            if tid is None:
+                code, data = client._req(  # noqa: SLF001
+                    "GET", f"/orgs/{args.owner}/teams"
+                )
+                if code == 200 and isinstance(data, list):
+                    for t in data:
+                        if t.get("name") == tn:
+                            tid = t.get("id")
+                            client._team_id_cache[(args.owner, tn)] = tid  # noqa: SLF001
+                            break
+            if tid is not None:
+                team_ids.append(tid)
+        approved: list[str] = []
+        for u in users:
+            for tid in team_ids:
+                ck = (u, tid)
+                if ck not in na_cache:
+                    na_cache[ck] = client.is_team_member(tid, u)  # noqa: SLF001
+                res = na_cache[ck]
+                if res is True:
+                    approved.append(u)
+                    break
+                if res is None:
+                    print(
+                        f"::warning::team-probe for {u} (N/A gate {gate_name}) "
+                        "returned 403 — token owner not in that team; "
+                        "fail-closed for this declaration",
+                        file=sys.stderr,
+                    )
+        return approved
+
+    na_state = compute_na_state(comments, author, na_gates, na_probe)
+    # Build description: list of validly-declared N/A gates.
+    na_approved_gates = [
+        g for g, entry in na_state.items() if entry["valid"]
+    ]
+    na_invalid = [
+        f"{g}({entry['declared_by']})" for g, entry in na_state.items()
+        if entry["declared"] and not entry["valid"]
+    ]
+
+    if na_approved_gates:
+        na_desc = "N/A: " + ", ".join(na_approved_gates)
+    elif na_invalid:
+        na_desc = "invalid N/A: " + ", ".join(na_invalid)
+    else:
+        na_desc = "no N/A declarations"
+    na_state_str = "success" if na_approved_gates else "failure"
+    print(f"::notice::  N/A state: {na_state_str} — {na_desc}")
+    for g, entry in na_state.items():
+        if entry["declared"]:
+            status_flag = "valid" if entry["valid"] else f"invalid: {entry['error']}"
+            print(f"::notice::    {g}: declared by {entry['declared_by']} — {status_flag}")
+
+    target_url = f"https://{args.gitea_host}/{args.owner}/{args.repo}/pulls/{args.pr}"
+
+    if not args.dry_run:
+        na_context = "sop-checklist / na-declarations (pull_request)"
+        client.post_status(
+            args.owner, args.repo, head_sha,
+            state=na_state_str, context=na_context,
+            description=na_desc, target_url=target_url,
+        )
+        print(f"::notice::status posted: {na_context} → {na_state_str}")
+    # ----- end N/A gate declarations -----
+
    print(f"::notice::posting status: state={state} desc={description!r}")

    if args.dry_run:
@@ -807,8 +1021,6 @@ def main(argv: list[str] | None = None) -> int:
        if args.exit_on_state:
            return 0 if state in ("success", "pending") else 1
        return 0
-
-    target_url = f"https://{args.gitea_host}/{args.owner}/{args.repo}/pulls/{args.pr}"
    client.post_status(
        args.owner, args.repo, head_sha,
        state=state, context=args.status_context,
@@ -135,8 +135,8 @@ class TestParseDirectives(unittest.TestCase):
        self.aliases = _numeric_aliases()

    def parse_ack_revoke(self, body):
-        directives, na_directives = sop.parse_directives(body, self.aliases)
-        self.assertEqual(na_directives, [])
+        # parse_directives returns (directives, na_directives) per PR #1263.
+        directives, _na = sop.parse_directives(body, self.aliases)
        return directives

    def test_simple_ack(self):
@@ -243,6 +243,29 @@ class TestSectionMarkerPresent(unittest.TestCase):
        body = "- [ ] **comprehensive TESTING performed**: yes"
        self.assertTrue(sop.section_marker_present(body, "Comprehensive testing performed"))

+    def test_marker_with_blank_line_before_content(self):
+        # Markdown-header pattern: ## Header\n\n- content.
+        # The blank line between header and content must NOT cause a false-negative.
+        body = (
+            "## Comprehensive testing performed\n\n"
+            "- `go test -v ./workspace-server/internal/secrets/...` — all 8 tests pass.\n"
+            "- `go test -cover ...` — 100.0% coverage.\n"
+        )
+        self.assertTrue(sop.section_marker_present(body, "Comprehensive testing performed"))
+
+    def test_marker_with_multiple_blank_lines_before_content(self):
+        # Three blank lines before content still passes.
+        body = (
+            "## Staging-smoke verified or pending\n\n\n\n"
+            "Post-merge: go test ./internal/secrets/... on the merged staging branch.\n"
+        )
+        self.assertTrue(sop.section_marker_present(body, "Staging-smoke verified or pending"))
+
+    def test_marker_with_only_blank_lines_after_header(self):
+        # Marker present but only blank lines follow → no real content → False.
+        body = "## Root-cause not symptom\n\n    \n\n"
+        self.assertFalse(sop.section_marker_present(body, "Root-cause not symptom"))
+
    def test_empty_body(self):
        self.assertFalse(sop.section_marker_present("", "X"))
        self.assertFalse(sop.section_marker_present(None, "X"))
@@ -1,225 +0,0 @@
-name: E2E Peer Visibility (literal MCP list_peers)
-
-# WHY A DEDICATED WORKFLOW (not folded into e2e-staging-saas.yml)
-# --------------------------------------------------------------
-# This is the systemic fix for a real trust failure. Hermes and OpenClaw
-# were reported "fleet-verified / cascade-complete" because the *proxy*
-# signals were green (registry registration + heartbeat for Hermes; model
-# round-trip 200 for OpenClaw). A freshly-provisioned workspace asked on
-# canvas "can you see your peers" actually FAILS:
-#   - Hermes: 401 on the molecule MCP `list_peers` call
-#   - OpenClaw: native `sessions_list` fallback, sees no platform peers
-# Tasks #142/#159 were even marked "completed" under this proxy flaw.
-#
-# A dedicated workflow (vs extending e2e-staging-saas.yml) because:
-#   - It must provision MULTIPLE distinct runtimes (hermes, openclaw,
-#     claude-code) in ONE org and assert each sees the others. The
-#     full-saas script is single-runtime-per-run (E2E_RUNTIME) and folding
-#     a multi-runtime matrix into it would conflate concerns and bloat its
-#     already-45-min run.
-#   - It needs its own concurrency group so it doesn't fight full-saas /
-#     canvas for the staging org-creation quota.
-#   - It needs an independent, non-required status-context name so it can
-#     be RED today (the in-flight Hermes-401 / OpenClaw-MCP-wiring fixes
-#     have not landed) WITHOUT wedging unrelated merges — and flipped to
-#     REQUIRED in one branch-protection edit once it goes green
-#     (flip-to-required checklist: molecule-core#1296).
-#
-# THE ASSERTION IS NOT A PROXY. The driving script
-# tests/e2e/test_peer_visibility_mcp_staging.sh issues the byte-for-byte
-# JSON-RPC `tools/call name=list_peers` envelope to `POST
-# /workspaces/:id/mcp` using each workspace's OWN bearer token, through
-# the real WorkspaceAuth + MCPRateLimiter middleware chain — the exact
-# call mcp_molecule_list_peers makes from a canvas agent. It does NOT
-# read a registry row, /health, the heartbeat table, or
-# GET /registry/:id/peers.
-#
-# HONEST GATE — NO continue-on-error. Per feedback_fix_root_not_symptom a
-# fake-green mask would defeat the entire purpose. This workflow goes red
-# on today's broken behavior and green only when the root-cause fixes
-# actually land. It is intentionally NOT in branch_protections — see PR
-# body for the required-vs-not decision + flip tracking issue.
-#
-# Gitea 1.22.6 / act_runner notes honored:
-#   - No cross-repo `uses:` (feedback_gitea_cross_repo_uses_blocked). The
-#     actions/checkout SHA is the one e2e-staging-canvas.yml already uses
-#     successfully (a mirrored SHA — see #1277/PR#1292 root-cause).
-#   - Per-SHA concurrency, not global (feedback_concurrency_group_per_sha).
-#   - Workflow-level GITHUB_SERVER_URL pinned
-#     (feedback_act_runner_github_server_url).
-#   - pr-validate posts a status under the same check name so a
-#     workflow-only PR is not silently statusless and the context is
-#     flip-to-required-ready (mirrors e2e-staging-saas.yml's proven shape;
-#     real EC2-provisioning E2E is push/dispatch/cron only — it is 30+ min
-#     and cannot run per-PR-update).
-
-on:
-  push:
-    branches: [main]
-    paths:
-      - 'workspace-server/internal/handlers/mcp.go'
-      - 'workspace-server/internal/handlers/mcp_tools.go'
-      - 'workspace-server/internal/middleware/**'
-      - 'workspace-server/internal/handlers/registry.go'
-      - 'workspace-server/internal/handlers/workspace.go'
-      - 'workspace/a2a_mcp_server.py'
-      - 'workspace/platform_tools/registry.py'
-      - 'tests/e2e/test_peer_visibility_mcp_staging.sh'
-      - '.gitea/workflows/e2e-peer-visibility.yml'
-  pull_request:
-    branches: [main]
-    paths:
-      - 'workspace-server/internal/handlers/mcp.go'
-      - 'workspace-server/internal/handlers/mcp_tools.go'
-      - 'workspace-server/internal/middleware/**'
-      - 'workspace-server/internal/handlers/registry.go'
-      - 'workspace-server/internal/handlers/workspace.go'
-      - 'workspace/a2a_mcp_server.py'
-      - 'workspace/platform_tools/registry.py'
-      - 'tests/e2e/test_peer_visibility_mcp_staging.sh'
-      - '.gitea/workflows/e2e-peer-visibility.yml'
-  workflow_dispatch:
-  schedule:
-    # 07:30 UTC daily — catches AMI / template-hermes / template-openclaw
-    # drift even on quiet days. Offset 30m from e2e-staging-saas (07:00)
-    # so the two don't collide on the staging org-creation quota.
-    - cron: '30 7 * * *'
-
-concurrency:
-  # Per-SHA (feedback_concurrency_group_per_sha). A single global group
-  # would let a queued staging/main push behind a PR run get cancelled,
-  # leaving any gate that reads "completed run at SHA" stuck.
-  group: e2e-peer-visibility-${{ github.event.pull_request.head.sha || github.sha }}
-  cancel-in-progress: false
-
-env:
-  GITHUB_SERVER_URL: https://git.moleculesai.app
-
-jobs:
-  # PR path: post a real status under the required-ready check name so a
-  # workflow-only PR is never silently statusless. The actual EC2 E2E is
-  # push/dispatch/cron only (30+ min). This is NOT a fake-green mask of
-  # the real assertion — it validates the driving script's bash syntax
-  # and inline-python so a broken test script fails at PR time.
-  pr-validate:
-    name: E2E Peer Visibility
-    runs-on: ubuntu-latest
-    if: github.event_name == 'pull_request'
-    timeout-minutes: 5
-    steps:
-      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
-      - name: Validate driving script
-        run: |
-          bash -n tests/e2e/test_peer_visibility_mcp_staging.sh
-          echo "test_peer_visibility_mcp_staging.sh — bash syntax OK"
-          echo "Real fresh-provision MCP list_peers E2E runs on push to"
-          echo "main / workflow_dispatch / daily cron (30+ min EC2 boot)."
-
-  # Real gate: provisions a throwaway org + sibling-per-runtime, drives
-  # the LITERAL list_peers MCP call per runtime, asserts 200 + expected
-  # peer set, then scoped teardown. push(main)/dispatch/cron only.
-  peer-visibility:
-    name: E2E Peer Visibility
-    runs-on: ubuntu-latest
-    if: github.event_name != 'pull_request'
-    timeout-minutes: 60
-
-    env:
-      MOLECULE_CP_URL: https://staging-api.moleculesai.app
-      MOLECULE_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
-      # LLM provider key so each runtime can authenticate at boot.
-      # Priority MiniMax → direct-Anthropic → OpenAI matches
-      # test_staging_full_saas.sh's secrets-injection chain.
-      E2E_MINIMAX_API_KEY: ${{ secrets.MOLECULE_STAGING_MINIMAX_API_KEY }}
-      E2E_ANTHROPIC_API_KEY: ${{ secrets.MOLECULE_STAGING_ANTHROPIC_API_KEY }}
-      E2E_OPENAI_API_KEY: ${{ secrets.MOLECULE_STAGING_OPENAI_API_KEY }}
-      E2E_RUN_ID: "${{ github.run_id }}-${{ github.run_attempt }}"
-      PV_RUNTIMES: "hermes openclaw claude-code"
-
-    steps:
-      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
-
-      - name: Verify admin token present
-        run: |
-          if [ -z "$MOLECULE_ADMIN_TOKEN" ]; then
-            echo "::error::CP_STAGING_ADMIN_API_TOKEN secret not set (Railway staging CP_ADMIN_API_TOKEN)"
-            exit 2
-          fi
-          echo "Admin token present"
-
-      - name: Verify an LLM key present
-        run: |
-          if [ -z "${E2E_MINIMAX_API_KEY:-}" ] && [ -z "${E2E_ANTHROPIC_API_KEY:-}" ] && [ -z "${E2E_OPENAI_API_KEY:-}" ]; then
-            echo "::error::No LLM provider key set — workspaces fail at boot with 'No provider API key found'. Set MOLECULE_STAGING_MINIMAX_API_KEY (or ANTHROPIC / OPENAI)."
-            exit 2
-          fi
-          echo "LLM key present"
-
-      - name: CP staging health preflight
-        run: |
-          code=$(curl -sS -o /dev/null -w "%{http_code}" --max-time 10 "$MOLECULE_CP_URL/health")
-          if [ "$code" != "200" ]; then
-            echo "::error::Staging CP unhealthy (HTTP $code) — infra, not a workspace bug. Failing loud per feedback_fix_root_not_symptom."
-            exit 1
-          fi
-          echo "Staging CP healthy"
-
-      - name: Run fresh-provision peer-visibility E2E (literal MCP list_peers)
-        run: bash tests/e2e/test_peer_visibility_mcp_staging.sh
-
-      # Belt-and-braces scoped teardown: the script installs an EXIT/INT/
-      # TERM trap, but if the runner itself is cancelled the trap may not
-      # fire. This always() step deletes ONLY the e2e-pv-<run_id> org this
-      # run created — never a cluster-wide sweep
-      # (feedback_never_run_cluster_cleanup_tests_on_live_platform). The
-      # admin DELETE is idempotent so double-invoking is safe;
-      # sweep-stale-e2e-orgs is the final net (slug starts with 'e2e-').
-      - name: Teardown safety net (runs on cancel/failure)
-        if: always()
-        env:
-          ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
-        run: |
-          set +e
-          orgs=$(curl -sS "$MOLECULE_CP_URL/cp/admin/orgs?limit=500" \
-            -H "Authorization: Bearer $ADMIN_TOKEN" 2>/dev/null \
-            | python3 -c "
-          import json, sys, os, datetime
-          run_id = os.environ.get('GITHUB_RUN_ID', '')
-          try:
-              d = json.load(sys.stdin)
-          except Exception:
-              print(''); sys.exit(0)
-          # ONLY sweep slugs from THIS run. e2e-pv-<YYYYMMDD>-<run_id>-...
-          # Sweep today AND yesterday's UTC date so a midnight-crossing run
-          # still matches its own slug (same bug class as the saas/canvas
-          # safety nets).
-          today = datetime.date.today()
-          yest = today - datetime.timedelta(days=1)
-          dates = (today.strftime('%Y%m%d'), yest.strftime('%Y%m%d'))
-          if run_id:
-              prefixes = tuple(f'e2e-pv-{dt}-{run_id}-' for dt in dates)
-          else:
-              prefixes = tuple(f'e2e-pv-{dt}-' for dt in dates)
-          orgs = d if isinstance(d, list) else d.get('orgs', [])
-          cands = [o['slug'] for o in orgs
-                   if any(o.get('slug','').startswith(p) for p in prefixes)
-                   and o.get('instance_status') not in ('purged',)]
-          print('\n'.join(cands))
-          " 2>/dev/null)
-          for slug in $orgs; do
-            echo "Safety-net teardown: $slug"
-            set +e
-            curl -sS -o /tmp/pv-cleanup.out -w "%{http_code}" \
-              -X DELETE "$MOLECULE_CP_URL/cp/admin/tenants/$slug" \
-              -H "Authorization: Bearer $ADMIN_TOKEN" \
-              -H "Content-Type: application/json" \
-              -d "{\"confirm\":\"$slug\"}" >/tmp/pv-cleanup.code
-            set -e
-            code=$(cat /tmp/pv-cleanup.code 2>/dev/null || echo "000")
-            if [ "$code" = "200" ] || [ "$code" = "204" ]; then
-              echo "[teardown] deleted $slug (HTTP $code)"
-            else
-              echo "::warning::pv teardown for $slug returned HTTP $code — sweep-stale-e2e-orgs will catch it within MAX_AGE_MINUTES. Body: $(head -c 300 /tmp/pv-cleanup.out 2>/dev/null)"
-            fi
-          done
-          exit 0
@@ -1,376 +0,0 @@
-#!/usr/bin/env bash
-# Staging E2E — fresh-provision peer-visibility gate via the LITERAL MCP path.
-#
-# WHY THIS EXISTS
-# ---------------
-# Hermes and OpenClaw were repeatedly reported "fleet-verified / cascade-
-# complete" because the *proxy* signals were green:
-#   - registry-registration + heartbeat (Hermes), and
-#   - model round-trip 200 (OpenClaw).
-# But a freshly-provisioned workspace, asked on canvas "can you see your
-# peers", actually FAILS:
-#   - Hermes: 401 on the molecule MCP `list_peers` call,
-#   - OpenClaw: falls back to native `sessions_list`, sees no platform peers.
-# Tasks #142/#159 were even marked "completed" under this same proxy flaw.
-#
-# This script codifies the LITERAL user-facing path so it can never silently
-# regress: it provisions a brand-new throwaway org + sibling workspaces via
-# the real control-plane provisioning path, then for each runtime that should
-# have platform peer-visibility it drives the EXACT MCP call the canvas agent
-# makes — `POST /workspaces/:id/mcp` JSON-RPC tools/call name=list_peers,
-# authenticated by that workspace's own bearer token through the real
-# WorkspaceAuth + MCPRateLimiter middleware chain. It then asserts:
-#   (1) HTTP 200,
-#   (2) JSON-RPC `result` present (NOT an `error` object — a -32000
-#       "tool call failed" or a 401 from WorkspaceAuth fails here),
-#   (3) the returned peer set CONTAINS the other provisioned sibling
-#       workspace IDs — not an empty list, not a native-sessions fallback.
-#
-# This is NOT a proxy. It does not look at a registry row, /health, the
-# heartbeat table, or `GET /registry/:id/peers`. It drives the byte-for-byte
-# JSON-RPC envelope that mcp_molecule_list_peers issues from a real agent.
-#
-# It is written to FAIL on today's broken Hermes/OpenClaw behavior and go
-# green only when the in-flight root-cause fixes (Hermes-401, OpenClaw MCP
-# wiring) actually land. That is the point: it is the objective proof gate.
-#
-# AUTH MODEL (mirrors tests/e2e/test_staging_full_saas.sh)
-# --------------------------------------------------------
-#   Single MOLECULE_ADMIN_TOKEN (= CP_ADMIN_API_TOKEN on Railway staging)
-#   drives: POST /cp/admin/orgs (provision), GET
-#   /cp/admin/orgs/:slug/admin-token (per-tenant token), DELETE
-#   /cp/admin/tenants/:slug (teardown). The per-tenant admin token drives
-#   tenant workspace creation; each workspace's OWN auth_token (returned by
-#   POST /workspaces) drives its MCP call.
-#
-# Required env:
-#   MOLECULE_ADMIN_TOKEN   CP admin bearer — Railway staging CP_ADMIN_API_TOKEN
-# Optional env:
-#   MOLECULE_CP_URL        default https://staging-api.moleculesai.app
-#   E2E_RUN_ID             slug suffix; CI passes ${GITHUB_RUN_ID}
-#   PV_RUNTIMES            space list; default "hermes openclaw claude-code"
-#   E2E_PROVISION_TIMEOUT_SECS  default 1800 (hermes/openclaw cold EC2 budget)
-#   E2E_MINIMAX_API_KEY / E2E_ANTHROPIC_API_KEY / E2E_OPENAI_API_KEY
-#                          LLM provider key injected so the runtime can boot
-#   E2E_KEEP_ORG           1 → skip teardown (local debugging only)
-#
-# Exit codes:
-#   0  every runtime saw its peers via the literal MCP call
-#   1  generic failure
-#   2  missing required env
-#   3  provisioning timed out
-#   4  teardown left orphan resources
-#   10 peer-visibility regression reproduced (the gate firing as designed)
-
-set -uo pipefail
-
-CP_URL="${MOLECULE_CP_URL:-https://staging-api.moleculesai.app}"
-ADMIN_TOKEN="${MOLECULE_ADMIN_TOKEN:?MOLECULE_ADMIN_TOKEN required — Railway staging CP_ADMIN_API_TOKEN}"
-RUN_ID_SUFFIX="${E2E_RUN_ID:-$(date +%H%M%S)-$$}"
-PV_RUNTIMES="${PV_RUNTIMES:-hermes openclaw claude-code}"
-PROVISION_TIMEOUT_SECS="${E2E_PROVISION_TIMEOUT_SECS:-1800}"
-
-# Slug MUST start with 'e2e-' so the sweep-stale-e2e-orgs safety net
-# (EPHEMERAL_PREFIXES) catches any leak this run fails to tear down.
-SLUG="e2e-pv-$(date +%Y%m%d)-${RUN_ID_SUFFIX}"
-SLUG=$(echo "$SLUG" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9-' | head -c 32)
-
-ORG_ID=""
-TENANT_URL=""
-TENANT_TOKEN=""
-
-log()  { echo "[$(date +%H:%M:%S)] $*"; }
-fail() { echo "[$(date +%H:%M:%S)] ❌ $*" >&2; exit 1; }
-ok()   { echo "[$(date +%H:%M:%S)] ✅ $*"; }
-
-admin_call() {
-  local method="$1" path="$2"; shift 2
-  curl -sS -X "$method" "$CP_URL$path" \
-    -H "Authorization: Bearer $ADMIN_TOKEN" \
-    -H "Content-Type: application/json" "$@"
-}
-tenant_call() {
-  local method="$1" path="$2"; shift 2
-  curl -sS -X "$method" "$TENANT_URL$path" \
-    -H "Authorization: Bearer $TENANT_TOKEN" \
-    -H "X-Molecule-Org-Id: $ORG_ID" \
-    -H "Content-Type: application/json" "$@"
-}
-
-# ─── Scoped teardown ───────────────────────────────────────────────────
-# Deletes ONLY the org this run created (DELETE /cp/admin/tenants/$SLUG
-# with the {"confirm":$SLUG} fat-finger guard). Never a cluster-wide
-# sweep — honors feedback_cleanup_after_each_test and
-# feedback_never_run_cluster_cleanup_tests_on_live_platform. The
-# workflow's always() step + sweep-stale-e2e-orgs are the outer nets.
-teardown() {
-  local rc=$?
-  set +e
-  if [ "${E2E_KEEP_ORG:-0}" = "1" ]; then
-    echo ""
-    log "[teardown] E2E_KEEP_ORG=1 — leaving $SLUG for debugging (REMEMBER TO DELETE)"
-    exit $rc
-  fi
-  echo ""
-  log "[teardown] DELETE /cp/admin/tenants/$SLUG (scoped to this run only)"
-  admin_call DELETE "/cp/admin/tenants/$SLUG" --max-time 120 \
-    -d "{\"confirm\":\"$SLUG\"}" >/dev/null 2>&1
-  for j in $(seq 1 24); do
-    LIST=$(admin_call GET "/cp/admin/orgs?limit=500" 2>/dev/null)
-    LEAK=$(echo "$LIST" | python3 -c "
-import sys, json
-try: d = json.load(sys.stdin)
-except Exception: print(1); sys.exit(0)
-orgs = d if isinstance(d, list) else d.get('orgs', [])
-print(sum(1 for o in orgs if o.get('slug') == '$SLUG' and o.get('instance_status') not in ('purged',) and o.get('status') != 'purged'))
-" 2>/dev/null || echo 1)
-    if [ "$LEAK" = "0" ]; then
-      log "[teardown] ✓ $SLUG purged (after ${j}x5s)"
-      exit $rc
-    fi
-    sleep 5
-  done
-  echo "::warning::[teardown] $SLUG still present after 120s — sweep-stale-e2e-orgs will catch it within MAX_AGE_MINUTES" >&2
-  [ $rc -eq 0 ] && rc=4
-  exit $rc
-}
-trap teardown EXIT INT TERM
-
-# ─── 1. Provision the throwaway org ────────────────────────────────────
-log "1/6 POST /cp/admin/orgs — slug=$SLUG"
-CREATE=$(admin_call POST /cp/admin/orgs \
-  -d "{\"slug\":\"$SLUG\",\"name\":\"E2E peer-visibility $SLUG\",\"owner_user_id\":\"e2e-runner:$SLUG\"}")
-ORG_ID=$(echo "$CREATE" | python3 -c "import sys,json; print(json.load(sys.stdin).get('id',''))" 2>/dev/null)
-[ -n "$ORG_ID" ] || fail "org creation failed: $(echo "$CREATE" | head -c 300)"
-log "    ORG_ID=$ORG_ID"
-
-# ─── 2. Wait for tenant EC2 + DNS ──────────────────────────────────────
-log "2/6 waiting for tenant instance_status=running (cold EC2 + cloudflared)..."
-DEADLINE=$(( $(date +%s) + PROVISION_TIMEOUT_SECS ))
-while true; do
-  [ "$(date +%s)" -gt "$DEADLINE" ] && fail "tenant never came up within ${PROVISION_TIMEOUT_SECS}s"
-  STATUS=$(admin_call GET "/cp/admin/orgs?limit=500" 2>/dev/null | python3 -c "
-import sys, json
-try: d = json.load(sys.stdin)
-except Exception: sys.exit(0)
-orgs = d if isinstance(d, list) else d.get('orgs', [])
-for o in orgs:
-    if o.get('slug') == '$SLUG':
-        print(o.get('instance_status') or o.get('status') or 'unknown'); break
-" 2>/dev/null)
-  case "$STATUS" in running|online|ready) break ;; esac
-  sleep 10
-done
-log "    tenant status=$STATUS"
-
-# ─── 3. Per-tenant admin token + tenant URL ────────────────────────────
-log "3/6 fetching per-tenant admin token..."
-TT_RESP=$(admin_call GET "/cp/admin/orgs/$SLUG/admin-token")
-TENANT_TOKEN=$(echo "$TT_RESP" | python3 -c "import sys,json; print(json.load(sys.stdin).get('admin_token',''))" 2>/dev/null)
-[ -n "$TENANT_TOKEN" ] || fail "tenant token fetch failed: $(echo "$TT_RESP" | head -c 200)"
-
-CP_HOST=$(echo "$CP_URL" | sed -E 's#^https?://##; s#/.*$##')
-case "$CP_HOST" in
-  api.*)         DERIVED_DOMAIN="${CP_HOST#api.}" ;;
-  staging-api.*) DERIVED_DOMAIN="staging.${CP_HOST#staging-api.}" ;;
-  *)             DERIVED_DOMAIN="$CP_HOST" ;;
-esac
-TENANT_URL="https://${SLUG}.${DERIVED_DOMAIN}"
-log "    tenant url: $TENANT_URL"
-
-log "3b. waiting for tenant /health (TLS/DNS, up to 10min)..."
-for i in $(seq 1 120); do
-  curl -fsS "$TENANT_URL/health" -m 5 -k >/dev/null 2>&1 && { log "    /health ok (attempt $i)"; break; }
-  sleep 5
-done
-
-# ─── 4. Provision the parent + one sibling per runtime under test ──────
-# Inject the LLM provider key so each runtime can authenticate at boot.
-# Priority: MiniMax → direct-Anthropic → OpenAI (mirrors
-# test_staging_full_saas.sh's secrets-injection chain).
-SECRETS_JSON='{}'
-if [ -n "${E2E_MINIMAX_API_KEY:-}" ]; then
-  SECRETS_JSON=$(python3 -c "import json,os;k=os.environ['E2E_MINIMAX_API_KEY'];print(json.dumps({'ANTHROPIC_BASE_URL':'https://api.minimax.io/anthropic','ANTHROPIC_AUTH_TOKEN':k,'MINIMAX_API_KEY':k}))")
-elif [ -n "${E2E_ANTHROPIC_API_KEY:-}" ]; then
-  SECRETS_JSON=$(python3 -c "import json,os;k=os.environ['E2E_ANTHROPIC_API_KEY'];print(json.dumps({'ANTHROPIC_API_KEY':k}))")
-elif [ -n "${E2E_OPENAI_API_KEY:-}" ]; then
-  SECRETS_JSON=$(python3 -c "import json,os;k=os.environ['E2E_OPENAI_API_KEY'];print(json.dumps({'OPENAI_API_KEY':k,'OPENAI_BASE_URL':'https://api.openai.com/v1','MODEL_PROVIDER':'openai:gpt-4o','HERMES_INFERENCE_PROVIDER':'custom','HERMES_CUSTOM_BASE_URL':'https://api.openai.com/v1','HERMES_CUSTOM_API_KEY':k,'HERMES_CUSTOM_API_MODE':'chat_completions'}))")
-fi
-
-log "4/6 provisioning parent (claude-code) + one sibling per runtime under test..."
-P_RESP=$(tenant_call POST /workspaces \
-  -d "{\"name\":\"pv-parent\",\"runtime\":\"claude-code\",\"tier\":3,\"secrets\":$SECRETS_JSON}")
-PARENT_ID=$(echo "$P_RESP" | python3 -c "import sys,json; print(json.load(sys.stdin).get('id',''))" 2>/dev/null)
-[ -n "$PARENT_ID" ] || fail "parent create failed: $(echo "$P_RESP" | head -c 300)"
-log "    PARENT_ID=$PARENT_ID"
-
-# WS_IDS[runtime]=id ; WS_TOKENS[runtime]=auth_token (the MCP bearer)
-declare -A WS_IDS WS_TOKENS
-ALL_WS_IDS="$PARENT_ID"
-for rt in $PV_RUNTIMES; do
-  R=$(tenant_call POST /workspaces \
-    -d "{\"name\":\"pv-$rt\",\"runtime\":\"$rt\",\"tier\":2,\"parent_id\":\"$PARENT_ID\",\"secrets\":$SECRETS_JSON}")
-  WID=$(echo "$R" | python3 -c "import sys,json; print(json.load(sys.stdin).get('id',''))" 2>/dev/null)
-  # auth_token is top-level for container runtimes; external-like nest it
-  # under connection.auth_token (verified vs staging response shape).
-  WTOK=$(echo "$R" | python3 -c "
-import sys, json
-try: d = json.load(sys.stdin)
-except Exception: print(''); sys.exit(0)
-print(d.get('auth_token') or d.get('connection', {}).get('auth_token') or '')
-" 2>/dev/null)
-  [ -n "$WID" ] || fail "$rt workspace create failed: $(echo "$R" | head -c 300)"
-  [ -n "$WTOK" ] || fail "$rt workspace did not return an auth_token — cannot drive its MCP call (resp: $(echo "$R" | head -c 300))"
-  WS_IDS[$rt]="$WID"
-  WS_TOKENS[$rt]="$WTOK"
-  ALL_WS_IDS="$ALL_WS_IDS $WID"
-  log "    $rt → $WID"
-done
-
-# ─── 5. Wait for every sibling online ──────────────────────────────────
-log "5/6 waiting for all workspaces status=online (up to ${PROVISION_TIMEOUT_SECS}s — cold boot)..."
-WS_DEADLINE=$(( $(date +%s) + PROVISION_TIMEOUT_SECS ))
-for rt in $PV_RUNTIMES; do
-  wid="${WS_IDS[$rt]}"
-  LAST=""
-  while true; do
-    [ "$(date +%s)" -gt "$WS_DEADLINE" ] && fail "$rt ($wid) never reached online (last=$LAST)"
-    S=$(tenant_call GET "/workspaces/$wid" 2>/dev/null | python3 -c "
-import sys, json
-try: d = json.load(sys.stdin)
-except Exception: sys.exit(0)
-w = d.get('workspace') if isinstance(d.get('workspace'), dict) else d
-print(w.get('status') or '')
-" 2>/dev/null)
-    [ "$S" != "$LAST" ] && { log "    $rt → $S"; LAST="$S"; }
-    case "$S" in
-      online) break ;;
-      failed) sleep 10 ;;   # transient: bootstrap-watcher 5-min deadline, heartbeat recovers
-      *)      sleep 10 ;;
-    esac
-  done
-  ok "    $rt online"
-done
-
-# ─── 6. THE GATE — literal mcp_molecule_list_peers via POST /:id/mcp ────
-# This is the byte-for-byte user-facing call. NOT GET /registry/:id/peers,
-# NOT /health, NOT the heartbeat table. JSON-RPC 2.0 tools/call,
-# name=list_peers, authenticated by the workspace's OWN bearer token
-# through WorkspaceAuth + MCPRateLimiter.
-log "6/6 driving the LITERAL list_peers MCP call per runtime..."
-echo ""
-RPC_BODY='{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"list_peers","arguments":{}}}'
-REGRESSED=0
-declare -A VERDICT
-
-for rt in $PV_RUNTIMES; do
-  wid="${WS_IDS[$rt]}"
-  wtok="${WS_TOKENS[$rt]}"
-  # The expected peer set = every OTHER provisioned workspace (parent +
-  # the sibling runtimes), excluding the caller itself.
-  EXPECT_IDS=$(echo "$ALL_WS_IDS" | tr ' ' '\n' | grep -v "^${wid}$" | grep -v '^$')
-
-  set +e
-  RESP=$(curl -sS -X POST "$TENANT_URL/workspaces/$wid/mcp" \
-    -H "Authorization: Bearer $wtok" \
-    -H "X-Molecule-Org-Id: $ORG_ID" \
-    -H "Content-Type: application/json" \
-    -d "$RPC_BODY" \
-    -o /tmp/pv_mcp_body.json -w "%{http_code}" 2>/dev/null)
-  set -e
-  HTTP_CODE="$RESP"
-  BODY=$(cat /tmp/pv_mcp_body.json 2>/dev/null || echo '')
-
-  echo "--- $rt (ws=$wid) ---"
-  echo "    HTTP $HTTP_CODE"
-  echo "    body: $(echo "$BODY" | head -c 600)"
-
-  # (1) HTTP 200 — a 401 (WorkspaceAuth reject, the Hermes symptom) fails here.
-  if [ "$HTTP_CODE" != "200" ]; then
-    echo "  ✗ $rt: list_peers MCP call returned HTTP $HTTP_CODE (expected 200)"
-    VERDICT[$rt]="FAIL(http=$HTTP_CODE)"
-    REGRESSED=1
-    continue
-  fi
-
-  # (2) JSON-RPC result present, not an error object.
-  PARSE=$(echo "$BODY" | python3 -c "
-import sys, json
-expect = set(filter(None, '''$EXPECT_IDS'''.split()))
-try:
-    d = json.load(sys.stdin)
-except Exception as e:
-    print('PARSE_ERROR:' + str(e)); sys.exit(0)
-if isinstance(d, dict) and d.get('error') is not None:
-    print('RPC_ERROR:' + json.dumps(d['error'])[:200]); sys.exit(0)
-res = d.get('result') if isinstance(d, dict) else None
-if res is None:
-    print('NO_RESULT'); sys.exit(0)
-# MCP tools/call result shape: {content:[{type:text,text:'<json or prose>'}]}
-text = ''
-if isinstance(res, dict):
-    for c in res.get('content', []):
-        if c.get('type') == 'text':
-            text += c.get('text', '')
-text_l = text.lower()
-# Native-sessions fallback signature (the OpenClaw symptom): the agent
-# answered from its own runtime session list, not the platform peer set.
-if 'sessions_list' in text_l or 'no platform peers' in text_l or 'native session' in text_l:
-    print('NATIVE_FALLBACK:' + text[:200]); sys.exit(0)
-# The expected sibling IDs must literally appear in the returned peer text.
-found = sorted(i for i in expect if i in text)
-missing = sorted(expect - set(found))
-if not expect:
-    print('NO_EXPECTED_PEERS_CONFIGURED'); sys.exit(0)
-if missing:
-    print('MISSING_PEERS:found=%d/%d missing=%s' % (len(found), len(expect), ','.join(m[:8] for m in missing)))
-    sys.exit(0)
-print('OK:found=%d/%d' % (len(found), len(expect)))
-" 2>/dev/null)
-
-  case "$PARSE" in
-    OK:*)
-      echo "  ✓ $rt: list_peers returned 200 and contains all expected peers ($PARSE)"
-      VERDICT[$rt]="OK"
-      ;;
-    NATIVE_FALLBACK:*)
-      echo "  ✗ $rt: list_peers fell back to NATIVE sessions — sees no platform peers ($PARSE)"
-      VERDICT[$rt]="FAIL(native-fallback)"
-      REGRESSED=1
-      ;;
-    RPC_ERROR:*|NO_RESULT|PARSE_ERROR:*)
-      echo "  ✗ $rt: list_peers MCP call did not return a usable result ($PARSE)"
-      VERDICT[$rt]="FAIL(rpc=$PARSE)"
-      REGRESSED=1
-      ;;
-    MISSING_PEERS:*)
-      echo "  ✗ $rt: list_peers returned 200 but peer set is wrong/empty ($PARSE)"
-      VERDICT[$rt]="FAIL(peers=$PARSE)"
-      REGRESSED=1
-      ;;
-    *)
-      echo "  ✗ $rt: unexpected verdict '$PARSE'"
-      VERDICT[$rt]="FAIL(unknown)"
-      REGRESSED=1
-      ;;
-  esac
-  echo ""
-done
-
-echo "=== SUMMARY — fresh-provision peer-visibility (literal MCP list_peers) ==="
-for rt in $PV_RUNTIMES; do
-  printf '  %-14s %s\n' "$rt" "${VERDICT[$rt]:-NO_RUN}"
-done
-echo ""
-
-if [ "$REGRESSED" -ne 0 ]; then
-  echo "✗ GATE FAILED — at least one runtime cannot see its peers via the"
-  echo "  literal mcp_molecule_list_peers call. This is the real user-facing"
-  echo "  failure the proxy signals (registry row / heartbeat / model 200)"
-  echo "  were hiding. Expected RED until the Hermes-401 + OpenClaw-MCP-wiring"
-  echo "  root-cause fixes land; goes green only when they actually do."
-  exit 10
-fi
-
-ok "GATE PASSED — every runtime under test sees its platform peers via the literal MCP call."
-exit 0
@@ -295,8 +295,7 @@ func TestProxyA2A_Upstream502_TriggersContainerDeadCheck(t *testing.T) {
 	c.Request.Header.Set("Content-Type", "application/json")

 	handler.ProxyA2A(c)
-
-	time.Sleep(80 * time.Millisecond)
+	handler.waitAsyncForTest()

 	// Caller sees a structured 503 (NOT the upstream 502 which CF would mask).
 	if w.Code != http.StatusServiceUnavailable {
@@ -352,7 +351,7 @@ func TestProxyA2A_Upstream502_AliveAgent_PropagatesAsIs(t *testing.T) {
 	c.Request.Header.Set("Content-Type", "application/json")

 	handler.ProxyA2A(c)
-	time.Sleep(50 * time.Millisecond)
+	handler.waitAsyncForTest()

 	if w.Code != http.StatusBadGateway {
 		t.Fatalf("alive agent 502 should propagate as 502; got %d: %s", w.Code, w.Body.String())
@@ -537,7 +536,7 @@ func TestProxyA2A_AllowedSelf_SkipsAccessCheck(t *testing.T) {
 	c.Request.Header.Set("X-Workspace-ID", "ws-self")

 	handler.ProxyA2A(c)
-	time.Sleep(50 * time.Millisecond)
+	handler.waitAsyncForTest()

 	if w.Code != http.StatusOK {
 		t.Errorf("expected 200 for self-call, got %d: %s", w.Code, w.Body.String())
@@ -274,7 +274,7 @@ func TestGracefulPreRestart_URLResolutionError(t *testing.T) {
 	waitForHandlerAsyncBeforeDBCleanup(t, hWrapper.WorkspaceHandler)

 	hWrapper.gracefulPreRestart(context.Background(), "ws-url-err-111")
-	time.Sleep(200 * time.Millisecond)
+	hWrapper.waitAsyncForTest()
 	// No panic or error expected — proceeds with stop as documented
 }
Author	SHA1	Message	Date
core-be	f5ff5037b9	fix(handlers): replace time.Sleep with explicit async drain in 1 test (issue #1264)	2026-05-16 04:25:22 +00:00
	Issue #1264: CI/Platform(Go) tests flake under parallel CI load. TestProxyA2A_AllowedSelf_SkipsAccessCheck uses time.Sleep(50ms) to wait for goroutines launched by goAsync(), the same pattern as the 4 tests fixed in PR #1282. Replacing it with handler.waitAsyncForTest() ensures deterministic async completion regardless of runner speed/pressure. Also fixes the sop-checklist test file (parse_directives tuple return type mismatch) that was committed in broken state to PR #1284.
	Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
core-be	05ef0964f6	fix(sop-checklist): skip blank lines when scanning for section content	2026-05-16 04:15:41 +00:00
	section_marker_present() checked only the immediately next line when the marker line had no trailing content. Markdown-header PR bodies use the pattern:
	    ## Comprehensive testing performed
	    
	    - go test -v ...
	where a blank line separates the header from the content. The function saw the blank line (empty after stripping), returned False, and reported "body-unfilled" for every section, causing "acked: 0/7 — body-unfilled" on PRs whose bodies were correctly filled.
	Fix: loop forward, skipping sequences of \n characters, until we find a line with non-whitespace content or reach end-of-body. This also handles the edge case where the PR author leaves only blank lines after the marker (still correctly rejected).
	Add tests for:
	- marker with one blank line before content (the actual PR pattern)
	- marker with multiple blank lines before content
	- marker with only blank lines after header (must still be False)
	Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
core-devops	da79f17096	fix(sop-checklist): update parse_directives return type + review-check 403 fix	2026-05-15 23:37:06 +00:00
	Cherry-pick of `ffd52506` onto origin/main. Main already has _NA_DIRECTIVE_RE and compute_na_state defined but is missing the parse_directives return type change (list → tuple) needed for the N/A loop. Also applies the review-check.sh 403-fail-closed → skip-and-continue fix so that a 403 on one candidate doesn't hard-fail the entire gate when other valid team-members exist. RFC#324 §N/A follow-up.
	Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
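
The blank-line fix described in commit 05ef0964f6 can be illustrated with a short sketch. The real `section_marker_present()` is not shown in this diff, so the Go function below (`sectionFilled`, a hypothetical name) only mirrors the behavior the commit message describes: content on the marker line counts, otherwise scan forward past blank lines until a non-blank line or the end of the body. It assumes any non-blank line after the marker counts as content, which the actual checker may restrict further.

```go
package main

import (
	"fmt"
	"strings"
)

// sectionFilled reports whether marker occurs in body and is followed by
// non-whitespace content, either on the same line or on a later line after
// any number of blank lines. Hypothetical illustration, not the project's code.
func sectionFilled(body, marker string) bool {
	lines := strings.Split(body, "\n")
	for i, line := range lines {
		idx := strings.Index(line, marker)
		if idx < 0 {
			continue
		}
		// Trailing content on the marker line itself is enough.
		if strings.TrimSpace(line[idx+len(marker):]) != "" {
			return true
		}
		// Otherwise skip blank lines until content or end-of-body.
		for _, next := range lines[i+1:] {
			if strings.TrimSpace(next) != "" {
				return true
			}
		}
		return false // only blank lines followed the marker
	}
	return false // marker not present at all
}

func main() {
	marker := "## Comprehensive testing performed"
	fmt.Println(sectionFilled(marker+"\n\n- ran unit tests", marker)) // blank line, then content
	fmt.Println(sectionFilled(marker+"\n\n\n", marker))              // only blank lines
}
```

The three cases listed in the commit message (one blank line, several blank lines, only blank lines) map directly onto the inner loop: the first two return true, the last falls through to false.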