Compare commits

..

3 Commits

Author SHA1 Message Date
core-be f5ff5037b9 fix(handlers): replace time.Sleep with explicit async drain in 1 test (issue #1264)
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 13s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 14s
Harness Replays / detect-changes (pull_request) Successful in 14s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 19s
review-check-tests / review-check.sh regression tests (pull_request) Successful in 16s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 15s
CI / Detect changes (pull_request) Successful in 30s
sop-checklist / all-items-acked (pull_request) Successful in 16s
sop-tier-check / tier-check (pull_request) Successful in 14s
security-review / approved (pull_request) Failing after 18s
qa-review / approved (pull_request) Failing after 20s
E2E API Smoke Test / detect-changes (pull_request) Successful in 40s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 37s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 41s
Harness Replays / Harness Replays (pull_request) Successful in 6s
gate-check-v3 / gate-check (pull_request) Successful in 28s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 8s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m20s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m30s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m56s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4m28s
CI / Python Lint & Test (pull_request) Successful in 7m10s
CI / Platform (Go) (pull_request) Successful in 11m21s
CI / Canvas (Next.js) (pull_request) Successful in 11m21s
CI / all-required (pull_request) Successful in 11m21s
CI / Canvas Deploy Reminder (pull_request) Successful in 2s
audit-force-merge / audit (pull_request) Has been skipped
Issue #1264: CI/Platform(Go) tests flake under parallel CI load.
TestProxyA2A_AllowedSelf_SkipsAccessCheck uses time.Sleep(50ms) to wait
for goroutines launched by goAsync() — same pattern as the 4 tests fixed
in PR #1282. Replacing with handler.waitAsyncForTest() ensures deterministic
async completion regardless of runner speed/pressure.

Also fixes the sop-checklist test file (parse_directives tuple return
type mismatch) that was committed in broken state to PR #1284.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 04:25:22 +00:00
core-be 05ef0964f6 fix(sop-checklist): skip blank lines when scanning for section content
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 15s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 20s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 19s
review-check-tests / review-check.sh regression tests (pull_request) Successful in 21s
CI / Detect changes (pull_request) Successful in 30s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 7s
gate-check-v3 / gate-check (pull_request) Failing after 33s
qa-review / approved (pull_request) Failing after 21s
sop-tier-check / tier-check (pull_request) Successful in 19s
security-review / approved (pull_request) Failing after 23s
E2E API Smoke Test / detect-changes (pull_request) Successful in 52s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 52s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 52s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 7s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m24s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Failing after 1m26s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
CI / Platform (Go) (pull_request) Successful in 7m10s
CI / Python Lint & Test (pull_request) Successful in 7m13s
CI / all-required (pull_request) Failing after 20m11s
CI / Canvas (Next.js) (pull_request) Failing after 20m11s
CI / Canvas Deploy Reminder (pull_request) Has been cancelled
section_marker_present() checked only the immediately next line when the
marker line had no trailing content.  Markdown-header PR bodies use the
pattern:

    ## Comprehensive testing performed

    - go test -v ...

where a blank line separates the header from the content.  The function
saw the blank line (empty after stripping), returned False, and reported
"body-unfilled" for every section — causing "acked: 0/7 — body-unfilled"
on PRs whose bodies were correctly filled.

Fix: loop forward, skipping sequences of \n characters, until we find a
line with non-whitespace content or reach end-of-body.  This also
handles the edge case where the PR author leaves only blank lines after
the marker (still correctly rejected).

Add tests for:
- marker with one blank line before content (the actual PR pattern)
- marker with multiple blank lines before content
- marker with only blank lines after header (must still be False)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 04:15:41 +00:00
core-devops da79f17096 fix(sop-checklist): update parse_directives return type + review-check 403 fix
audit-force-merge / audit (pull_request) Has been skipped
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 27s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 36s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 19s
CI / Detect changes (pull_request) Successful in 1m39s
review-check-tests / review-check.sh regression tests (pull_request) Successful in 27s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 22s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m26s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m35s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m31s
gate-check-v3 / gate-check (pull_request) Successful in 46s
qa-review / approved (pull_request) Failing after 33s
sop-tier-check / tier-check (pull_request) Successful in 21s
sop-checklist / all-items-acked (pull_request) Successful in 26s
security-review / approved (pull_request) Failing after 34s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 8s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 1m12s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 12s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 10s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 5s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m32s
CI / Python Lint & Test (pull_request) Successful in 7m44s
CI / Platform (Go) (pull_request) Successful in 21m3s
CI / Canvas (Next.js) (pull_request) Successful in 21m10s
CI / Canvas Deploy Reminder (pull_request) Successful in 8s
CI / all-required (pull_request) Successful in 28m9s
Cherry-pick of ffd52506 onto origin/main. Main already has _NA_DIRECTIVE_RE
and compute_na_state defined but is missing the parse_directives return type
change (list → tuple) needed for the N/A loop. Also applies the review-check.sh
403-fail-closed → skip-and-continue fix so that a 403 on one candidate doesn't
hard-fail the entire gate when other valid team-members exist.

RFC#324 §N/A follow-up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 23:37:06 +00:00
8 changed files with 277 additions and 639 deletions
+1 -2
View File
@@ -284,8 +284,7 @@ def list_queued_issues() -> list[dict]:
query={
"state": "open",
"type": "pulls",
# NOTE: Gitea 1.22.6 uses `label` (singular), not `labels` (plural).
"label": QUEUE_LABEL,
"labels": QUEUE_LABEL,
"limit": "50",
},
)
+13 -7
View File
@@ -214,7 +214,10 @@ fi
# Endpoint: GET /api/v1/teams/{id}/members/{username}
# 200/204 → is member
# 403 → token owner is not in this team (Gitea 1.22.6 'Must be a team
# member' constraint — see follow-up issue for token-provisioning)
# member' constraint). The evaluator skips this candidate and
# continues to check others. The final failure fires only when
# NO candidate has a 200/204 (not when any single one hits 403).
# See RFC#324 token-scope follow-up issue for long-term fix.
# 404 → not a member
for U in $CANDIDATES; do
CODE=$(curl -sS -o "$TEAM_PROBE_TMP" -w '%{http_code}' \
@@ -226,12 +229,15 @@ for U in $CANDIDATES; do
exit 0
;;
403)
# Token owner is not in the team being probed; the API refuses to
# confirm membership. This is the RFC#324 follow-up token-scope gap.
# Fail closed — never grant approval on a 403; surface clearly.
echo "::error::team-probe for ${U} in ${TEAM} returned 403 (token owner not in ${TEAM} team — RFC#324 token-scope follow-up). Cannot confirm membership; failing closed."
# Token owner is not in the team being probed; Gitea 1.22.6 refuses
# to confirm membership in this case. Do NOT hard-fail the gate on a
# 403 — doing so would fail the entire gate if ANY candidate triggers
# a 403, even when other valid team-members exist. Instead skip this
# candidate and continue checking others. If all candidates produce
# 403 (token owner can't query any of them) the final exit fires.
echo "::warning::team-probe for ${U} in ${TEAM} returned 403 (token owner not in ${TEAM} team — skipping; cannot confirm membership)"
cat "$TEAM_PROBE_TMP" >&2
exit 1
continue
;;
404)
debug "${U} not a member of ${TEAM}"
@@ -243,5 +249,5 @@ for U in $CANDIDATES; do
esac
done
echo "::error::${TEAM}-review awaiting non-author APPROVE from ${TEAM} team (candidates: $(echo "$CANDIDATES" | tr '\n' ',' | sed 's/,$//') — none are in team)"
echo "::error::${TEAM}-review awaiting non-author APPROVE from ${TEAM} team (candidates: $(echo "$CANDIDATES" | tr '\n' ',' | sed 's/,$//') — no valid team-member approval found; check that reviewer is in ${TEAM} team or token owner is a ${TEAM} team member)"
exit 1
+234 -22
View File
@@ -102,7 +102,7 @@ def normalize_slug(raw: str, numeric_aliases: dict[int, str] | None = None) -> s
# ---------------------------------------------------------------------------
# Comment parsing — /sop-ack and /sop-revoke
# Comment parsing — /sop-ack, /sop-revoke, and /sop-n/a
# ---------------------------------------------------------------------------
# A directive must be on its own line. Permits leading whitespace.
@@ -114,21 +114,33 @@ _DIRECTIVE_RE = re.compile(
re.MULTILINE,
)
# /sop-n/a <gate> [reason] — declare a qa/sec gate N/A.
# Gate names: qa-review, security-review (match review-check.sh context names).
_NA_DIRECTIVE_RE = re.compile(
r"^[ \t]*/sop-n/a[ \t]+([A-Za-z0-9_\-]+)(?:[ \t]+(.*))?[ \t]*$",
re.MULTILINE,
)
def parse_directives(
comment_body: str,
numeric_aliases: dict[int, str],
) -> list[tuple[str, str, str]]:
"""Extract /sop-ack and /sop-revoke directives from a comment body.
) -> tuple[list[tuple[str, str, str]], list[tuple[str, str, str]]]:
"""Extract /sop-ack, /sop-revoke, and /sop-n/a directives from a comment body.
Returns a list of (kind, canonical_slug, note) tuples where:
kind is "sop-ack" or "sop-revoke"
canonical_slug is the normalized form (or "" if unparseable)
note is the trailing free-text (may be "")
Returns (directives, na_directives) where:
directives is a list of (kind, canonical_slug, note) tuples
kind is "sop-ack" or "sop-revoke"
canonical_slug is the normalized form (or "" if unparseable)
note is the trailing free-text (may be "")
na_directives is a list of (gate_name, reason) tuples
gate_name is "qa-review" or "security-review" (raw from comment)
reason is the free-text after the gate name (may be "")
"""
out: list[tuple[str, str, str]] = []
na_out: list[tuple[str, str, str]] = []
if not comment_body:
return out
return out, na_out
for m in _DIRECTIVE_RE.finditer(comment_body):
kind = m.group(1)
raw_slug = (m.group(2) or "").strip()
@@ -159,7 +171,11 @@ def parse_directives(
# If we collapsed multi-word slug into kebab and there's a
# trailing-text group too, append it.
out.append((kind, canonical, note_from_group))
return out
for m in _NA_DIRECTIVE_RE.finditer(comment_body):
gate_raw = (m.group(1) or "").strip()
reason = (m.group(2) or "").strip()
na_out.append((gate_raw.lower(), reason))
return out, na_out
# ---------------------------------------------------------------------------
@@ -172,8 +188,8 @@ def section_marker_present(body: str, marker: str) -> bool:
on a non-empty line (i.e. the author actually filled it in).
We require the marker substring AND non-whitespace content on the
same line OR within the next line — this prevents trivially-empty
checklists like:
same line OR within the next non-blank line — this prevents
trivially-empty checklists like:
## SOP-Checklist
- [ ] **Comprehensive testing performed**:
@@ -182,6 +198,10 @@ def section_marker_present(body: str, marker: str) -> bool:
from auto-passing the section-present check. The peer-ack is still
required, but answering with empty content is captured as a soft
finding via the section-present test alone.
NOTE: we scan forward through blank lines (the markdown-header pattern
is ## Header\\n\\ncontent) so that a header + blank-line + content
structure still satisfies the check.
"""
if not body or not marker:
return False
@@ -200,13 +220,27 @@ def section_marker_present(body: str, marker: str) -> bool:
stripped = re.sub(r"[\s\*:\-\[\]]+", "", line)
if stripped:
return True
# Fall through: check the NEXT line (multi-line answers).
next_line_end = body.find("\n", line_end + 1)
if next_line_end < 0:
next_line_end = len(body)
next_line = body[line_end + 1:next_line_end]
stripped_next = re.sub(r"[\s\*:\-\[\]]+", "", next_line)
return bool(stripped_next)
# Fall through: scan forward, skipping blank-only lines, until we find
# non-empty content or run out of body. Handles:
# ## Header ← marker line (empty after marker)
# ← blank line (skipped)
# - actual content ← found
pos = line_end
while True:
# Skip the current newline and any additional newlines (blank lines).
while pos < len(body) and body[pos] == "\n":
pos += 1
if pos >= len(body):
break
line_end = body.find("\n", pos)
if line_end < 0:
line_end = len(body)
line = body[pos:line_end]
stripped = re.sub(r"[\s\*:\-\[\]]+", "", line)
if stripped:
return True
pos = line_end
return False
# ---------------------------------------------------------------------------
@@ -249,7 +283,8 @@ def compute_ack_state(
user = (c.get("user") or {}).get("login", "")
if not user:
continue
for kind, slug, _note in parse_directives(body, numeric_aliases):
directives, _na = parse_directives(body, numeric_aliases)
for kind, slug, _note in directives:
if not slug:
unparseable_per_user[user] = unparseable_per_user.get(user, 0) + 1
continue
@@ -301,6 +336,82 @@ def compute_ack_state(
}
def compute_na_state(
comments: list[dict[str, Any]],
pr_author: str,
na_gates: dict[str, dict[str, Any]],
team_membership_probe: "callable[[str, list[str]], list[str]]",
) -> dict[str, dict[str, Any]]:
"""Compute per-gate N/A declaration state.
Each comment is processed in chronological order. The most-recent
N/A directive per (commenter, gate) wins.
Returns a dict keyed by gate name:
{
"qa-review": {
"declared": True,
"declared_by": "core-qa-agent",
"reason": "CI/non-security-touching",
"valid": True, # non-author + in required team
"error": None, # error string if invalid
},
...
}
Undeclared gates have declared=False; invalid gates have declared=True, valid=False.
"""
# Step 1: collapse N/A directives per (commenter, gate) — most recent wins.
latest_na: dict[tuple[str, str], tuple[str, str]] = {}
for c in comments:
body = c.get("body", "") or ""
user = (c.get("user") or {}).get("login", "")
if not user:
continue
_, na_directives = parse_directives(body, {})
for gate, reason in na_directives:
if gate not in na_gates:
continue
latest_na[(user, gate)] = (gate, reason)
# Step 2: initialise all gates as undeclared.
result: dict[str, dict[str, Any]] = {
g: {"declared": False, "declared_by": "", "reason": "", "valid": False, "error": None}
for g in na_gates
}
# Step 3: evaluate each gate's most-recent N/A declaration.
for (user, gate), (gate_name, reason) in latest_na.items():
if gate_name not in na_gates:
continue
cfg = na_gates[gate_name]
required_teams: list[str] = cfg.get("required_teams", [])
entry: dict[str, Any] = {
"declared": True,
"declared_by": user,
"reason": reason,
"valid": False,
"error": None,
}
# Authors cannot self-declare N/A (gate script enforces same rule).
if user == pr_author:
entry["error"] = "self-declare N/A rejected"
else:
# Probe team membership: is the declarer in any required team?
approved = team_membership_probe(f"na:{gate_name}", [user])
if user in approved:
entry["valid"] = True
else:
# 403 from team API means token owner not in that team.
# Fail-closed: treat unknown membership as invalid.
entry["error"] = f"{user} not in required team {required_teams}"
result[gate_name] = entry
return result
# ---------------------------------------------------------------------------
# Gitea API client
# ---------------------------------------------------------------------------
@@ -460,10 +571,29 @@ def _load_config_minimal(path: str) -> dict[str, Any]:
tier_failure_mode), top-level list of maps (items:), and within an
item map: scalars + lists of scalars. Does NOT support nested lists,
YAML anchors, multi-doc, or flow style.
Key names containing '/' (e.g. n/a_gates) are handled by using
rpartition(':') — splitting at the LAST colon so embedded colons
in the key are preserved.
"""
with open(path) as f:
lines = f.readlines()
return _parse_minimal_yaml(lines)
# Preprocess: for lines at indent 0 that contain '/' before ':',
# use rpartition so the key keeps the '/'. e.g.
# "n/a_gates:" → key="n/a_gates", val=""
# "n/a_gates: value" → key="n/a_gates", val="value"
processed: list[str] = []
for raw in lines:
stripped = raw.rstrip("\n")
indent = len(stripped) - len(stripped.lstrip(" "))
content = stripped.lstrip(" ")
if indent == 0 and "/" in content and ":" in content:
# Use rpartition so the last ':' is the key-value separator.
key, _, val = content.rpartition(":")
processed.append(" " * indent + key.strip() + ": " + val.strip())
else:
processed.append(stripped)
return _parse_minimal_yaml(processed)
def _parse_minimal_yaml(lines: list[str]) -> dict[str, Any]: # noqa: C901
@@ -800,6 +930,90 @@ def main(argv: list[str] | None = None) -> int:
extra = " (" + "; ".join(extras) + ")" if extras else ""
print(f"::notice:: [WAIT] {slug} — no valid peer-ack yet{extra}")
# ----- N/A gate declarations (RFC#324 §N/A follow-up) -----
# sop-checklist.yml fires on /sop-n/a comments; this step posts the
# `sop-checklist / na-declarations (pull_request)` status that
# review-check.sh reads to waive the Gitea-APPROVE requirement.
na_gates: dict[str, Any] = cfg.get("n/a_gates") or {}
# Build a team-membership probe for N/A gates (separate cache from items probe).
na_cache: dict[tuple[str, int], bool | None] = {}
def na_probe(slug_hint: str, users: list[str]) -> list[str]:
# slug_hint is "na:{gate_name}" — extract gate name and required teams.
gate_name = slug_hint.removeprefix("na:")
gate_cfg = na_gates.get(gate_name, {})
team_names: list[str] = gate_cfg.get("required_teams", [])
# Resolve team names → ids.
team_ids: list[int] = []
for tn in team_names:
tid = client.resolve_team_id(args.owner, tn) # noqa: SLF001
if tid is None:
code, data = client._req( # noqa: SLF001
"GET", f"/orgs/{args.owner}/teams"
)
if code == 200 and isinstance(data, list):
for t in data:
if t.get("name") == tn:
tid = t.get("id")
client._team_id_cache[(args.owner, tn)] = tid # noqa: SLF001
break
if tid is not None:
team_ids.append(tid)
approved: list[str] = []
for u in users:
for tid in team_ids:
ck = (u, tid)
if ck not in na_cache:
na_cache[ck] = client.is_team_member(tid, u) # noqa: SLF001
res = na_cache[ck]
if res is True:
approved.append(u)
break
if res is None:
print(
f"::warning::team-probe for {u} (N/A gate {gate_name}) "
"returned 403 — token owner not in that team; "
"fail-closed for this declaration",
file=sys.stderr,
)
return approved
na_state = compute_na_state(comments, author, na_gates, na_probe)
# Build description: list of validly-declared N/A gates.
na_approved_gates = [
g for g, entry in na_state.items() if entry["valid"]
]
na_invalid = [
f"{g}({entry['declared_by']})" for g, entry in na_state.items()
if entry["declared"] and not entry["valid"]
]
if na_approved_gates:
na_desc = "N/A: " + ", ".join(na_approved_gates)
elif na_invalid:
na_desc = "invalid N/A: " + ", ".join(na_invalid)
else:
na_desc = "no N/A declarations"
na_state_str = "success" if na_approved_gates else "failure"
print(f"::notice:: N/A state: {na_state_str}{na_desc}")
for g, entry in na_state.items():
if entry["declared"]:
status_flag = "valid" if entry["valid"] else f"invalid: {entry['error']}"
print(f"::notice:: {g}: declared by {entry['declared_by']}{status_flag}")
target_url = f"https://{args.gitea_host}/{args.owner}/{args.repo}/pulls/{args.pr}"
if not args.dry_run:
na_context = "sop-checklist / na-declarations (pull_request)"
client.post_status(
args.owner, args.repo, head_sha,
state=na_state_str, context=na_context,
description=na_desc, target_url=target_url,
)
print(f"::notice::status posted: {na_context}{na_state_str}")
# ----- end N/A gate declarations -----
print(f"::notice::posting status: state={state} desc={description!r}")
if args.dry_run:
@@ -807,8 +1021,6 @@ def main(argv: list[str] | None = None) -> int:
if args.exit_on_state:
return 0 if state in ("success", "pending") else 1
return 0
target_url = f"https://{args.gitea_host}/{args.owner}/{args.repo}/pulls/{args.pr}"
client.post_status(
args.owner, args.repo, head_sha,
state=state, context=args.status_context,
+25 -2
View File
@@ -135,8 +135,8 @@ class TestParseDirectives(unittest.TestCase):
self.aliases = _numeric_aliases()
def parse_ack_revoke(self, body):
directives, na_directives = sop.parse_directives(body, self.aliases)
self.assertEqual(na_directives, [])
# parse_directives returns (directives, na_directives) per PR #1263.
directives, _na = sop.parse_directives(body, self.aliases)
return directives
def test_simple_ack(self):
@@ -243,6 +243,29 @@ class TestSectionMarkerPresent(unittest.TestCase):
body = "- [ ] **comprehensive TESTING performed**: yes"
self.assertTrue(sop.section_marker_present(body, "Comprehensive testing performed"))
def test_marker_with_blank_line_before_content(self):
# Markdown-header pattern: ## Header\n\n- content.
# The blank line between header and content must NOT cause a false-negative.
body = (
"## Comprehensive testing performed\n\n"
"- `go test -v ./workspace-server/internal/secrets/...` — all 8 tests pass.\n"
"- `go test -cover ...` — 100.0% coverage.\n"
)
self.assertTrue(sop.section_marker_present(body, "Comprehensive testing performed"))
def test_marker_with_multiple_blank_lines_before_content(self):
# Three blank lines before content still passes.
body = (
"## Staging-smoke verified or pending\n\n\n\n"
"Post-merge: go test ./internal/secrets/... on the merged staging branch.\n"
)
self.assertTrue(sop.section_marker_present(body, "Staging-smoke verified or pending"))
def test_marker_with_only_blank_lines_after_header(self):
# Marker present but only blank lines follow → no real content → False.
body = "## Root-cause not symptom\n\n \n\n"
self.assertFalse(sop.section_marker_present(body, "Root-cause not symptom"))
def test_empty_body(self):
self.assertFalse(sop.section_marker_present("", "X"))
self.assertFalse(sop.section_marker_present(None, "X"))
-225
View File
@@ -1,225 +0,0 @@
name: E2E Peer Visibility (literal MCP list_peers)
# WHY A DEDICATED WORKFLOW (not folded into e2e-staging-saas.yml)
# --------------------------------------------------------------
# This is the systemic fix for a real trust failure. Hermes and OpenClaw
# were reported "fleet-verified / cascade-complete" because the *proxy*
# signals were green (registry registration + heartbeat for Hermes; model
# round-trip 200 for OpenClaw). A freshly-provisioned workspace asked on
# canvas "can you see your peers" actually FAILS:
# - Hermes: 401 on the molecule MCP `list_peers` call
# - OpenClaw: native `sessions_list` fallback, sees no platform peers
# Tasks #142/#159 were even marked "completed" under this proxy flaw.
#
# A dedicated workflow (vs extending e2e-staging-saas.yml) because:
# - It must provision MULTIPLE distinct runtimes (hermes, openclaw,
# claude-code) in ONE org and assert each sees the others. The
# full-saas script is single-runtime-per-run (E2E_RUNTIME) and folding
# a multi-runtime matrix into it would conflate concerns and bloat its
# already-45-min run.
# - It needs its own concurrency group so it doesn't fight full-saas /
# canvas for the staging org-creation quota.
# - It needs an independent, non-required status-context name so it can
# be RED today (the in-flight Hermes-401 / OpenClaw-MCP-wiring fixes
# have not landed) WITHOUT wedging unrelated merges — and flipped to
# REQUIRED in one branch-protection edit once it goes green
# (flip-to-required checklist: molecule-core#1296).
#
# THE ASSERTION IS NOT A PROXY. The driving script
# tests/e2e/test_peer_visibility_mcp_staging.sh issues the byte-for-byte
# JSON-RPC `tools/call name=list_peers` envelope to `POST
# /workspaces/:id/mcp` using each workspace's OWN bearer token, through
# the real WorkspaceAuth + MCPRateLimiter middleware chain — the exact
# call mcp_molecule_list_peers makes from a canvas agent. It does NOT
# read a registry row, /health, the heartbeat table, or
# GET /registry/:id/peers.
#
# HONEST GATE — NO continue-on-error. Per feedback_fix_root_not_symptom a
# fake-green mask would defeat the entire purpose. This workflow goes red
# on today's broken behavior and green only when the root-cause fixes
# actually land. It is intentionally NOT in branch_protections — see PR
# body for the required-vs-not decision + flip tracking issue.
#
# Gitea 1.22.6 / act_runner notes honored:
# - No cross-repo `uses:` (feedback_gitea_cross_repo_uses_blocked). The
# actions/checkout SHA is the one e2e-staging-canvas.yml already uses
# successfully (a mirrored SHA — see #1277/PR#1292 root-cause).
# - Per-SHA concurrency, not global (feedback_concurrency_group_per_sha).
# - Workflow-level GITHUB_SERVER_URL pinned
# (feedback_act_runner_github_server_url).
# - pr-validate posts a status under the same check name so a
# workflow-only PR is not silently statusless and the context is
# flip-to-required-ready (mirrors e2e-staging-saas.yml's proven shape;
# real EC2-provisioning E2E is push/dispatch/cron only — it is 30+ min
# and cannot run per-PR-update).
on:
push:
branches: [main]
paths:
- 'workspace-server/internal/handlers/mcp.go'
- 'workspace-server/internal/handlers/mcp_tools.go'
- 'workspace-server/internal/middleware/**'
- 'workspace-server/internal/handlers/registry.go'
- 'workspace-server/internal/handlers/workspace.go'
- 'workspace/a2a_mcp_server.py'
- 'workspace/platform_tools/registry.py'
- 'tests/e2e/test_peer_visibility_mcp_staging.sh'
- '.gitea/workflows/e2e-peer-visibility.yml'
pull_request:
branches: [main]
paths:
- 'workspace-server/internal/handlers/mcp.go'
- 'workspace-server/internal/handlers/mcp_tools.go'
- 'workspace-server/internal/middleware/**'
- 'workspace-server/internal/handlers/registry.go'
- 'workspace-server/internal/handlers/workspace.go'
- 'workspace/a2a_mcp_server.py'
- 'workspace/platform_tools/registry.py'
- 'tests/e2e/test_peer_visibility_mcp_staging.sh'
- '.gitea/workflows/e2e-peer-visibility.yml'
workflow_dispatch:
schedule:
# 07:30 UTC daily — catches AMI / template-hermes / template-openclaw
# drift even on quiet days. Offset 30m from e2e-staging-saas (07:00)
# so the two don't collide on the staging org-creation quota.
- cron: '30 7 * * *'
concurrency:
# Per-SHA (feedback_concurrency_group_per_sha). A single global group
# would let a queued staging/main push behind a PR run get cancelled,
# leaving any gate that reads "completed run at SHA" stuck.
group: e2e-peer-visibility-${{ github.event.pull_request.head.sha || github.sha }}
cancel-in-progress: false
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
jobs:
# PR path: post a real status under the required-ready check name so a
# workflow-only PR is never silently statusless. The actual EC2 E2E is
# push/dispatch/cron only (30+ min). This is NOT a fake-green mask of
# the real assertion — it validates the driving script's bash syntax
# and inline-python so a broken test script fails at PR time.
pr-validate:
name: E2E Peer Visibility
runs-on: ubuntu-latest
if: github.event_name == 'pull_request'
timeout-minutes: 5
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Validate driving script
run: |
bash -n tests/e2e/test_peer_visibility_mcp_staging.sh
echo "test_peer_visibility_mcp_staging.sh — bash syntax OK"
echo "Real fresh-provision MCP list_peers E2E runs on push to"
echo "main / workflow_dispatch / daily cron (30+ min EC2 boot)."
# Real gate: provisions a throwaway org + sibling-per-runtime, drives
# the LITERAL list_peers MCP call per runtime, asserts 200 + expected
# peer set, then scoped teardown. push(main)/dispatch/cron only.
peer-visibility:
name: E2E Peer Visibility
runs-on: ubuntu-latest
if: github.event_name != 'pull_request'
timeout-minutes: 60
env:
MOLECULE_CP_URL: https://staging-api.moleculesai.app
MOLECULE_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
# LLM provider key so each runtime can authenticate at boot.
# Priority MiniMax → direct-Anthropic → OpenAI matches
# test_staging_full_saas.sh's secrets-injection chain.
E2E_MINIMAX_API_KEY: ${{ secrets.MOLECULE_STAGING_MINIMAX_API_KEY }}
E2E_ANTHROPIC_API_KEY: ${{ secrets.MOLECULE_STAGING_ANTHROPIC_API_KEY }}
E2E_OPENAI_API_KEY: ${{ secrets.MOLECULE_STAGING_OPENAI_API_KEY }}
E2E_RUN_ID: "${{ github.run_id }}-${{ github.run_attempt }}"
PV_RUNTIMES: "hermes openclaw claude-code"
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Verify admin token present
run: |
if [ -z "$MOLECULE_ADMIN_TOKEN" ]; then
echo "::error::CP_STAGING_ADMIN_API_TOKEN secret not set (Railway staging CP_ADMIN_API_TOKEN)"
exit 2
fi
echo "Admin token present"
- name: Verify an LLM key present
run: |
if [ -z "${E2E_MINIMAX_API_KEY:-}" ] && [ -z "${E2E_ANTHROPIC_API_KEY:-}" ] && [ -z "${E2E_OPENAI_API_KEY:-}" ]; then
echo "::error::No LLM provider key set — workspaces fail at boot with 'No provider API key found'. Set MOLECULE_STAGING_MINIMAX_API_KEY (or ANTHROPIC / OPENAI)."
exit 2
fi
echo "LLM key present"
- name: CP staging health preflight
run: |
code=$(curl -sS -o /dev/null -w "%{http_code}" --max-time 10 "$MOLECULE_CP_URL/health")
if [ "$code" != "200" ]; then
echo "::error::Staging CP unhealthy (HTTP $code) — infra, not a workspace bug. Failing loud per feedback_fix_root_not_symptom."
exit 1
fi
echo "Staging CP healthy"
- name: Run fresh-provision peer-visibility E2E (literal MCP list_peers)
run: bash tests/e2e/test_peer_visibility_mcp_staging.sh
# Belt-and-braces scoped teardown: the script installs an EXIT/INT/
# TERM trap, but if the runner itself is cancelled the trap may not
# fire. This always() step deletes ONLY the e2e-pv-<run_id> org this
# run created — never a cluster-wide sweep
# (feedback_never_run_cluster_cleanup_tests_on_live_platform). The
# admin DELETE is idempotent so double-invoking is safe;
# sweep-stale-e2e-orgs is the final net (slug starts with 'e2e-').
- name: Teardown safety net (runs on cancel/failure)
if: always()
env:
ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
run: |
set +e
orgs=$(curl -sS "$MOLECULE_CP_URL/cp/admin/orgs?limit=500" \
-H "Authorization: Bearer $ADMIN_TOKEN" 2>/dev/null \
| python3 -c "
import json, sys, os, datetime
run_id = os.environ.get('GITHUB_RUN_ID', '')
try:
d = json.load(sys.stdin)
except Exception:
print(''); sys.exit(0)
# ONLY sweep slugs from THIS run. e2e-pv-<YYYYMMDD>-<run_id>-...
# Sweep today AND yesterday's UTC date so a midnight-crossing run
# still matches its own slug (same bug class as the saas/canvas
# safety nets).
today = datetime.date.today()
yest = today - datetime.timedelta(days=1)
dates = (today.strftime('%Y%m%d'), yest.strftime('%Y%m%d'))
if run_id:
prefixes = tuple(f'e2e-pv-{dt}-{run_id}-' for dt in dates)
else:
prefixes = tuple(f'e2e-pv-{dt}-' for dt in dates)
orgs = d if isinstance(d, list) else d.get('orgs', [])
cands = [o['slug'] for o in orgs
if any(o.get('slug','').startswith(p) for p in prefixes)
and o.get('instance_status') not in ('purged',)]
print('\n'.join(cands))
" 2>/dev/null)
for slug in $orgs; do
echo "Safety-net teardown: $slug"
set +e
curl -sS -o /tmp/pv-cleanup.out -w "%{http_code}" \
-X DELETE "$MOLECULE_CP_URL/cp/admin/tenants/$slug" \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d "{\"confirm\":\"$slug\"}" >/tmp/pv-cleanup.code
set -e
code=$(cat /tmp/pv-cleanup.code 2>/dev/null || echo "000")
if [ "$code" = "200" ] || [ "$code" = "204" ]; then
echo "[teardown] deleted $slug (HTTP $code)"
else
echo "::warning::pv teardown for $slug returned HTTP $code — sweep-stale-e2e-orgs will catch it within MAX_AGE_MINUTES. Body: $(head -c 300 /tmp/pv-cleanup.out 2>/dev/null)"
fi
done
exit 0
@@ -1,376 +0,0 @@
#!/usr/bin/env bash
# Staging E2E — fresh-provision peer-visibility gate via the LITERAL MCP path.
#
# WHY THIS EXISTS
# ---------------
# Hermes and OpenClaw were repeatedly reported "fleet-verified / cascade-
# complete" because the *proxy* signals were green:
# - registry-registration + heartbeat (Hermes), and
# - model round-trip 200 (OpenClaw).
# But a freshly-provisioned workspace, asked on canvas "can you see your
# peers", actually FAILS:
# - Hermes: 401 on the molecule MCP `list_peers` call,
# - OpenClaw: falls back to native `sessions_list`, sees no platform peers.
# Tasks #142/#159 were even marked "completed" under this same proxy flaw.
#
# This script codifies the LITERAL user-facing path so it can never silently
# regress: it provisions a brand-new throwaway org + sibling workspaces via
# the real control-plane provisioning path, then for each runtime that should
# have platform peer-visibility it drives the EXACT MCP call the canvas agent
# makes — `POST /workspaces/:id/mcp` JSON-RPC tools/call name=list_peers,
# authenticated by that workspace's own bearer token through the real
# WorkspaceAuth + MCPRateLimiter middleware chain. It then asserts:
# (1) HTTP 200,
# (2) JSON-RPC `result` present (NOT an `error` object — a -32000
# "tool call failed" or a 401 from WorkspaceAuth fails here),
# (3) the returned peer set CONTAINS the other provisioned sibling
# workspace IDs — not an empty list, not a native-sessions fallback.
#
# This is NOT a proxy. It does not look at a registry row, /health, the
# heartbeat table, or `GET /registry/:id/peers`. It drives the byte-for-byte
# JSON-RPC envelope that mcp_molecule_list_peers issues from a real agent.
#
# It is written to FAIL on today's broken Hermes/OpenClaw behavior and go
# green only when the in-flight root-cause fixes (Hermes-401, OpenClaw MCP
# wiring) actually land. That is the point: it is the objective proof gate.
#
# AUTH MODEL (mirrors tests/e2e/test_staging_full_saas.sh)
# --------------------------------------------------------
# Single MOLECULE_ADMIN_TOKEN (= CP_ADMIN_API_TOKEN on Railway staging)
# drives: POST /cp/admin/orgs (provision), GET
# /cp/admin/orgs/:slug/admin-token (per-tenant token), DELETE
# /cp/admin/tenants/:slug (teardown). The per-tenant admin token drives
# tenant workspace creation; each workspace's OWN auth_token (returned by
# POST /workspaces) drives its MCP call.
#
# Required env:
# MOLECULE_ADMIN_TOKEN CP admin bearer — Railway staging CP_ADMIN_API_TOKEN
# Optional env:
# MOLECULE_CP_URL default https://staging-api.moleculesai.app
# E2E_RUN_ID slug suffix; CI passes ${GITHUB_RUN_ID}
# PV_RUNTIMES space list; default "hermes openclaw claude-code"
# E2E_PROVISION_TIMEOUT_SECS default 1800 (hermes/openclaw cold EC2 budget)
# E2E_MINIMAX_API_KEY / E2E_ANTHROPIC_API_KEY / E2E_OPENAI_API_KEY
# LLM provider key injected so the runtime can boot
# E2E_KEEP_ORG 1 → skip teardown (local debugging only)
#
# Exit codes:
# 0 every runtime saw its peers via the literal MCP call
# 1 generic failure
# 2 missing required env
# 3 provisioning timed out
# 4 teardown left orphan resources
# 10 peer-visibility regression reproduced (the gate firing as designed)
set -uo pipefail
CP_URL="${MOLECULE_CP_URL:-https://staging-api.moleculesai.app}"
ADMIN_TOKEN="${MOLECULE_ADMIN_TOKEN:?MOLECULE_ADMIN_TOKEN required — Railway staging CP_ADMIN_API_TOKEN}"
RUN_ID_SUFFIX="${E2E_RUN_ID:-$(date +%H%M%S)-$$}"
PV_RUNTIMES="${PV_RUNTIMES:-hermes openclaw claude-code}"
PROVISION_TIMEOUT_SECS="${E2E_PROVISION_TIMEOUT_SECS:-1800}"
# Slug MUST start with 'e2e-' so the sweep-stale-e2e-orgs safety net
# (EPHEMERAL_PREFIXES) catches any leak this run fails to tear down.
SLUG="e2e-pv-$(date +%Y%m%d)-${RUN_ID_SUFFIX}"
SLUG=$(echo "$SLUG" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9-' | head -c 32)
ORG_ID=""
TENANT_URL=""
TENANT_TOKEN=""
log() { echo "[$(date +%H:%M:%S)] $*"; }
fail() { echo "[$(date +%H:%M:%S)] ❌ $*" >&2; exit 1; }
ok() { echo "[$(date +%H:%M:%S)] ✅ $*"; }
admin_call() {
local method="$1" path="$2"; shift 2
curl -sS -X "$method" "$CP_URL$path" \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" "$@"
}
tenant_call() {
local method="$1" path="$2"; shift 2
curl -sS -X "$method" "$TENANT_URL$path" \
-H "Authorization: Bearer $TENANT_TOKEN" \
-H "X-Molecule-Org-Id: $ORG_ID" \
-H "Content-Type: application/json" "$@"
}
# ─── Scoped teardown ───────────────────────────────────────────────────
# Deletes ONLY the org this run created (DELETE /cp/admin/tenants/$SLUG
# with the {"confirm":$SLUG} fat-finger guard). Never a cluster-wide
# sweep — honors feedback_cleanup_after_each_test and
# feedback_never_run_cluster_cleanup_tests_on_live_platform. The
# workflow's always() step + sweep-stale-e2e-orgs are the outer nets.
teardown() {
local rc=$?
set +e
if [ "${E2E_KEEP_ORG:-0}" = "1" ]; then
echo ""
log "[teardown] E2E_KEEP_ORG=1 — leaving $SLUG for debugging (REMEMBER TO DELETE)"
exit $rc
fi
echo ""
log "[teardown] DELETE /cp/admin/tenants/$SLUG (scoped to this run only)"
admin_call DELETE "/cp/admin/tenants/$SLUG" --max-time 120 \
-d "{\"confirm\":\"$SLUG\"}" >/dev/null 2>&1
for j in $(seq 1 24); do
LIST=$(admin_call GET "/cp/admin/orgs?limit=500" 2>/dev/null)
LEAK=$(echo "$LIST" | python3 -c "
import sys, json
try: d = json.load(sys.stdin)
except Exception: print(1); sys.exit(0)
orgs = d if isinstance(d, list) else d.get('orgs', [])
print(sum(1 for o in orgs if o.get('slug') == '$SLUG' and o.get('instance_status') not in ('purged',) and o.get('status') != 'purged'))
" 2>/dev/null || echo 1)
if [ "$LEAK" = "0" ]; then
log "[teardown] ✓ $SLUG purged (after ${j}x5s)"
exit $rc
fi
sleep 5
done
echo "::warning::[teardown] $SLUG still present after 120s — sweep-stale-e2e-orgs will catch it within MAX_AGE_MINUTES" >&2
[ $rc -eq 0 ] && rc=4
exit $rc
}
trap teardown EXIT INT TERM
# ─── 1. Provision the throwaway org ────────────────────────────────────
log "1/6 POST /cp/admin/orgs — slug=$SLUG"
CREATE=$(admin_call POST /cp/admin/orgs \
-d "{\"slug\":\"$SLUG\",\"name\":\"E2E peer-visibility $SLUG\",\"owner_user_id\":\"e2e-runner:$SLUG\"}")
ORG_ID=$(echo "$CREATE" | python3 -c "import sys,json; print(json.load(sys.stdin).get('id',''))" 2>/dev/null)
[ -n "$ORG_ID" ] || fail "org creation failed: $(echo "$CREATE" | head -c 300)"
log " ORG_ID=$ORG_ID"
# ─── 2. Wait for tenant EC2 + DNS ──────────────────────────────────────
log "2/6 waiting for tenant instance_status=running (cold EC2 + cloudflared)..."
DEADLINE=$(( $(date +%s) + PROVISION_TIMEOUT_SECS ))
while true; do
[ "$(date +%s)" -gt "$DEADLINE" ] && fail "tenant never came up within ${PROVISION_TIMEOUT_SECS}s"
STATUS=$(admin_call GET "/cp/admin/orgs?limit=500" 2>/dev/null | python3 -c "
import sys, json
try: d = json.load(sys.stdin)
except Exception: sys.exit(0)
orgs = d if isinstance(d, list) else d.get('orgs', [])
for o in orgs:
if o.get('slug') == '$SLUG':
print(o.get('instance_status') or o.get('status') or 'unknown'); break
" 2>/dev/null)
case "$STATUS" in running|online|ready) break ;; esac
sleep 10
done
log " tenant status=$STATUS"
# ─── 3. Per-tenant admin token + tenant URL ────────────────────────────
log "3/6 fetching per-tenant admin token..."
TT_RESP=$(admin_call GET "/cp/admin/orgs/$SLUG/admin-token")
TENANT_TOKEN=$(echo "$TT_RESP" | python3 -c "import sys,json; print(json.load(sys.stdin).get('admin_token',''))" 2>/dev/null)
[ -n "$TENANT_TOKEN" ] || fail "tenant token fetch failed: $(echo "$TT_RESP" | head -c 200)"
CP_HOST=$(echo "$CP_URL" | sed -E 's#^https?://##; s#/.*$##')
case "$CP_HOST" in
api.*) DERIVED_DOMAIN="${CP_HOST#api.}" ;;
staging-api.*) DERIVED_DOMAIN="staging.${CP_HOST#staging-api.}" ;;
*) DERIVED_DOMAIN="$CP_HOST" ;;
esac
TENANT_URL="https://${SLUG}.${DERIVED_DOMAIN}"
log " tenant url: $TENANT_URL"
log "3b. waiting for tenant /health (TLS/DNS, up to 10min)..."
for i in $(seq 1 120); do
curl -fsS "$TENANT_URL/health" -m 5 -k >/dev/null 2>&1 && { log " /health ok (attempt $i)"; break; }
sleep 5
done
# ─── 4. Provision the parent + one sibling per runtime under test ──────
# Inject the LLM provider key so each runtime can authenticate at boot.
# Priority: MiniMax → direct-Anthropic → OpenAI (mirrors
# test_staging_full_saas.sh's secrets-injection chain).
SECRETS_JSON='{}'
if [ -n "${E2E_MINIMAX_API_KEY:-}" ]; then
SECRETS_JSON=$(python3 -c "import json,os;k=os.environ['E2E_MINIMAX_API_KEY'];print(json.dumps({'ANTHROPIC_BASE_URL':'https://api.minimax.io/anthropic','ANTHROPIC_AUTH_TOKEN':k,'MINIMAX_API_KEY':k}))")
elif [ -n "${E2E_ANTHROPIC_API_KEY:-}" ]; then
SECRETS_JSON=$(python3 -c "import json,os;k=os.environ['E2E_ANTHROPIC_API_KEY'];print(json.dumps({'ANTHROPIC_API_KEY':k}))")
elif [ -n "${E2E_OPENAI_API_KEY:-}" ]; then
SECRETS_JSON=$(python3 -c "import json,os;k=os.environ['E2E_OPENAI_API_KEY'];print(json.dumps({'OPENAI_API_KEY':k,'OPENAI_BASE_URL':'https://api.openai.com/v1','MODEL_PROVIDER':'openai:gpt-4o','HERMES_INFERENCE_PROVIDER':'custom','HERMES_CUSTOM_BASE_URL':'https://api.openai.com/v1','HERMES_CUSTOM_API_KEY':k,'HERMES_CUSTOM_API_MODE':'chat_completions'}))")
fi
log "4/6 provisioning parent (claude-code) + one sibling per runtime under test..."
P_RESP=$(tenant_call POST /workspaces \
-d "{\"name\":\"pv-parent\",\"runtime\":\"claude-code\",\"tier\":3,\"secrets\":$SECRETS_JSON}")
PARENT_ID=$(echo "$P_RESP" | python3 -c "import sys,json; print(json.load(sys.stdin).get('id',''))" 2>/dev/null)
[ -n "$PARENT_ID" ] || fail "parent create failed: $(echo "$P_RESP" | head -c 300)"
log " PARENT_ID=$PARENT_ID"
# WS_IDS[runtime]=id ; WS_TOKENS[runtime]=auth_token (the MCP bearer)
declare -A WS_IDS WS_TOKENS
ALL_WS_IDS="$PARENT_ID"
for rt in $PV_RUNTIMES; do
R=$(tenant_call POST /workspaces \
-d "{\"name\":\"pv-$rt\",\"runtime\":\"$rt\",\"tier\":2,\"parent_id\":\"$PARENT_ID\",\"secrets\":$SECRETS_JSON}")
WID=$(echo "$R" | python3 -c "import sys,json; print(json.load(sys.stdin).get('id',''))" 2>/dev/null)
# auth_token is top-level for container runtimes; external-like nest it
# under connection.auth_token (verified vs staging response shape).
WTOK=$(echo "$R" | python3 -c "
import sys, json
try: d = json.load(sys.stdin)
except Exception: print(''); sys.exit(0)
print(d.get('auth_token') or d.get('connection', {}).get('auth_token') or '')
" 2>/dev/null)
[ -n "$WID" ] || fail "$rt workspace create failed: $(echo "$R" | head -c 300)"
[ -n "$WTOK" ] || fail "$rt workspace did not return an auth_token — cannot drive its MCP call (resp: $(echo "$R" | head -c 300))"
WS_IDS[$rt]="$WID"
WS_TOKENS[$rt]="$WTOK"
ALL_WS_IDS="$ALL_WS_IDS $WID"
log " $rt$WID"
done
# ─── 5. Wait for every sibling online ──────────────────────────────────
log "5/6 waiting for all workspaces status=online (up to ${PROVISION_TIMEOUT_SECS}s — cold boot)..."
WS_DEADLINE=$(( $(date +%s) + PROVISION_TIMEOUT_SECS ))
for rt in $PV_RUNTIMES; do
wid="${WS_IDS[$rt]}"
LAST=""
while true; do
[ "$(date +%s)" -gt "$WS_DEADLINE" ] && fail "$rt ($wid) never reached online (last=$LAST)"
S=$(tenant_call GET "/workspaces/$wid" 2>/dev/null | python3 -c "
import sys, json
try: d = json.load(sys.stdin)
except Exception: sys.exit(0)
w = d.get('workspace') if isinstance(d.get('workspace'), dict) else d
print(w.get('status') or '')
" 2>/dev/null)
[ "$S" != "$LAST" ] && { log " $rt$S"; LAST="$S"; }
case "$S" in
online) break ;;
failed) sleep 10 ;; # transient: bootstrap-watcher 5-min deadline, heartbeat recovers
*) sleep 10 ;;
esac
done
ok " $rt online"
done
# ─── 6. THE GATE — literal mcp_molecule_list_peers via POST /:id/mcp ────
# This is the byte-for-byte user-facing call. NOT GET /registry/:id/peers,
# NOT /health, NOT the heartbeat table. JSON-RPC 2.0 tools/call,
# name=list_peers, authenticated by the workspace's OWN bearer token
# through WorkspaceAuth + MCPRateLimiter.
log "6/6 driving the LITERAL list_peers MCP call per runtime..."
echo ""
RPC_BODY='{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"list_peers","arguments":{}}}'
REGRESSED=0
declare -A VERDICT
for rt in $PV_RUNTIMES; do
wid="${WS_IDS[$rt]}"
wtok="${WS_TOKENS[$rt]}"
# The expected peer set = every OTHER provisioned workspace (parent +
# the sibling runtimes), excluding the caller itself.
EXPECT_IDS=$(echo "$ALL_WS_IDS" | tr ' ' '\n' | grep -v "^${wid}$" | grep -v '^$')
set +e
RESP=$(curl -sS -X POST "$TENANT_URL/workspaces/$wid/mcp" \
-H "Authorization: Bearer $wtok" \
-H "X-Molecule-Org-Id: $ORG_ID" \
-H "Content-Type: application/json" \
-d "$RPC_BODY" \
-o /tmp/pv_mcp_body.json -w "%{http_code}" 2>/dev/null)
set -e
HTTP_CODE="$RESP"
BODY=$(cat /tmp/pv_mcp_body.json 2>/dev/null || echo '')
echo "--- $rt (ws=$wid) ---"
echo " HTTP $HTTP_CODE"
echo " body: $(echo "$BODY" | head -c 600)"
# (1) HTTP 200 — a 401 (WorkspaceAuth reject, the Hermes symptom) fails here.
if [ "$HTTP_CODE" != "200" ]; then
echo "$rt: list_peers MCP call returned HTTP $HTTP_CODE (expected 200)"
VERDICT[$rt]="FAIL(http=$HTTP_CODE)"
REGRESSED=1
continue
fi
# (2) JSON-RPC result present, not an error object.
PARSE=$(echo "$BODY" | python3 -c "
import sys, json
expect = set(filter(None, '''$EXPECT_IDS'''.split()))
try:
d = json.load(sys.stdin)
except Exception as e:
print('PARSE_ERROR:' + str(e)); sys.exit(0)
if isinstance(d, dict) and d.get('error') is not None:
print('RPC_ERROR:' + json.dumps(d['error'])[:200]); sys.exit(0)
res = d.get('result') if isinstance(d, dict) else None
if res is None:
print('NO_RESULT'); sys.exit(0)
# MCP tools/call result shape: {content:[{type:text,text:'<json or prose>'}]}
text = ''
if isinstance(res, dict):
for c in res.get('content', []):
if c.get('type') == 'text':
text += c.get('text', '')
text_l = text.lower()
# Native-sessions fallback signature (the OpenClaw symptom): the agent
# answered from its own runtime session list, not the platform peer set.
if 'sessions_list' in text_l or 'no platform peers' in text_l or 'native session' in text_l:
print('NATIVE_FALLBACK:' + text[:200]); sys.exit(0)
# The expected sibling IDs must literally appear in the returned peer text.
found = sorted(i for i in expect if i in text)
missing = sorted(expect - set(found))
if not expect:
print('NO_EXPECTED_PEERS_CONFIGURED'); sys.exit(0)
if missing:
print('MISSING_PEERS:found=%d/%d missing=%s' % (len(found), len(expect), ','.join(m[:8] for m in missing)))
sys.exit(0)
print('OK:found=%d/%d' % (len(found), len(expect)))
" 2>/dev/null)
case "$PARSE" in
OK:*)
echo "$rt: list_peers returned 200 and contains all expected peers ($PARSE)"
VERDICT[$rt]="OK"
;;
NATIVE_FALLBACK:*)
echo "$rt: list_peers fell back to NATIVE sessions — sees no platform peers ($PARSE)"
VERDICT[$rt]="FAIL(native-fallback)"
REGRESSED=1
;;
RPC_ERROR:*|NO_RESULT|PARSE_ERROR:*)
echo "$rt: list_peers MCP call did not return a usable result ($PARSE)"
VERDICT[$rt]="FAIL(rpc=$PARSE)"
REGRESSED=1
;;
MISSING_PEERS:*)
echo "$rt: list_peers returned 200 but peer set is wrong/empty ($PARSE)"
VERDICT[$rt]="FAIL(peers=$PARSE)"
REGRESSED=1
;;
*)
echo "$rt: unexpected verdict '$PARSE'"
VERDICT[$rt]="FAIL(unknown)"
REGRESSED=1
;;
esac
echo ""
done
echo "=== SUMMARY — fresh-provision peer-visibility (literal MCP list_peers) ==="
for rt in $PV_RUNTIMES; do
printf ' %-14s %s\n' "$rt" "${VERDICT[$rt]:-NO_RUN}"
done
echo ""
if [ "$REGRESSED" -ne 0 ]; then
echo "✗ GATE FAILED — at least one runtime cannot see its peers via the"
echo " literal mcp_molecule_list_peers call. This is the real user-facing"
echo " failure the proxy signals (registry row / heartbeat / model 200)"
echo " were hiding. Expected RED until the Hermes-401 + OpenClaw-MCP-wiring"
echo " root-cause fixes land; goes green only when they actually do."
exit 10
fi
ok "GATE PASSED — every runtime under test sees its platform peers via the literal MCP call."
exit 0
@@ -295,8 +295,7 @@ func TestProxyA2A_Upstream502_TriggersContainerDeadCheck(t *testing.T) {
c.Request.Header.Set("Content-Type", "application/json")
handler.ProxyA2A(c)
time.Sleep(80 * time.Millisecond)
handler.waitAsyncForTest()
// Caller sees a structured 503 (NOT the upstream 502 which CF would mask).
if w.Code != http.StatusServiceUnavailable {
@@ -352,7 +351,7 @@ func TestProxyA2A_Upstream502_AliveAgent_PropagatesAsIs(t *testing.T) {
c.Request.Header.Set("Content-Type", "application/json")
handler.ProxyA2A(c)
time.Sleep(50 * time.Millisecond)
handler.waitAsyncForTest()
if w.Code != http.StatusBadGateway {
t.Fatalf("alive agent 502 should propagate as 502; got %d: %s", w.Code, w.Body.String())
@@ -537,7 +536,7 @@ func TestProxyA2A_AllowedSelf_SkipsAccessCheck(t *testing.T) {
c.Request.Header.Set("X-Workspace-ID", "ws-self")
handler.ProxyA2A(c)
time.Sleep(50 * time.Millisecond)
handler.waitAsyncForTest()
if w.Code != http.StatusOK {
t.Errorf("expected 200 for self-call, got %d: %s", w.Code, w.Body.String())
@@ -274,7 +274,7 @@ func TestGracefulPreRestart_URLResolutionError(t *testing.T) {
waitForHandlerAsyncBeforeDBCleanup(t, hWrapper.WorkspaceHandler)
hWrapper.gracefulPreRestart(context.Background(), "ws-url-err-111")
time.Sleep(200 * time.Millisecond)
hWrapper.waitAsyncForTest()
// No panic or error expected — proceeds with stop as documented
}