Compare commits

..

15 Commits

Author SHA1 Message Date
core-uiux a8e9222b6b test(lib): add hydrate.test.ts coverage (7 cases) — exponential backoff retry
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 17s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 17s
Harness Replays / detect-changes (pull_request) Successful in 27s
CI / Detect changes (pull_request) Successful in 49s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 46s
E2E API Smoke Test / detect-changes (pull_request) Successful in 47s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 44s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 21s
gate-check-v3 / gate-check (pull_request) Successful in 25s
qa-review / approved (pull_request) Failing after 16s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: 7
security-review / approved (pull_request) Failing after 16s
Harness Replays / Harness Replays (pull_request) Successful in 7s
sop-tier-check / tier-check (pull_request) Successful in 16s
sop-checklist-gate / gate (pull_request) Successful in 17s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 46s
CI / Platform (Go) (pull_request) Successful in 11s
CI / Python Lint & Test (pull_request) Successful in 6s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Failing after 1m16s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 5s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 6s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 5s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m26s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m29s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 13m31s
CI / Canvas (Next.js) (pull_request) Successful in 15m20s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 3s
audit-force-merge / audit (pull_request) Has been skipped
Add test coverage for canvas store hydration with exponential backoff:
- Successful first-attempt hydration (stores + viewport)
- Non-fatal viewport fetch failure (succeeds without viewport)
- MAX_RETRIES=3 exhausts all attempts on persistent failure
- onRetrying callback fires before each retry
- Second-attempt success short-circuits remaining retries
- Exponential backoff: 1000 → 2000 → 4000ms between retries

Uses vi.hoisted() to share mock refs across vi.mock factories
(vitest hoists factories before module-level code).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 08:07:32 +00:00
core-uiux 6307102cf7 test(canvas/mobile): add RemoteBadge + WorkspacePill render coverage (14 cases)
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 9s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 12s
CI / Detect changes (pull_request) Successful in 24s
Harness Replays / detect-changes (pull_request) Successful in 20s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 20s
E2E API Smoke Test / detect-changes (pull_request) Successful in 30s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 32s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 32s
qa-review / approved (pull_request) Failing after 17s
gate-check-v3 / gate-check (pull_request) Successful in 24s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 28s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: 7
CI / Platform (Go) (pull_request) Successful in 7s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 5s
security-review / approved (pull_request) Failing after 15s
Harness Replays / Harness Replays (pull_request) Successful in 5s
sop-checklist-gate / gate (pull_request) Successful in 14s
sop-tier-check / tier-check (pull_request) Successful in 13s
CI / Python Lint & Test (pull_request) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 5s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 9s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 5s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m17s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Failing after 1m21s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m30s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 9m0s
CI / Canvas (Next.js) (pull_request) Successful in 13m3s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 4s
Cover RemoteBadge and WorkspacePill — the last two rendering components in
components.tsx that were missing direct tests.

- RemoteBadge: ★ REMOTE badge rendering, span element, border-radius 4px,
  palette color/background application, dark/light difference
- WorkspacePill: brand text, count display, LIVE indicator, string count,
  border-radius pill shape, dark/light background variants

Total mobile test count now: 104 passing (was 90).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 07:38:26 +00:00
core-uiux 6348e6bb52 fix(canvas/SearchDialog): split backdrop from dialog for WCAG 4.1.2 compliance
Restructure SearchDialog so the backdrop div is separate from the dialog
container. The outer div previously served as both backdrop and centering
wrapper, which made it impossible to add accessibility attributes
(aria-hidden="true") without hiding the dialog content from screen
readers.

New structure mirrors ConfirmDialog and KeyboardShortcutsDialog:
  - Backdrop: aria-hidden="true", cursor-pointer, click-to-dismiss
  - Dialog: role="dialog", aria-modal, aria-label, relative z-[71]

Also removes the now-unnecessary stopPropagation() on the dialog div.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 07:38:26 +00:00
core-uiux 8f2a76c0db fix(canvas): modal dialog guard on Esc/Enter/Cmd+[/]/Z shortcuts
Discovered during WCAG audit: useKeyboardShortcuts.ts had an
isModalOpen() guard for Arrow-key move/resize shortcuts but NOT for
Escape, Enter, Cmd+]/[, or Z. When a modal dialog (role="dialog",
aria-modal="true") is open, pressing Escape cleared the canvas
selection (because the canvas handler fired before the dialog's own
Escape handler), and Enter/Cmd+[/]/Z could interfere with dialog
interactions.

Fix: add isModalOpen() guard to all four shortcut groups, extracted
as a shared helper. Also added 4 new test cases covering the
modal-dialog guard for Esc, Enter, Cmd+[/], and Z.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 07:38:26 +00:00
core-uiux 61493a0596 test(canvas/mobile): add primitives.test.tsx coverage (19 cases)
Cover StatusDot (size, circle, halo, flexShrink), TierChip (tiers,
size variants, flexShrink), Chip (value, label+value, pill shape,
soft/accent mode), SectionLabel (text, right slot, uppercase).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 07:38:26 +00:00
app-fe bd647428ee feat(mobile): FilterChips + AgentCard WCAG 2.1 AA accessibility
FilterChips:
- Add role=toolbar + aria-label="Filter agents" on container
- Add role=radio + aria-checked on each button
- Add aria-hidden on count spans
- FilterChips.test.tsx: 9 cases

AgentCard:
- Add aria-label composing name, status, tier, remote flag
- AgentCard.test.tsx: 8 cases

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-05-12 07:38:26 +00:00
app-fe 13a4469068 feat(mobile): TabBar WCAG 2.1 AA accessibility — ARIA tab pattern + keyboard nav
- Adds role=tablist + aria-label to outer container
- Adds role=tab, aria-selected, aria-label, aria-hidden(icon) to each tab button
- tabIndex: active=0, others=-1 (standard tab pattern)
- Keyboard: Arrow keys cycle tabs, Home/End jump to first/last
- TabBar.test.tsx: 12 cases covering render states and keyboard interaction

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-05-12 07:38:26 +00:00
core-uiux ee72a05297 test(canvas): add form-inputs coverage (35 cases) + Section accessibility fix
+ form-inputs.test.tsx: 35 cases across TextInput, NumberInput, Toggle,
  TagList, and Section — pure presentational components in the Config tab.
  Uses vi.hoisted() patterns from established suite; no jest-dom matchers.

+ form-inputs.tsx (Section): add aria-expanded + aria-controls to the
  collapsible toggle button for WCAG 2.1 AA compliance. The content div
  gets a stable id derived from the title; aria-controls links button to
  region. Indicator span gains aria-hidden="true" (decorative only).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 07:38:26 +00:00
core-uiux c1be828450 test(canvas/settings,chat): add coverage for EmptyState, SearchBar, UnsavedChangesGuard, AttachmentVideo
- EmptyState: 6 cases — icon aria-hidden, title, body text, CTA button
- SearchBar: 14 cases — store binding, onChange, Escape, Ctrl/Cmd+F focus
- UnsavedChangesGuard: 7 cases — dialog states, Keep/Discard actions, backdrop
  FIX: UnsavedChangesGuard now wires onDiscard via pendingDiscard ref so
  clicking Discard correctly calls the callback on dialog close
- AttachmentVideo: 8 cases — loading/ready/error states, tone borders,
  blob URL cleanup, external URI direct href

No breaking changes. 2387 tests passing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 07:38:26 +00:00
core-uiux 0a93c28a16 test(canvas/settings): add DeleteConfirmDialog + SettingsButton coverage (26 cases)
- DeleteConfirmDialog (15 cases): dialog open via secret:delete-request event,
  title/body text, Cancel closes, dependents loading/list/none states,
  deleteSecret call, confirm 1s delay, disabled→enabled button transition
- SettingsButton (11 cases): aria-label, aria-expanded, gear SVG aria-hidden,
  toggle openPanel/closePanel, active class, tooltip Mac/Ctrl shortcut
  ResizeObserver polyfill for Radix Tooltip

No breaking changes. 2413 tests passing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 07:38:26 +00:00
core-uiux a6fcb642d9 test(canvas/settings): add ServiceGroup coverage (10 cases)
- role=group with aria-label containing service label
- Service icon aria-hidden, correct emoji per service name
- Count label: "1 key" vs "N keys"
- Renders SecretRow for each secret
- Header and rows div structure

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 07:38:26 +00:00
core-uiux bf901a9aac test(canvas/chat): add AttachmentImage coverage (10 cases)
Adds Vitest coverage for AttachmentImage — inline image thumbnail with
click-to-fullscreen lightbox. Covers: loading skeleton (240×180),
ready state with blob URL, tone=user/agent border classes, lightbox
open/close on click and Escape, AttachmentChip error fallback, img
onError transition to chip, external URI direct href (no fetch), and
blob URL cleanup on unmount.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 07:38:26 +00:00
core-uiux 3a9479de7e test(canvas/chat): add AttachmentAudio + AttachmentPDF coverage (18 cases)
Adds Vitest coverage for two missing attachment renderers:

AttachmentAudio (9 cases):
  - Loading skeleton (280x40) with aria-label
  - <audio controls> with blob src when ready
  - Filename label in ready state
  - tone=user -> blue/accent border
  - tone=agent -> neutral border
  - Error -> AttachmentChip fallback
  - audio onError -> chip transition
  - External URI -> direct href, no fetch
  - Blob URL cleanup on unmount

AttachmentPDF (9 cases):
  - Loading skeleton with PdfGlyph + filename
  - Preview button with glyph, filename, "PDF" label
  - Lightbox opens with <embed> on click
  - Lightbox closes on Escape
  - tone=user -> blue/accent classes on button
  - tone=agent -> neutral border
  - Error -> AttachmentChip fallback
  - External URI -> direct href, no fetch
  - Blob URL cleanup on unmount

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 07:38:26 +00:00
core-uiux 5489a0b276 test(canvas/chat): add AttachmentTextPreview coverage (12 cases)
Adds Vitest coverage for AttachmentTextPreview — inline text/code
preview with streaming fetch and expand/truncate.

Covers:
  - Loading skeleton (320x80) with aria-label
  - Ready state with correct text content
  - Filename shown in header
  - Expand button appears when lines > 10
  - Expand button hidden when all lines shown
  - Expand button updates display to full content
  - Download button calls onDownload
  - tone=user -> blue/accent border
  - tone=agent -> neutral border
  - Truncated notice when file exceeds 256 KB
  - Error -> AttachmentChip fallback
  - Cleanup on unmount

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 07:38:26 +00:00
core-uiux 45e9c8d30f test(settings): add TokensTab coverage (12 cases)
12 passing: loading spinner, empty state, token list rendering,
each token's prefix/age/Revoke button, API URL correctness, revoke
confirm + cancel dialogs, new-token creation + dismiss, create error,
network error banner.

Root bug fixed: confirm button search was unscoped — when the dialog
opened, two "Revoke" buttons existed (tok2's row + dialog confirm);
find() returned tok2's button first. Scoped the search to
document.querySelector('[role="dialog"]') to hit the correct target.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 07:38:26 +00:00
75 changed files with 719 additions and 7326 deletions
-113
View File
@@ -1,113 +0,0 @@
#!/usr/bin/env python3
"""Lint workflow bash for curl status-code capture pollution.
The bad shape is:
HTTP_CODE=$(curl ... -w '%{http_code}' ... || echo "000")
`curl -w` writes the HTTP code to stdout before returning non-zero, so
fallback output inside the same command substitution appends another code.
"""
from __future__ import annotations
import argparse
import glob
import re
import sys
from pathlib import Path
from typing import NamedTuple
SELF = ".gitea/workflows/lint-curl-status-capture.yml"
class Finding(NamedTuple):
path: str
snippet: str
BAD_STATUS_CAPTURE = re.compile(
r"""
\$\(\s*
curl\b
[^)]*
-w\s*['"]%\{http_code\}['"]
[^)]*
\|\|\s*
(?:
echo\s+['"]?000['"]?
|
printf\s+['"]000['"]
)
\s*\)
""",
re.DOTALL | re.VERBOSE,
)
def _logical_shell(content: str) -> str:
"""Collapse bash line continuations so one curl command is one string."""
return re.sub(r"\\\s*\n\s*", " ", content)
def scan_content(path: str, content: str) -> list[Finding]:
flat = _logical_shell(content)
return [
Finding(path=path, snippet=re.sub(r"\s+", " ", match.group(0)).strip()[:160])
for match in BAD_STATUS_CAPTURE.finditer(flat)
]
def scan_paths(paths: list[str]) -> list[Finding]:
findings: list[Finding] = []
for path in paths:
if path == SELF:
continue
content = Path(path).read_text(encoding="utf-8")
findings.extend(scan_content(path, content))
return findings
def default_paths() -> list[str]:
return sorted(glob.glob(".gitea/workflows/*.yml"))
def print_report(findings: list[Finding]) -> None:
if not findings:
print("OK No curl-status-capture pollution patterns detected")
return
print(f"::error::Found {len(findings)} curl-status-capture pollution site(s):")
for finding in findings:
print(
f"::error file={finding.path}::Curl status-capture pollution: "
"'|| echo/printf 000' inside a $(curl ... -w '%{http_code}' ...) "
"subshell. On non-2xx or connection failure, curl's -w writes a "
"status, then exits non-zero, then the fallback appends another "
"status. Fix: route -w into a tempfile so the exit code cannot "
"pollute stdout."
)
print(f" matched: {finding.snippet}...")
print()
print("Fix template:")
print(" set +e")
print(" curl ... -w '%{http_code}' >code.txt 2>/dev/null")
print(" set -e")
print(' HTTP_CODE=$(cat code.txt 2>/dev/null)')
print(' [ -z "$HTTP_CODE" ] && HTTP_CODE="000"')
def main(argv: list[str] | None = None) -> int:
parser = argparse.ArgumentParser()
parser.add_argument("paths", nargs="*", help="workflow files to scan")
args = parser.parse_args(argv)
paths = args.paths or default_paths()
findings = scan_paths(paths)
print_report(findings)
return 1 if findings else 0
if __name__ == "__main__":
raise SystemExit(main())
@@ -1,509 +0,0 @@
#!/usr/bin/env python3
"""lint_bp_context_emit_match — Tier 2f per internal#350.
Rule
----
For a given protected branch, every context in
`branch_protections/<branch>.status_check_contexts` MUST be emitted
by at least one workflow in `.gitea/workflows/*.yml`. Two contexts
match when:
1. The workflow's `name:` equals the context's workflow-part (the
prefix before ` / `).
2. Some job in that workflow has a `name:` (or default-fallback
job-key) equal to the context's job-part (between ` / ` and
` (`).
3. The workflow's `on:` block includes the context's event-part
(in parens at the end), with Gitea's event-name mapping:
- `pull_request` and `pull_request_target` BOTH emit
`(pull_request)` contexts (verified empirically on
molecule-core/main).
- `push` emits `(push)`.
A BP context with no emitter blocks merges forever — Gitea treats
absent-as-`pending`, NOT absent-as-`skipped`-as-`success`. This is
the phantom-required-check class
(`feedback_phantom_required_check_after_gitea_migration`).
The inverse direction (emitter without BP context) is INFORMATIONAL
only — Tier 2g handles that direction at PR-time. Flagging it here
on a daily schedule would falsely surface every transitional state
during a BP rollout.
How the gate works
------------------
Daily scheduled run + workflow_dispatch:
1. GET `branch_protections/{BRANCH}` (needs DRIFT_BOT_TOKEN with
repo-admin scope; same persona as ci-required-drift.yml).
Graceful-degrade on 403/404 per Tier 2a contract.
2. Walk `.gitea/workflows/*.yml` via PyYAML AST. For each workflow,
enumerate its emitted contexts: `{workflow.name} / {job.name or
job-key} ({event})` for each event in `on:` that emits a status.
3. For each BP context, look for an emitter match. Aggregate
orphans.
4. If orphans exist:
- File or PATCH a `[ci-bp-drift]` issue (idempotency contract:
search for exact title prefix, edit existing if open).
- Apply labels `tier:high` + `ci-bp-drift` (lookup IDs per
repo; per `feedback_tier_label_ids_are_per_repo`).
- Exit 1.
5. If no orphans:
- Close any existing `[ci-bp-drift]` issue with a clean-state
comment.
- Exit 0.
Exit codes
----------
0 — clean OR API 403/404 (graceful-degrade, surfaces ::error::).
1 — at least one BP context has no emitter.
2 — env contract violation, workflows-dir missing, or YAML parse
error.
Env
---
GITEA_TOKEN — DRIFT_BOT_TOKEN (repo-admin for branch_protections)
GITEA_HOST — e.g. git.moleculesai.app
REPO — owner/name
BRANCH — defaults to `main`
WORKFLOWS_DIR — defaults to `.gitea/workflows`
DRIFT_LABEL — defaults to `ci-bp-drift`
Memory cross-links
------------------
- internal#350 (the RFC that specs this lint)
- feedback_phantom_required_check_after_gitea_migration
- feedback_tier_label_ids_are_per_repo
- reference_post_suspension_pipeline
"""
from __future__ import annotations
import json
import os
import re
import sys
import urllib.error
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Any
try:
import yaml
except ImportError:
sys.stderr.write(
"::error::PyYAML is required. Install with: pip install PyYAML\n"
)
sys.exit(2)
# Status-check context regex (mirrors lint-required-no-paths.py).
_CONTEXT_RE = re.compile(
r"^(?P<workflow>.+?) / (?P<job>.+) \((?P<event>[^)]+)\)$"
)
# Map a workflow `on:` event-key to the context's event-part. Gitea's
# emitter convention (verified on molecule-core):
# - pull_request → `(pull_request)`
# - pull_request_target → `(pull_request)` (same surface)
# - push → `(push)`
# - schedule → no PR status; scheduled runs don't post
# commit-statuses unless the workflow itself does so explicitly.
# - workflow_dispatch → manually dispatched runs may or may not
# emit; safest to treat as "no PR status" (informational notice
# only).
_EVENT_MAP = {
"pull_request": "pull_request",
"pull_request_target": "pull_request",
"push": "push",
}
# ---------------------------------------------------------------------------
# Env
# ---------------------------------------------------------------------------
def _env(key: str, default: str | None = None) -> str:
v = os.environ.get(key, default)
return v if v is not None else ""
def _require_env(key: str) -> str:
v = os.environ.get(key)
if not v:
sys.stderr.write(f"::error::missing required env var: {key}\n")
sys.exit(2)
return v
# ---------------------------------------------------------------------------
# API helper. Mirrors lint-required-no-paths.py's contract: returns
# (status, payload) tuple with status ∈ {"ok", "not_found", "forbidden",
# "error"}.
# ---------------------------------------------------------------------------
def api(
method: str,
path: str,
*,
body: dict | None = None,
query: dict[str, str] | None = None,
) -> tuple[str, Any]:
host = _env("GITEA_HOST")
token = _env("GITEA_TOKEN")
url = f"https://{host}/api/v1{path}"
if query:
url = f"{url}?{urllib.parse.urlencode(query)}"
data = None
headers = {
"Authorization": f"token {token}",
"Accept": "application/json",
}
if body is not None:
data = json.dumps(body).encode("utf-8")
headers["Content-Type"] = "application/json"
req = urllib.request.Request(
url, method=method, data=data, headers=headers
)
try:
with urllib.request.urlopen(req, timeout=30) as resp:
raw = resp.read()
if not raw:
return ("ok", None)
return ("ok", json.loads(raw))
except urllib.error.HTTPError as e:
if e.code == 404:
return ("not_found", None)
if e.code in (401, 403):
return ("forbidden", None)
return ("error", None)
except (urllib.error.URLError, TimeoutError, json.JSONDecodeError):
return ("error", None)
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _get_on(d: Any) -> Any:
"""YAML 1.1 boolean quirk: bare `on:` may parse to True. Handle both."""
if not isinstance(d, dict):
return None
if "on" in d:
return d["on"]
if True in d:
return d[True]
return None
def _on_events(doc: Any) -> set[str]:
"""Return the set of event keys in a workflow's `on:` block.
Accepts all three shapes (string / list / mapping). String/list
shapes can't carry filters but they DO emit. Returns the
Gitea-mapped event names per `_EVENT_MAP`.
"""
on = _get_on(doc)
raw_events: set[str] = set()
if on is None:
return raw_events
if isinstance(on, str):
raw_events.add(on)
elif isinstance(on, list):
for e in on:
if isinstance(e, str):
raw_events.add(e)
elif isinstance(on, dict):
for k in on:
if isinstance(k, str):
raw_events.add(k)
return {_EVENT_MAP[e] for e in raw_events if e in _EVENT_MAP}
def _job_display(jbody: dict, jkey: str) -> str:
"""Return job's `name:` if set, else fall back to the job-key.
Gitea formats status contexts with the job's `name:` when set;
when unset it uses the job key. Matches lint-required-no-paths
convention.
"""
n = jbody.get("name") if isinstance(jbody, dict) else None
if isinstance(n, str) and n:
return n
return jkey
def workflow_contexts(doc: Any) -> set[str]:
"""Return the set of contexts a workflow emits."""
contexts: set[str] = set()
if not isinstance(doc, dict):
return contexts
wf_name = doc.get("name")
if not isinstance(wf_name, str) or not wf_name:
return contexts # no name => no addressable context
events = _on_events(doc)
if not events:
return contexts
jobs = doc.get("jobs")
if not isinstance(jobs, dict):
return contexts
for jkey, jbody in jobs.items():
if jkey == "__lines__": # tolerate line-tracking annotations
continue
if not isinstance(jbody, dict):
continue
disp = _job_display(jbody, jkey)
for ev in events:
contexts.add(f"{wf_name} / {disp} ({ev})")
return contexts
def parse_context(ctx: str) -> tuple[str, str, str] | None:
m = _CONTEXT_RE.match(ctx)
if not m:
return None
return (m.group("workflow"), m.group("job"), m.group("event"))
def _iter_workflow_files(wf_dir: Path) -> list[Path]:
return sorted(list(wf_dir.glob("*.yml")) + list(wf_dir.glob("*.yaml")))
# ---------------------------------------------------------------------------
# Issue idempotency — search for an open issue with the canonical
# title prefix; PATCH if found, POST if not. Mirrors ci-required-drift.
# ---------------------------------------------------------------------------
def _canonical_title(repo: str, branch: str) -> str:
return f"[ci-bp-drift] {repo}/{branch}: BP→emitter mismatch"
def _ensure_labels(repo: str, names: list[str]) -> list[int]:
status, labels = api("GET", f"/repos/{repo}/labels", query={"limit": "50"})
if status != "ok" or not isinstance(labels, list):
return []
out: list[int] = []
by_name = {l["name"]: l["id"] for l in labels if isinstance(l, dict)}
for n in names:
if n in by_name:
out.append(by_name[n])
return out
def file_or_update_issue(
repo: str, branch: str, orphans: list[str], emitter_orphans: list[str]
) -> None:
title = _canonical_title(repo, branch)
body_lines = [
f"BP→emitter drift detected on `{branch}` at "
f"{os.environ.get('GITHUB_RUN_URL', '(run url unavailable)')}.",
"",
f"## Orphan BP contexts ({len(orphans)})",
"",
"These contexts are required by branch protection but NO workflow "
"emits them. PRs merging into this branch will wait forever for a "
"status that never arrives (Gitea treats absent-as-`pending`, NOT "
"absent-as-`skipped`). See "
"`feedback_phantom_required_check_after_gitea_migration`.",
"",
]
for o in orphans:
body_lines.append(f"- `{o}`")
if emitter_orphans:
body_lines += [
"",
f"## Workflows emitting contexts NOT in BP ({len(emitter_orphans)})",
"",
"Informational — Tier 2g handles this direction at PR-time. "
"Listed here for completeness.",
"",
]
for o in emitter_orphans:
body_lines.append(f"- `{o}`")
body_lines += [
"",
"Fix options:",
" 1. PATCH `branch_protections/{branch}.status_check_contexts` "
" to remove the orphan.",
" 2. Restore the emitting workflow (if it was deleted/renamed).",
"",
"Linted by `.gitea/workflows/lint-bp-context-emit-match.yml` "
"(Tier 2f, internal#350).",
]
body = "\n".join(body_lines)
# Idempotency search — find an open issue with the canonical title.
status, hits = api(
"GET",
f"/repos/{repo}/issues",
query={
"type": "issues",
"state": "open",
"q": title,
},
)
existing = None
if status == "ok" and isinstance(hits, list):
for h in hits:
if (
isinstance(h, dict)
and h.get("state") == "open"
and isinstance(h.get("title"), str)
and h["title"].startswith(title)
):
existing = h
break
label_ids = _ensure_labels(repo, ["ci-bp-drift", "tier:high"])
if existing:
api(
"PATCH",
f"/repos/{repo}/issues/{existing['number']}",
body={"body": body, "labels": label_ids} if label_ids else {"body": body},
)
print(
f"::notice::Updated existing drift issue "
f"#{existing['number']}: {existing.get('html_url', '')}"
)
else:
status, posted = api(
"POST",
f"/repos/{repo}/issues",
body={"title": title, "body": body, "labels": label_ids},
)
if status == "ok" and isinstance(posted, dict):
print(
f"::notice::Filed new drift issue "
f"#{posted.get('number')}: {posted.get('html_url', '')}"
)
# ---------------------------------------------------------------------------
# Driver
# ---------------------------------------------------------------------------
def run() -> int:
_require_env("GITEA_TOKEN")
_require_env("GITEA_HOST")
repo = _require_env("REPO")
branch = _env("BRANCH", "main")
wf_dir = Path(_env("WORKFLOWS_DIR", ".gitea/workflows"))
if not wf_dir.is_dir():
sys.stderr.write(f"::error::workflows directory not found: {wf_dir}\n")
return 2
# 1. Pull BP.
status, bp = api("GET", f"/repos/{repo}/branch_protections/{branch}")
if status == "forbidden":
sys.stderr.write(
f"::error::GET branch_protections/{branch} returned HTTP 403 — "
f"DRIFT_BOT_TOKEN lacks repo-admin scope (Gitea 1.22.6 requires "
f"it for this endpoint). Skipping lint with exit 0 to avoid "
f"red-X on every run. Fix: grant repo-admin to mc-drift-bot. "
f"Per Tier 2a contract.\n"
)
return 0
if status == "not_found":
print(
f"::notice::branch '{branch}' has no protection configured; "
f"nothing to lint."
)
return 0
if status != "ok" or not isinstance(bp, dict):
sys.stderr.write(
f"::error::branch_protections/{branch} response unexpected; "
f"status={status}. Treating as transient; exit 0.\n"
)
return 0
bp_contexts: list[str] = list(bp.get("status_check_contexts") or [])
if not bp_contexts:
print(
f"::notice::branch_protections/{branch} has 0 required "
f"status_check_contexts; nothing to lint."
)
return 0
# 2. Enumerate emitter contexts from all workflows.
all_emitter: set[str] = set()
for path in _iter_workflow_files(wf_dir):
try:
doc = yaml.safe_load(path.read_text(encoding="utf-8"))
except yaml.YAMLError as e:
sys.stderr.write(
f"::error file={path}::YAML parse error: {e}; skipping.\n"
)
continue
all_emitter |= workflow_contexts(doc)
print(
f"::notice::Linting {len(bp_contexts)} BP context(s) for {branch} "
f"against {len(all_emitter)} workflow-emitted context(s)."
)
bp_set = set(bp_contexts)
# 3. Find orphans (BP-side: required but no emitter).
bp_orphans = sorted(bp_set - all_emitter)
# Informational: workflow emits but BP doesn't list. Tier 2g
# territory at PR-time. We list these as NOTICE only.
emitter_orphans = sorted(all_emitter - bp_set)
if bp_orphans:
print(
f"::error::Found {len(bp_orphans)} BP context(s) with no "
f"emitter — these would block merges forever (Gitea treats "
f"absent-as-pending, not skipped):"
)
for o in bp_orphans:
# Closest-match hint: name a workflow whose name-part is a
# near-match (lev-1 typo, or same workflow with a different
# event).
parsed = parse_context(o)
hint = ""
if parsed:
wf, _job, _ev = parsed
candidates = sorted(
{c for c in all_emitter if c.startswith(wf + " / ")}
)
if candidates:
hint = (
f" — closest emitter(s): {', '.join(candidates[:3])}"
)
print(f"::error:: - {o}{hint}")
if emitter_orphans:
print(
f"::notice::Also: {len(emitter_orphans)} workflow-emitted "
f"context(s) not in BP (informational; Tier 2g handles at "
f"PR-time):"
)
for o in emitter_orphans:
print(f"::notice:: - {o}")
# File / patch tracking issue.
try:
file_or_update_issue(repo, branch, bp_orphans, emitter_orphans)
except Exception as e:
sys.stderr.write(
f"::error::failed to file drift issue: {e}\n"
)
return 1
if emitter_orphans:
print(
f"::notice::{len(emitter_orphans)} workflow-emitted context(s) "
f"not in BP (informational; Tier 2g handles at PR-time):"
)
for o in emitter_orphans:
print(f"::notice:: - {o}")
print(
f"::notice::BP/emitter match clean: all {len(bp_contexts)} required "
f"context(s) have an emitter."
)
return 0
if __name__ == "__main__":
sys.exit(run())
@@ -98,13 +98,11 @@ except ImportError:
# ---------------------------------------------------------------------------
# Tracker comment regex.
# Matches: `# mc#1234`, `# internal#42`, `# mc#1234 - description`
# Also matches trackers embedded mid-sentence: `# see mc#1234 for details`
# Does NOT match: `# mc1234` (missing inner #), `mc#1234` (no leading
# comment `#`), `# MC#1234` (case-sensitive). The search is line-wide,
# not just at the comment-marker prefix — fixes false-negative when
# the tracker appears mid-sentence (e.g. `internal#350` after prose).
# `#` comment marker), `# MC#1234` (case-sensitive — `mc` and `internal`
# are conventional lower-case repo slugs).
TRACKER_RE = re.compile(
r"(?P<slug>mc|internal)#(?P<num>\d+)\b"
r"#\s*(?P<slug>mc|internal)#(?P<num>\d+)\b"
)
# Truthy continue-on-error values we treat as "true". PyYAML decodes
@@ -1,681 +0,0 @@
#!/usr/bin/env python3
"""lint-pre-flip-continue-on-error — block a PR that flips a job from
``continue-on-error: true`` to ``continue-on-error: false`` (or removes
the key while the base had it ``true``) without proof that the job's
recent runs on the target branch are actually green.
Empirical class — PR #656 / mc#664:
PR #656 (RFC internal#219 Phase 4) flipped 5 ``platform-build``-class
jobs ``continue-on-error: true → false`` on the basis of a
"verified green on main via combined-status check". But that "green"
was the LIE produced by the prior ``continue-on-error: true``:
Gitea Quirk #10 (internal#342 + dup #287) — when a step inside a
job marked ``continue-on-error: true`` fails, the job-level status
is still rolled up as ``success``. So the precondition the PR
claimed to verify was structurally fooled by the bug being
flipped.
mc#664 then captured the surfaced defects (2 unrelated, mutually-
masked regressions):
Class 1: sqlmock helper drift since 2f36bb9a (24 days old)
Class 2: OFFSEC-001 contract collision since 7d1a189f (1 day old)
Codified 04:35Z as hongming-pc2 charter §SOP-N rule (e)
"run-log-grep-before-flip": pull the actual run log + grep for
``--- FAIL`` / ``FAIL\\s`` BEFORE flipping; don't trust the masked
combined-status.
This script structurally enforces that rule at PR time.
How it works (one PR tick):
1. Parse the diff: compare ``.gitea/workflows/*.yml`` at PR base
vs PR head. For each file present in both, parse the YAML AST
and walk ``jobs.<key>.continue-on-error`` on each side. A
"flip" is base ∈ {true} AND head ∈ {false, None/absent}. We
coerce truthy/falsy per YAML semantics (PyYAML normalizes
``true``/``True``/``yes`` to ``True``).
2. For each flipped job, derive its commit-status context name as
``"{workflow.name} / {job.name or job.key} (push)"`` — that's
how Gitea Actions emits the context for runs on
``main``/``staging`` (push event, see also expected_context()
in ci-required-drift.py).
3. Pull the last N commits of the target branch (PR base), fetch
combined commit-status per commit, scan ``statuses[]`` for
contexts matching ANY of the flipped jobs. For each match,
fetch the actual run log via the web-UI route
``{server_url}/{repo}/actions/runs/{run_id}/jobs/{job_idx}/logs``
(per memory ``reference_gitea_actions_log_fetch`` — Gitea 1.22.6
lacks REST ``/actions/runs/*`` endpoints; the web-UI route is the
only working path; see ``reference_gitea_1_22_6_lacks_rest_rerun_endpoints``).
4. Grep each log for the Go-test failure markers ``--- FAIL`` /
``FAIL\\s+<package>`` AND the bash-step error sentinel
``::error::``. If ANY recent log shows any of these AND the
status itself reads ``success``, the job was masked. ``::error::``
the flip with the offending test name + offending run URL +
the regression commit (HEAD of the run).
5. Exit 1 if any flips have at least one masked run; exit 0
otherwise.
Halt-on-noise contract:
- If a recent log fetch 404s (already-pruned-via-act_runner-gc,
transient gitea-web outage): emit ``::warning::`` and treat the
run as "log unavailable" — does NOT block the flip; logged so
a curious reviewer can re-run.
- If a flipped job has ZERO recent runs on the target branch (newly
added workflow): emit ``::warning::`` "no run history to verify"
and allow the flip. This is the only way a NEW workflow can ever
ship with ``continue-on-error: false``; otherwise we'd have a
chicken-and-egg.
Behavior-based AST gate per ``feedback_behavior_based_ast_gates``:
- YAML parsed via PyYAML safe_load on BOTH sides of the diff
- No grep-by-line — formatting changes (comment churn, key order)
don't false-positive a flip
- Job-key match — so a rename ``platform-build → core-be-build``
appears as a DELETE + an ADD, not a flip (the delete side has no
new value to compare against; the add side has no base side).
Run locally (works against this repo, requires PyYAML + Gitea token
that can read combined-commit-status):
GITEA_TOKEN=... GITEA_HOST=git.moleculesai.app \\
REPO=molecule-ai/molecule-core BASE_REF=main \\
BASE_SHA=$(git rev-parse origin/main) \\
HEAD_SHA=$(git rev-parse HEAD) \\
python3 .gitea/scripts/lint_pre_flip_continue_on_error.py \\
--dry-run
Cross-links: PR#656, mc#664, PR#665 (the interim re-mask),
Quirk #10 (internal#342 + dup #287), hongming-pc2 charter §SOP-N
rule (e), feedback_strict_root_only_after_class_a,
feedback_no_shared_persona_token_use.
"""
from __future__ import annotations
import argparse
import json
import os
import subprocess
import sys
import urllib.error
import urllib.parse
import urllib.request
from typing import Any
import yaml # PyYAML 6.0.2 — installed by the workflow before this runs.
# --------------------------------------------------------------------------
# Environment (read at module-import; runtime contract enforced in main())
# --------------------------------------------------------------------------
def _env(key: str, *, default: str = "") -> str:
return os.environ.get(key, default)
GITEA_TOKEN = _env("GITEA_TOKEN")
GITEA_HOST = _env("GITEA_HOST")
REPO = _env("REPO")
BASE_REF = _env("BASE_REF", default="main")
BASE_SHA = _env("BASE_SHA")
HEAD_SHA = _env("HEAD_SHA")
# How many recent commits to scan on the target branch. 5 by default;
# enough to catch a job that only fails intermittently, not so many
# that the script paginates needlessly. Per spec.
RECENT_COMMITS_N = int(_env("RECENT_COMMITS_N", default="5"))
OWNER, NAME = (REPO.split("/", 1) + [""])[:2] if REPO else ("", "")
API = f"https://{GITEA_HOST}/api/v1" if GITEA_HOST else ""
WEB = f"https://{GITEA_HOST}" if GITEA_HOST else ""
# Failure markers we grep for in the run log.
# --- FAIL — Go test failure marker
# FAIL\s — `FAIL github.com/x/y` package-level rollup
# ::error:: — bash-step `::error::` lines (the lint-curl-status-capture
# pattern: a `python3 <<PY` block writing `::error::` then
# sys.exit(1); also any shell `echo "::error::..."` from
# jobs that wrap pytest/eslint/etc. and convert
# non-zero exits into masked-by-CoE status)
FAIL_PATTERNS = (
"--- FAIL",
"FAIL\t",
"FAIL ",
"::error::",
)
def _require_runtime_env() -> None:
for key in ("GITEA_TOKEN", "GITEA_HOST", "REPO", "BASE_REF", "BASE_SHA", "HEAD_SHA"):
if not os.environ.get(key):
sys.stderr.write(f"::error::missing required env var: {key}\n")
sys.exit(2)
# --------------------------------------------------------------------------
# Tiny HTTP helper (no requests dependency)
# Mirrors the api()/ApiError contract in ci-required-drift.py +
# main-red-watchdog.py per feedback_api_helper_must_raise_not_return_dict.
# --------------------------------------------------------------------------
class ApiError(RuntimeError):
"""Raised when a Gitea API/web call cannot be trusted to have succeeded.
Soft-failure on non-2xx is the duplicate-write bug factory in
find-or-create flows (PR #112 Five-Axis). Here it would mean a
transient gitea-web 502 silently allows a flip whose recent runs
we couldn't actually verify — exactly the regression class this
lint exists to close.
"""
def http(
method: str,
url: str,
*,
body: dict | None = None,
headers: dict[str, str] | None = None,
expect_json: bool = True,
timeout: int = 30,
) -> tuple[int, Any, bytes]:
"""Tiny HTTP helper around urllib.
Returns (status, parsed_or_None, raw_bytes). Raises ApiError on any
non-2xx response. ``expect_json=False`` returns raw bytes in the
parsed slot (for log-fetch from the web-UI which returns text/plain).
"""
final_headers = {
"Authorization": f"token {GITEA_TOKEN}",
"Accept": "application/json" if expect_json else "text/plain",
}
if headers:
final_headers.update(headers)
data = None
if body is not None:
data = json.dumps(body).encode("utf-8")
final_headers["Content-Type"] = "application/json"
req = urllib.request.Request(url, method=method, data=data, headers=final_headers)
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
raw = resp.read()
status = resp.status
except urllib.error.HTTPError as e:
raw = e.read() or b""
status = e.code
if not (200 <= status < 300):
snippet = raw[:500].decode("utf-8", errors="replace") if raw else ""
raise ApiError(f"{method} {url} → HTTP {status}: {snippet}")
if not expect_json:
return status, raw, raw
if not raw:
return status, None, raw
try:
return status, json.loads(raw), raw
except json.JSONDecodeError as e:
raise ApiError(f"{method} {url} → HTTP {status} but body is not JSON: {e}") from e
def api(method: str, path: str, *, body: dict | None = None, query: dict[str, str] | None = None) -> tuple[int, Any]:
"""Read-shaped Gitea REST helper. Path is API-relative (``/repos/...``)."""
url = f"{API}{path}"
if query:
url = f"{url}?{urllib.parse.urlencode(query)}"
status, parsed, _ = http(method, url, body=body, expect_json=True)
return status, parsed
# --------------------------------------------------------------------------
# YAML parsing — coerce truthy/falsy for continue-on-error
# --------------------------------------------------------------------------
def _coerce_coe(val: Any) -> bool:
"""Coerce a continue-on-error YAML value to bool.
PyYAML safe_load normalizes ``true``/``True``/``yes``/``on`` to
Python ``True`` and ``false``/``False``/``no``/``off`` / absence
to ``False`` (we treat absence/None as False here too — that's the
GitHub Actions default semantics).
Edge cases:
- String ``"true"`` (quoted in YAML) — kept as the string
``"true"``, falsy under bool() but a flip we DO care about
catching. Normalize string forms case-insensitively to bool
so the diff is consistent with the runtime behavior of
Gitea Actions, which YAML-parses the same way.
"""
if isinstance(val, bool):
return val
if val is None:
return False
if isinstance(val, str):
return val.strip().lower() in ("true", "yes", "on", "1")
return bool(val)
def jobs_coe_map(workflow_doc: dict) -> dict[str, bool]:
"""Return ``{job_key: continue_on_error_bool}`` for every job in
the workflow. Job-level ``continue-on-error`` only — does NOT
descend into per-step ``continue-on-error`` (step-level CoE
masking is a separate class and is handled by the test suite
+ reviewer, not by this gate — see Future Work in the workflow
YAML).
"""
out: dict[str, bool] = {}
jobs = workflow_doc.get("jobs")
if not isinstance(jobs, dict):
return out
for key, job in jobs.items():
if not isinstance(job, dict):
continue
out[key] = _coerce_coe(job.get("continue-on-error"))
return out
def workflow_name(workflow_doc: dict, *, fallback: str = "") -> str:
"""Top-level ``name:`` of the workflow. Falls back to the filename
(without extension) per Gitea Actions semantics."""
n = workflow_doc.get("name")
if isinstance(n, str) and n.strip():
return n.strip()
return fallback
def job_display_name(workflow_doc: dict, job_key: str) -> str:
"""``jobs.<key>.name`` if present, else the key. Mirrors
expected_context() in ci-required-drift.py."""
job = workflow_doc.get("jobs", {}).get(job_key)
if isinstance(job, dict):
n = job.get("name")
if isinstance(n, str) and n.strip():
return n.strip()
return job_key
def context_name(workflow_name_str: str, job_name_str: str, event: str = "push") -> str:
"""Render the commit-status context the way Gitea Actions emits it.
Default ``event="push"`` because recent-runs-on-main are push events;
callers can override to ``"pull_request"`` for PR-context lookups."""
return f"{workflow_name_str} / {job_name_str} ({event})"
# --------------------------------------------------------------------------
# Diff detection — flips, not arbitrary changes
# --------------------------------------------------------------------------
def detect_flips(
base_workflows: dict[str, str],
head_workflows: dict[str, str],
) -> list[dict]:
"""Compare per-file CoE maps; return a list of flip records.
Inputs are ``{path: yaml_text}`` for both sides. Output records
have the shape::
{
"workflow_path": ".gitea/workflows/ci.yml",
"workflow_name": "CI",
"job_key": "platform-build",
"job_name": "Platform (Go)",
"context": "CI / Platform (Go) (push)",
}
A flip is base[CoE] ∈ {True} AND head[CoE] ∈ {False}. Files
only present on one side are skipped — adding a new workflow
with ``CoE: false`` is fine (no history to mask), and removing
a workflow can't possibly flip anything.
"""
flips: list[dict] = []
for path, base_text in base_workflows.items():
if path not in head_workflows:
continue
try:
base_doc = yaml.safe_load(base_text) or {}
head_doc = yaml.safe_load(head_workflows[path]) or {}
except yaml.YAMLError as e:
# Don't block on a parse error — the YAML lint workflows
# catch invalid YAML separately. Just warn so the failing
# file is visible.
sys.stderr.write(f"::warning file={path}::YAML parse error: {e}\n")
continue
if not isinstance(base_doc, dict) or not isinstance(head_doc, dict):
continue
base_map = jobs_coe_map(base_doc)
head_map = jobs_coe_map(head_doc)
wf_name = workflow_name(head_doc, fallback=os.path.basename(path).rsplit(".", 1)[0])
for job_key, base_val in base_map.items():
if job_key not in head_map:
continue # job removed — not a flip
if base_val is True and head_map[job_key] is False:
flips.append({
"workflow_path": path,
"workflow_name": wf_name,
"job_key": job_key,
"job_name": job_display_name(head_doc, job_key),
"context": context_name(wf_name, job_display_name(head_doc, job_key), "push"),
})
return flips
# --------------------------------------------------------------------------
# Git: snapshot every .gitea/workflows/*.yml at a SHA (no checkout)
# --------------------------------------------------------------------------
def _git(*args: str, cwd: str | None = None) -> str:
"""Run ``git`` and return stdout (text)."""
result = subprocess.run(
["git", *args],
capture_output=True,
text=True,
check=False,
cwd=cwd,
)
if result.returncode != 0:
raise RuntimeError(f"git {args!r} failed: {result.stderr.strip()}")
return result.stdout
def workflows_at_sha(sha: str, *, repo_dir: str | None = None) -> dict[str, str]:
"""Read every ``.gitea/workflows/*.yml`` blob at ``sha``.
Uses ``git ls-tree`` + ``git show`` so we never need to check out
the SHA (the workflow runs on the PR head; the base SHA is
fetched, not checked out).
"""
out: dict[str, str] = {}
listing = _git("ls-tree", "-r", "--name-only", sha, ".gitea/workflows/", cwd=repo_dir)
for line in listing.splitlines():
line = line.strip()
if not line.endswith((".yml", ".yaml")):
continue
try:
blob = _git("show", f"{sha}:{line}", cwd=repo_dir)
except RuntimeError:
# Symlink or other non-blob; skip.
continue
out[line] = blob
return out
# --------------------------------------------------------------------------
# Gitea: recent commits + per-commit combined status + log fetch
# --------------------------------------------------------------------------
def recent_commits_on_branch(branch: str, n: int) -> list[str]:
"""Last `n` commit SHAs on ``branch`` (oldest→newest is fine; we
treat them as a set). Uses the REST ``/commits`` endpoint with
``sha=branch&limit=n``."""
_, body = api(
"GET",
f"/repos/{OWNER}/{NAME}/commits",
query={"sha": branch, "limit": str(n)},
)
if not isinstance(body, list):
raise ApiError(f"/commits for {branch} returned non-list: {type(body).__name__}")
out: list[str] = []
for c in body:
if isinstance(c, dict):
sha = c.get("sha") or (c.get("commit", {}) or {}).get("id")
if isinstance(sha, str) and len(sha) >= 7:
out.append(sha)
return out
def combined_status(sha: str) -> dict:
"""Combined commit status for a SHA. Same shape as
``main-red-watchdog.get_combined_status``."""
_, body = api("GET", f"/repos/{OWNER}/{NAME}/commits/{sha}/status")
if not isinstance(body, dict):
raise ApiError(f"combined-status for {sha} not a dict")
return body
def _entry_state(s: dict) -> str:
"""Per-entry state — Gitea 1.22.6 schema asymmetry: top-level
uses ``state``, per-entry uses ``status``. Defensive fallback per
main-red-watchdog.py line 233."""
return s.get("status") or s.get("state") or ""
def fetch_log(target_url: str) -> str | None:
"""Fetch a job log given its web-UI ``target_url`` (e.g.
``/molecule-ai/molecule-core/actions/runs/13494/jobs/0``).
Per ``reference_gitea_actions_log_fetch``: append ``/logs`` to the
job route. Per ``reference_gitea_1_22_6_lacks_rest_rerun_endpoints``:
Gitea 1.22.6 lacks the REST ``/api/v1/.../actions/runs/*`` path; the
web-UI route is the only working endpoint until 1.24+.
Returns the log text on success, ``None`` on 404 / log-pruned /
network error (caller treats None as "log unavailable, warn-not-fail").
"""
if not target_url:
return None
# Normalize: target_url may be relative ("/owner/repo/...") or
# absolute. Both need ``/logs`` appended to the job sub-path.
if target_url.startswith("/"):
url = f"{WEB}{target_url}"
else:
url = target_url
if not url.endswith("/logs"):
url = f"{url}/logs"
try:
_, body, _ = http("GET", url, expect_json=False, timeout=60)
except ApiError as e:
sys.stderr.write(f"::warning::log fetch failed for {url}: {e}\n")
return None
if isinstance(body, bytes):
return body.decode("utf-8", errors="replace")
return None
def grep_fail_markers(log_text: str) -> list[str]:
"""Return up to 5 sample matching lines for any FAIL_PATTERNS hit.
Empty list = clean log."""
matches: list[str] = []
for line in log_text.splitlines():
for pat in FAIL_PATTERNS:
if pat in line:
# Truncate to keep error output bounded.
matches.append(line.strip()[:240])
break
if len(matches) >= 5:
break
return matches
# --------------------------------------------------------------------------
# Verification: for one flip, scan recent runs on BASE_REF
# --------------------------------------------------------------------------
def verify_flip(flip: dict, branch: str, n: int) -> dict:
"""Scan the last ``n`` commits on ``branch``. For each commit whose
combined status contains a context matching ``flip["context"]``,
fetch the run log and grep for FAIL markers.
Returns::
{
"flip": flip,
"checked_commits": int, # how many commits had a matching context
"masked_runs": [ # runs where log shows FAIL despite status==success
{"sha": "...", "status": "success", "target_url": "...", "samples": [...]},
...
],
"fail_runs": [ # runs where status itself is failure/error
{"sha": "...", "status": "failure", "target_url": "...", "samples": [...]},
...
],
"warnings": [str], # log-unavailable warnings (not blocking)
}
Blocking condition: ``masked_runs`` OR ``fail_runs`` non-empty.
A ``success`` status with a clean log is the only "OK to flip"
outcome (per hongming-pc2 §SOP-N rule (e)).
"""
target_context = flip["context"]
result = {
"flip": flip,
"checked_commits": 0,
"masked_runs": [],
"fail_runs": [],
"warnings": [],
}
shas = recent_commits_on_branch(branch, n)
if not shas:
result["warnings"].append(
f"no recent commits on {branch} (cannot verify flip)"
)
return result
for sha in shas:
try:
status_doc = combined_status(sha)
except ApiError as e:
result["warnings"].append(f"combined-status for {sha}: {e}")
continue
statuses = status_doc.get("statuses") or []
# First entry matching the context name. Newest SHAs come
# first; one entry per context per SHA is the usual shape.
for s in statuses:
if not isinstance(s, dict):
continue
if s.get("context") != target_context:
continue
result["checked_commits"] += 1
state = _entry_state(s)
target_url = s.get("target_url") or ""
log_text = fetch_log(target_url)
if log_text is None:
result["warnings"].append(
f"log unavailable for {sha} {target_context}"
)
# Still record the status itself if it's red — that's
# a hard signal that doesn't need log access.
if state in ("failure", "error"):
result["fail_runs"].append({
"sha": sha,
"status": state,
"target_url": target_url,
"samples": ["[log unavailable; status itself is " + state + "]"],
})
break
samples = grep_fail_markers(log_text)
if state in ("failure", "error"):
result["fail_runs"].append({
"sha": sha,
"status": state,
"target_url": target_url,
"samples": samples or ["[no FAIL markers found but status is " + state + "]"],
})
elif samples and state == "success":
# The bug class: status==success while log shows FAIL.
# That's exactly Quirk #10 (continue-on-error masking).
result["masked_runs"].append({
"sha": sha,
"status": state,
"target_url": target_url,
"samples": samples,
})
# Either way, we matched one context entry for this SHA;
# don't keep looping `statuses[]`.
break
if result["checked_commits"] == 0:
result["warnings"].append(
f"no runs of {target_context!r} found in the last {n} commits on "
f"{branch} — cannot verify; allowing flip with warning"
)
return result
# --------------------------------------------------------------------------
# Report rendering
# --------------------------------------------------------------------------
def render_flip_report(verdict: dict) -> str:
flip = verdict["flip"]
lines = [
f"job: {flip['job_key']} ({flip['context']})",
f" workflow: {flip['workflow_path']}",
f" checked_commits: {verdict['checked_commits']}",
]
for run in verdict["fail_runs"]:
url = run["target_url"]
# target_url may be relative; render the absolute form for
# click-through.
if url.startswith("/"):
url = f"{WEB}{url}"
lines.append(f" fail run {run['sha'][:10]} (status={run['status']}): {url}")
for sample in run["samples"]:
lines.append(f" | {sample}")
for run in verdict["masked_runs"]:
url = run["target_url"]
if url.startswith("/"):
url = f"{WEB}{url}"
lines.append(
f" MASKED run {run['sha'][:10]} (status=success, log shows FAIL): {url}"
)
for sample in run["samples"]:
lines.append(f" | {sample}")
for w in verdict["warnings"]:
lines.append(f" warning: {w}")
return "\n".join(lines)
# --------------------------------------------------------------------------
# Main
# --------------------------------------------------------------------------
def _parse_args(argv: list[str] | None = None) -> argparse.Namespace:
p = argparse.ArgumentParser(
prog="lint-pre-flip-continue-on-error",
description="Block a PR that flips continue-on-error true→false "
"without proof recent runs are actually green.",
)
p.add_argument(
"--dry-run",
action="store_true",
help="Detect + print findings to stdout; never exit non-zero. "
"Useful for local testing.",
)
return p.parse_args(argv)
def main(argv: list[str] | None = None) -> int:
args = _parse_args(argv)
_require_runtime_env()
base_workflows = workflows_at_sha(BASE_SHA)
head_workflows = workflows_at_sha(HEAD_SHA)
flips = detect_flips(base_workflows, head_workflows)
if not flips:
print("::notice::no continue-on-error true→false flips in this PR")
return 0
print(f"::notice::detected {len(flips)} continue-on-error true→false flip(s); verifying recent runs on {BASE_REF}")
bad_flips: list[dict] = []
for flip in flips:
verdict = verify_flip(flip, BASE_REF, RECENT_COMMITS_N)
report = render_flip_report(verdict)
if verdict["fail_runs"] or verdict["masked_runs"]:
print(f"::error file={flip['workflow_path']}::flip of {flip['job_key']} "
f"({flip['context']}) blocked — recent runs on {BASE_REF} show "
f"FAIL markers OR are red. Pull each run log below + grep "
f"`--- FAIL` / `FAIL ` / `::error::` — DON'T trust the masked "
f"combined-status. See hongming-pc2 charter §SOP-N rule (e). "
f"PR#656 / mc#664 reference class.")
bad_flips.append(verdict)
else:
print(f"::notice::flip of {flip['job_key']} ({flip['context']}) is safe — "
f"{verdict['checked_commits']} recent run(s), no FAIL markers")
# Always print the per-flip detail block so the human-readable
# report is in the run log for both safe and unsafe flips.
print(f"::group::flip detail: {flip['job_key']}")
print(report)
print("::endgroup::")
if bad_flips and not args.dry_run:
print(f"::error::{len(bad_flips)}/{len(flips)} flip(s) failed pre-flip verification")
return 1
if bad_flips and args.dry_run:
print(f"::warning::[dry-run] {len(bad_flips)}/{len(flips)} flip(s) WOULD fail; exit 0 forced")
return 0
if __name__ == "__main__":
sys.exit(main())
@@ -1,526 +0,0 @@
#!/usr/bin/env python3
"""lint_required_context_exists_in_bp — Tier 2g per internal#350.
Rule
----
When a PR adds a NEW commit-status emission (a context that didn't
exist on the base side), the workflow file must carry one of three
directive comments adjacent to the new job:
(a) `# bp-required: yes`
The new context MUST already be in
`branch_protections/<branch>.status_check_contexts`. Verified
via Gitea API at PR time.
(b) `# bp-required: pending #NNN`
Acknowledged asymmetry; references an OPEN tracking issue that
will follow up with the BP PATCH.
(c) `# bp-exempt: <free-text reason>`
Informational job, not intended to be a required gate.
No directive on a new emitter → FAIL with a 3-option fix-hint.
The class this prevents
-----------------------
PR#656 added `CI / all-required (pull_request)` as a sentinel context
that workflows emit, but BP did NOT list it. When `platform-build`
failed, `all-required` failed, but BP let the PR merge anyway →
cascade to mc#664. With this lint, PR#656 would have been blocked
until either the BP PATCH ran alongside OR the author added a
`bp-required: pending` directive.
Why directives MUST live in the workflow YAML
---------------------------------------------
The directive comment lives with the emitter so a scheduled
audit (Tier 2f, daily) can read the same source. PR-body-only
directives invisibly evaporate on merge — the asymmetry would
return to undetected. PR-body claims are advisory; workflow-file
comments are the contract.
How "new emission" is detected
------------------------------
Diff base..head over `.gitea/workflows/*.yml`. For each YAML file
that's added or modified:
- Parse both base-side and head-side via PyYAML AST.
- Enumerate emitted contexts on each side using the same rules as
Tier 2f (workflow.name + job.name|key + event-mapping).
- `new_contexts = head_contexts - base_contexts`.
If `new_contexts` is empty after de-dup, no rule applies → pass.
Per `feedback_behavior_based_ast_gates`: comment scanning uses raw
text in a small window around the job-key line, NOT regex over the
full file. This avoids matching `bp-required:` mentioned in a
comment unrelated to the new job.
Exit codes
----------
0 — no new emissions, all new emissions have valid directives,
or BP read errored (graceful-degrade per Tier 2a contract).
1 — at least one new emission lacks a directive, or has
`bp-required: yes` but the context is missing from BP.
2 — env contract violation or YAML parse error.
Env
---
BASE_SHA — PR base SHA
HEAD_SHA — PR head SHA
GITEA_TOKEN — DRIFT_BOT_TOKEN (repo-admin for BP read)
GITEA_HOST — e.g. git.moleculesai.app
REPO — owner/name
BRANCH — defaults to `main`
WORKFLOWS_DIR — defaults to `.gitea/workflows`
Memory cross-links
------------------
- internal#350 (the RFC that specs this lint)
- PR#656 (the empirical case that prompted Tier 2g)
- mc#664 (the surfaced cascade)
- feedback_phantom_required_check_after_gitea_migration (Tier 2f cousin)
- feedback_behavior_based_ast_gates
"""
from __future__ import annotations
import json
import os
import re
import subprocess
import sys
import urllib.error
import urllib.parse
import urllib.request
from typing import Any
try:
import yaml
except ImportError:
sys.stderr.write(
"::error::PyYAML is required. Install with: pip install PyYAML\n"
)
sys.exit(2)
# Directive comment patterns. We match `# bp-required:` OR `# bp-exempt:`,
# both with optional surrounding whitespace and case-sensitive on the
# `bp-` prefix (convention).
BP_REQUIRED_YES_RE = re.compile(
r"#\s*bp-required:\s*yes\b", re.IGNORECASE
)
BP_REQUIRED_PENDING_RE = re.compile(
r"#\s*bp-required:\s*pending\s*#(?P<num>\d+)\b", re.IGNORECASE
)
BP_EXEMPT_RE = re.compile(
r"#\s*bp-exempt:\s*\S", re.IGNORECASE
)
# Gitea event-mapping (same as Tier 2f).
_EVENT_MAP = {
"pull_request": "pull_request",
"pull_request_target": "pull_request",
"push": "push",
}
# ---------------------------------------------------------------------------
# Env
# ---------------------------------------------------------------------------
def _env(key: str, default: str | None = None) -> str:
v = os.environ.get(key, default)
return v if v is not None else ""
def _require_env(key: str) -> str:
v = os.environ.get(key)
if not v:
sys.stderr.write(f"::error::missing required env var: {key}\n")
sys.exit(2)
return v
# ---------------------------------------------------------------------------
# API helper (same contract as Tier 2f).
# ---------------------------------------------------------------------------
def api(
method: str,
path: str,
*,
body: dict | None = None,
query: dict[str, str] | None = None,
) -> tuple[str, Any]:
host = _env("GITEA_HOST")
token = _env("GITEA_TOKEN")
url = f"https://{host}/api/v1{path}"
if query:
url = f"{url}?{urllib.parse.urlencode(query)}"
data = None
headers = {
"Authorization": f"token {token}",
"Accept": "application/json",
}
if body is not None:
data = json.dumps(body).encode("utf-8")
headers["Content-Type"] = "application/json"
req = urllib.request.Request(url, method=method, data=data, headers=headers)
try:
with urllib.request.urlopen(req, timeout=30) as resp:
raw = resp.read()
if not raw:
return ("ok", None)
return ("ok", json.loads(raw))
except urllib.error.HTTPError as e:
if e.code == 404:
return ("not_found", None)
if e.code in (401, 403):
return ("forbidden", None)
return ("error", None)
except (urllib.error.URLError, TimeoutError, json.JSONDecodeError):
return ("error", None)
# ---------------------------------------------------------------------------
# git helpers
# ---------------------------------------------------------------------------
def git_show(sha: str, path: str) -> str | None:
r = subprocess.run(
["git", "show", f"{sha}:{path}"], capture_output=True, text=True
)
if r.returncode != 0:
return None
return r.stdout
def git_diff_paths(base: str, head: str) -> list[str]:
r = subprocess.run(
["git", "diff", "--name-only", f"{base}..{head}"],
capture_output=True,
text=True,
)
if r.returncode != 0:
return []
return [p for p in r.stdout.splitlines() if p.strip()]
# ---------------------------------------------------------------------------
# Workflow context enumeration (mirror Tier 2f).
# ---------------------------------------------------------------------------
def _get_on(d: Any) -> Any:
if not isinstance(d, dict):
return None
if "on" in d:
return d["on"]
if True in d:
return d[True]
return None
def _on_events(doc: Any) -> set[str]:
on = _get_on(doc)
raw: set[str] = set()
if on is None:
return raw
if isinstance(on, str):
raw.add(on)
elif isinstance(on, list):
for e in on:
if isinstance(e, str):
raw.add(e)
elif isinstance(on, dict):
for k in on:
if isinstance(k, str):
raw.add(k)
return {_EVENT_MAP[e] for e in raw if e in _EVENT_MAP}
def _job_display(jbody: dict, jkey: str) -> str:
n = jbody.get("name") if isinstance(jbody, dict) else None
if isinstance(n, str) and n:
return n
return jkey
def workflow_contexts(doc: Any) -> set[str]:
if not isinstance(doc, dict):
return set()
wf_name = doc.get("name")
if not isinstance(wf_name, str) or not wf_name:
return set()
events = _on_events(doc)
if not events:
return set()
jobs = doc.get("jobs")
if not isinstance(jobs, dict):
return set()
out: set[str] = set()
for jkey, jbody in jobs.items():
if jkey == "__lines__":
continue
if not isinstance(jbody, dict):
continue
disp = _job_display(jbody, jkey)
for ev in events:
out.add(f"{wf_name} / {disp} ({ev})")
return out
# ---------------------------------------------------------------------------
# Find the source line of a job-key in a workflow YAML's raw text.
# Used to scan for nearby directive comments.
# ---------------------------------------------------------------------------
def _find_job_key_line(raw_lines: list[str], jkey: str) -> int | None:
"""Return 1-based line of `<jkey>:` under jobs:."""
in_jobs = False
jobs_indent = -1
for i, line in enumerate(raw_lines, start=1):
stripped = line.lstrip()
if stripped.startswith("jobs:"):
in_jobs = True
jobs_indent = len(line) - len(stripped)
continue
if in_jobs:
# Job key is the next indent level under `jobs:`.
indent = len(line) - len(stripped)
if stripped and indent <= jobs_indent:
# Left the jobs: block
in_jobs = False
continue
if re.match(rf"^\s*{re.escape(jkey)}\s*:", line):
return i
return None
_DIRECTIVE_WINDOW = 3 # lines above the job-key line (inclusive)
def find_directive_for_job(
raw_text: str, jkey: str
) -> tuple[str, str | None] | None:
"""Return (kind, value) tuple for the first directive in a small
window above the job-key line.
kind ∈ {"required-yes", "required-pending", "exempt"}.
value is the pending-issue number for required-pending, else None.
Returns None if no directive found.
We scan ABOVE the line only (the convention is the directive
precedes the job — matches how `# mc#NNN` comments are placed
above `continue-on-error: true`). We don't scan inside the job
body because steps can produce false positives.
"""
lines = raw_text.splitlines()
line_no = _find_job_key_line(lines, jkey)
if line_no is None:
return None
lo = max(1, line_no - _DIRECTIVE_WINDOW)
for i in range(lo, line_no):
line = lines[i - 1]
m = BP_REQUIRED_PENDING_RE.search(line)
if m:
return ("required-pending", m.group("num"))
if BP_REQUIRED_YES_RE.search(line):
return ("required-yes", None)
if BP_EXEMPT_RE.search(line):
return ("exempt", None)
return None
# ---------------------------------------------------------------------------
# Map a context back to its emitting (workflow_path, job_key) pair so
# we know WHERE to look for the directive comment.
# ---------------------------------------------------------------------------
def _resolve_emitter(
ctx: str, head_workflows: dict[str, tuple[str, Any]]
) -> tuple[str, str] | None:
"""Return (file_path, job_key) emitting ctx, or None."""
m = re.match(r"^(?P<wf>.+?) / (?P<job>.+) \((?P<event>[^)]+)\)$", ctx)
if not m:
return None
target_wf = m.group("wf")
target_job_disp = m.group("job")
for path, (_raw, doc) in head_workflows.items():
if not isinstance(doc, dict):
continue
if doc.get("name") != target_wf:
continue
jobs = doc.get("jobs") or {}
if not isinstance(jobs, dict):
continue
for jkey, jbody in jobs.items():
if jkey == "__lines__":
continue
if not isinstance(jbody, dict):
continue
disp = _job_display(jbody, jkey)
if disp == target_job_disp:
return (path, jkey)
return None
# ---------------------------------------------------------------------------
# Driver
# ---------------------------------------------------------------------------
def run() -> int:
base_sha = _require_env("BASE_SHA")
head_sha = _require_env("HEAD_SHA")
_require_env("GITEA_TOKEN")
_require_env("GITEA_HOST")
repo = _require_env("REPO")
branch = _env("BRANCH", "main")
wf_dir = _env("WORKFLOWS_DIR", ".gitea/workflows")
# Step 1 — find workflow files changed in the PR.
changed = git_diff_paths(base_sha, head_sha)
changed_workflows = [
p
for p in changed
if p.startswith(wf_dir + "/")
and (p.endswith(".yml") or p.endswith(".yaml"))
]
if not changed_workflows:
print(
"::notice::no workflow file changes in this PR; "
"lint-required-context-exists-in-bp skipped."
)
return 0
# Step 2 — load base+head + compute new contexts.
head_workflows: dict[str, tuple[str, Any]] = {}
new_contexts: set[str] = set()
for path in changed_workflows:
base_raw = git_show(base_sha, path)
head_raw = git_show(head_sha, path)
if head_raw is None:
# File deleted on head — no new emission contribution.
continue
try:
head_doc = yaml.safe_load(head_raw)
except yaml.YAMLError as e:
sys.stderr.write(
f"::error file={path}::YAML parse error on head: {e}\n"
)
return 2
head_workflows[path] = (head_raw, head_doc)
head_ctx = workflow_contexts(head_doc)
base_ctx: set[str] = set()
if base_raw is not None:
try:
base_doc = yaml.safe_load(base_raw)
except yaml.YAMLError:
base_doc = None
if base_doc is not None:
base_ctx = workflow_contexts(base_doc)
new_contexts |= (head_ctx - base_ctx)
if not new_contexts:
print(
"::notice::no new context emissions detected in this PR; "
"lint-required-context-exists-in-bp skipped."
)
return 0
# Step 3 — fetch BP context list.
status, bp = api("GET", f"/repos/{repo}/branch_protections/{branch}")
bp_contexts: set[str] = set()
if status == "forbidden":
sys.stderr.write(
f"::error::GET branch_protections/{branch} returned HTTP 403 — "
f"DRIFT_BOT_TOKEN lacks repo-admin scope. Cannot verify "
f"bp-required directives; skipping lint with exit 0 per "
f"Tier 2a contract. Fix the token, not the lint.\n"
)
return 0
elif status == "not_found":
# Branch has no protection — nothing to verify against; the
# bp-required: yes directive can't be satisfied. Treat as
# graceful-skip rather than red-X.
print(
f"::notice::branch '{branch}' has no protection; cannot verify "
f"bp-required directives. Skipping (exit 0)."
)
return 0
elif status == "ok" and isinstance(bp, dict):
bp_contexts = set(bp.get("status_check_contexts") or [])
else:
sys.stderr.write(
f"::error::branch_protections/{branch} response unexpected; "
f"status={status}. Treating as transient; exit 0.\n"
)
return 0
# Step 4 — validate each new emission's directive.
violations: list[str] = []
for ctx in sorted(new_contexts):
emitter = _resolve_emitter(ctx, head_workflows)
if emitter is None:
# Shouldn't happen — we just derived ctx from head_workflows.
# Belt-and-suspenders fallback.
violations.append(
f"::error::new emission '{ctx}' (could not resolve emitter "
f"file/job — bug in lint?)"
)
continue
file_path, jkey = emitter
raw_text, _ = head_workflows[file_path]
directive = find_directive_for_job(raw_text, jkey)
if directive is None:
violations.append(
f"::error file={file_path}::lint-required-context-exists-in-bp "
f"(Tier 2g): NEW emission `{ctx}` (job '{jkey}') has no "
f"directive comment. Add ONE of these comments on the line "
f"directly above `{jkey}:` (within {_DIRECTIVE_WINDOW} lines):\n"
f" - `# bp-required: yes` — and ensure the context is "
f"already in branch_protections/{branch}.status_check_contexts.\n"
f" - `# bp-required: pending #NNN` — acknowledged asymmetry, "
f"references the tracking issue for the BP PATCH.\n"
f" - `# bp-exempt: <reason>` — informational job, not a gate.\n"
f"Memory: internal#350 (PR#656 + mc#664 empirical case)."
)
continue
kind, value = directive
if kind == "exempt":
print(f"::notice::{ctx}: bp-exempt directive present, OK.")
continue
if kind == "required-pending":
print(
f"::notice::{ctx}: bp-required: pending #{value}"
f"acknowledged asymmetry, OK."
)
continue
if kind == "required-yes":
if ctx in bp_contexts:
print(
f"::notice::{ctx}: bp-required: yes, and context is in "
f"BP, OK."
)
else:
violations.append(
f"::error file={file_path}::lint-required-context-exists-in-bp "
f"(Tier 2g): job '{jkey}' has `bp-required: yes` "
f"directive but its emitted context `{ctx}` is NOT in "
f"`branch_protections/{branch}.status_check_contexts`. "
f"FIX: either (a) add `{ctx}` to BP (Owners-tier PATCH), "
f"or (b) downgrade the directive to "
f"`# bp-required: pending #NNN` referencing the tracker "
f"for the pending BP PATCH."
)
if violations:
print(
f"::error::lint-required-context-exists-in-bp: "
f"{len(violations)} violation(s) across "
f"{len(changed_workflows)} changed workflow file(s)."
)
for v in violations:
print(v)
return 1
print(
f"::notice::lint-required-context-exists-in-bp: "
f"{len(new_contexts)} new emission(s) all directive-validated."
)
return 0
if __name__ == "__main__":
sys.exit(run())
@@ -1,505 +0,0 @@
"""Unit tests for .gitea/scripts/lint_pre_flip_continue_on_error.py.
These tests pin the pure-logic surface (flip detection + per-flip
verdict aggregation) without making real HTTP calls. The end-to-end
git ls-tree + Gitea API path is exercised by running the workflow
against real PRs.
Run locally::
python3 -m unittest .gitea/scripts/tests/test_lint_pre_flip_continue_on_error.py -v
Mirrors the pattern in scripts/ops/test_check_migration_collisions.py
+ scripts/test_build_runtime_package.py.
"""
from __future__ import annotations
import importlib.util
import os
import sys
import unittest
from pathlib import Path
from unittest import mock
# Load the script as a module without invoking main(). Tests must NOT
# depend on the full runtime env contract (GITEA_TOKEN etc.), so we
# import individual functions and stub the network surface explicitly.
SCRIPT_PATH = Path(__file__).resolve().parent.parent / "lint_pre_flip_continue_on_error.py"
spec = importlib.util.spec_from_file_location("lpfc", SCRIPT_PATH)
lpfc = importlib.util.module_from_spec(spec)
spec.loader.exec_module(lpfc)
# --------------------------------------------------------------------------
# Fixtures: minimal valid workflow YAML on each side of a "diff"
# --------------------------------------------------------------------------
CI_YML_BASE = """\
name: CI
on:
push:
branches: [main]
jobs:
platform-build:
name: Platform (Go)
runs-on: ubuntu-latest
continue-on-error: true
steps:
- run: echo platform
canvas-build:
name: Canvas (Next.js)
runs-on: ubuntu-latest
continue-on-error: true
steps:
- run: echo canvas
all-required:
runs-on: ubuntu-latest
continue-on-error: true
needs: [platform-build, canvas-build]
steps:
- run: echo ok
"""
CI_YML_HEAD_FLIPPED = """\
name: CI
on:
push:
branches: [main]
jobs:
platform-build:
name: Platform (Go)
runs-on: ubuntu-latest
continue-on-error: false
steps:
- run: echo platform
canvas-build:
name: Canvas (Next.js)
runs-on: ubuntu-latest
continue-on-error: false
steps:
- run: echo canvas
all-required:
runs-on: ubuntu-latest
continue-on-error: true
needs: [platform-build, canvas-build]
steps:
- run: echo ok
"""
CI_YML_HEAD_NO_DIFF = CI_YML_BASE # identical to base, no flip
# --------------------------------------------------------------------------
# 1. CoE coercion (truthy/falsy/quoted/absent)
# --------------------------------------------------------------------------
class TestCoerceCoE(unittest.TestCase):
def test_python_bool_true(self):
self.assertTrue(lpfc._coerce_coe(True))
def test_python_bool_false(self):
self.assertFalse(lpfc._coerce_coe(False))
def test_none_is_false(self):
# GitHub Actions default: absent == false.
self.assertFalse(lpfc._coerce_coe(None))
def test_string_true_lowercase(self):
# Quoted "true" in YAML — Gitea Actions normalizes to True.
self.assertTrue(lpfc._coerce_coe("true"))
def test_string_True_titlecase(self):
self.assertTrue(lpfc._coerce_coe("True"))
def test_string_yes(self):
# YAML 1.1 truthy form.
self.assertTrue(lpfc._coerce_coe("yes"))
def test_string_false(self):
self.assertFalse(lpfc._coerce_coe("false"))
def test_string_random_falsy(self):
# An unrecognized string is treated as falsy — safer than
# silently coercing "maybe" to True and false-positiving a
# flip.
self.assertFalse(lpfc._coerce_coe("maybe"))
# --------------------------------------------------------------------------
# 2. Diff detection — flips, not arbitrary changes
# --------------------------------------------------------------------------
class TestDetectFlips(unittest.TestCase):
def test_no_flip_in_diff_passes(self):
# Acceptance test #1: PR doesn't flip continue-on-error → 0 flips.
flips = lpfc.detect_flips(
{".gitea/workflows/ci.yml": CI_YML_BASE},
{".gitea/workflows/ci.yml": CI_YML_HEAD_NO_DIFF},
)
self.assertEqual(flips, [])
def test_flip_detected_in_one_file(self):
flips = lpfc.detect_flips(
{".gitea/workflows/ci.yml": CI_YML_BASE},
{".gitea/workflows/ci.yml": CI_YML_HEAD_FLIPPED},
)
# Two jobs flipped: platform-build, canvas-build. all-required
# is still true on both sides.
self.assertEqual(len(flips), 2)
keys = sorted(f["job_key"] for f in flips)
self.assertEqual(keys, ["canvas-build", "platform-build"])
def test_context_name_render(self):
flips = lpfc.detect_flips(
{".gitea/workflows/ci.yml": CI_YML_BASE},
{".gitea/workflows/ci.yml": CI_YML_HEAD_FLIPPED},
)
platform = next(f for f in flips if f["job_key"] == "platform-build")
self.assertEqual(platform["context"], "CI / Platform (Go) (push)")
self.assertEqual(platform["workflow_name"], "CI")
def test_context_falls_back_to_job_key_when_no_name(self):
base = "name: WF\njobs:\n foo:\n continue-on-error: true\n runs-on: x\n steps: []\n"
head = "name: WF\njobs:\n foo:\n continue-on-error: false\n runs-on: x\n steps: []\n"
flips = lpfc.detect_flips({"a.yml": base}, {"a.yml": head})
self.assertEqual(len(flips), 1)
self.assertEqual(flips[0]["context"], "WF / foo (push)")
def test_no_flip_when_only_one_side_has_file(self):
# Newly added workflow file — head has CoE:false, base has no
# file. Adding a new workflow with CoE:false is fine; there's
# nothing to mask.
flips = lpfc.detect_flips(
{}, # base has no workflow files
{".gitea/workflows/new.yml": CI_YML_HEAD_FLIPPED},
)
self.assertEqual(flips, [])
def test_no_flip_when_job_removed(self):
# Job exists on base, not on head — a removal, not a flip.
head = """\
name: CI
jobs:
canvas-build:
name: Canvas (Next.js)
continue-on-error: true
runs-on: ubuntu-latest
steps: []
"""
flips = lpfc.detect_flips(
{".gitea/workflows/ci.yml": CI_YML_BASE},
{".gitea/workflows/ci.yml": head},
)
self.assertEqual(flips, [])
def test_no_flip_when_job_added_with_false(self):
# New job on head with CoE:false — no base side; not a flip.
head_with_new = CI_YML_BASE.replace(
" all-required:",
" newjob:\n name: New Job\n continue-on-error: false\n"
" runs-on: x\n steps: []\n"
" all-required:",
)
flips = lpfc.detect_flips(
{".gitea/workflows/ci.yml": CI_YML_BASE},
{".gitea/workflows/ci.yml": head_with_new},
)
self.assertEqual(flips, [])
def test_yaml_parse_error_warns_not_raises(self):
# Malformed YAML on head — should warn (stderr) and skip,
# not raise.
bad_head = "name: CI\njobs:\n :::\n"
# Capture stderr so the test isn't noisy.
with mock.patch.object(sys, "stderr"):
flips = lpfc.detect_flips(
{".gitea/workflows/ci.yml": CI_YML_BASE},
{".gitea/workflows/ci.yml": bad_head},
)
self.assertEqual(flips, [])
# --------------------------------------------------------------------------
# 3. grep_fail_markers — the regex / substring matcher
# --------------------------------------------------------------------------
class TestGrepFailMarkers(unittest.TestCase):
def test_clean_log_returns_empty(self):
log = "===== test run starting =====\nPASS\nok example.com/foo 1.234s\n"
self.assertEqual(lpfc.grep_fail_markers(log), [])
def test_go_minus_minus_minus_fail_caught(self):
log = "ok example.com/foo 1.234s\n--- FAIL: TestBar (0.01s)\n bar_test.go:42:\n"
matches = lpfc.grep_fail_markers(log)
self.assertEqual(len(matches), 1)
self.assertIn("FAIL: TestBar", matches[0])
def test_go_package_fail_caught(self):
log = "FAIL\texample.com/baz\t1.234s\n"
matches = lpfc.grep_fail_markers(log)
self.assertEqual(len(matches), 1)
self.assertIn("FAIL", matches[0])
def test_bash_error_directive_caught(self):
# `lint-curl-status-capture` pattern: a python heredoc inside a
# bash step that prints `::error::` then sys.exit(1). With
# continue-on-error:true the job rolls up as success despite
# this line. THAT's the masking we're trying to catch.
log = "Running scan...\n::error::Found 3 curl-status-capture pollution site(s):\n"
matches = lpfc.grep_fail_markers(log)
self.assertEqual(len(matches), 1)
self.assertIn("::error::", matches[0])
def test_caps_matches_at_max_5(self):
log = "\n".join(["--- FAIL: T%d" % i for i in range(20)])
matches = lpfc.grep_fail_markers(log)
self.assertEqual(len(matches), 5)
# --------------------------------------------------------------------------
# 4. verify_flip — single-flip verdict assembly (network surface stubbed)
# --------------------------------------------------------------------------
def _stub_status(context: str, state: str, target_url: str = "/owner/repo/actions/runs/1/jobs/0") -> dict:
"""Build a single-context combined-status response."""
return {
"state": state,
"statuses": [
{"context": context, "status": state, "target_url": target_url, "description": ""}
],
}
FLIP_FIXTURE = {
"workflow_path": ".gitea/workflows/ci.yml",
"workflow_name": "CI",
"job_key": "platform-build",
"job_name": "Platform (Go)",
"context": "CI / Platform (Go) (push)",
}
class TestVerifyFlip(unittest.TestCase):
def test_flip_with_clean_history_passes(self):
# Acceptance test #2: flip detected, last 5 runs clean → exit 0.
with mock.patch.object(lpfc, "recent_commits_on_branch", return_value=["sha1", "sha2", "sha3"]):
with mock.patch.object(
lpfc, "combined_status",
side_effect=[_stub_status(FLIP_FIXTURE["context"], "success") for _ in range(3)],
):
with mock.patch.object(lpfc, "fetch_log", return_value="ok example.com/foo 1s\nPASS\n"):
verdict = lpfc.verify_flip(FLIP_FIXTURE, "main", 5)
self.assertEqual(verdict["fail_runs"], [])
self.assertEqual(verdict["masked_runs"], [])
self.assertEqual(verdict["checked_commits"], 3)
self.assertEqual(verdict["warnings"], [])
def test_flip_with_recent_fail_blocks(self):
# Acceptance test #3: flip detected, recent run has --- FAIL → exit 1.
# Setup: 3 commits, the most recent run's log shows --- FAIL
# but the STATUS is success (Quirk #10 mask). That's the
# masked_runs case.
log_with_fail = "ok example.com/foo 1s\n--- FAIL: TestSqlmock (0.01s)\n sqlmock_test.go:42:\n"
with mock.patch.object(lpfc, "recent_commits_on_branch", return_value=["sha1", "sha2", "sha3"]):
with mock.patch.object(
lpfc, "combined_status",
side_effect=[_stub_status(FLIP_FIXTURE["context"], "success") for _ in range(3)],
):
with mock.patch.object(lpfc, "fetch_log", side_effect=[log_with_fail, "PASS\n", "PASS\n"]):
verdict = lpfc.verify_flip(FLIP_FIXTURE, "main", 5)
self.assertEqual(len(verdict["masked_runs"]), 1)
self.assertEqual(verdict["masked_runs"][0]["sha"], "sha1")
self.assertTrue(any("TestSqlmock" in s for s in verdict["masked_runs"][0]["samples"]))
self.assertEqual(verdict["fail_runs"], [])
def test_red_status_alone_blocks(self):
# Status itself is `failure` — block without needing log
# markers. (Belt-and-braces: even with a clean log, a `failure`
# status means the job's exit code was non-zero.)
with mock.patch.object(lpfc, "recent_commits_on_branch", return_value=["sha1"]):
with mock.patch.object(
lpfc, "combined_status",
return_value=_stub_status(FLIP_FIXTURE["context"], "failure"),
):
with mock.patch.object(lpfc, "fetch_log", return_value="some unrelated text\n"):
verdict = lpfc.verify_flip(FLIP_FIXTURE, "main", 5)
self.assertEqual(len(verdict["fail_runs"]), 1)
self.assertEqual(verdict["fail_runs"][0]["status"], "failure")
def test_unreadable_log_warns_not_blocks(self):
# Acceptance test #5: log fetch 404 (None) → warn, not block.
# Status is `success`, log is None — we can't tell, so we warn
# and allow.
with mock.patch.object(lpfc, "recent_commits_on_branch", return_value=["sha1"]):
with mock.patch.object(
lpfc, "combined_status",
return_value=_stub_status(FLIP_FIXTURE["context"], "success"),
):
with mock.patch.object(lpfc, "fetch_log", return_value=None):
verdict = lpfc.verify_flip(FLIP_FIXTURE, "main", 5)
self.assertEqual(verdict["fail_runs"], [])
self.assertEqual(verdict["masked_runs"], [])
self.assertTrue(any("log unavailable" in w for w in verdict["warnings"]))
def test_unreadable_log_with_failure_status_still_blocks(self):
# Edge case: log fetch fails BUT the status itself is `failure`.
# We can still block — the status alone is sufficient signal,
# we don't need the log to confirm.
with mock.patch.object(lpfc, "recent_commits_on_branch", return_value=["sha1"]):
with mock.patch.object(
lpfc, "combined_status",
return_value=_stub_status(FLIP_FIXTURE["context"], "failure"),
):
with mock.patch.object(lpfc, "fetch_log", return_value=None):
verdict = lpfc.verify_flip(FLIP_FIXTURE, "main", 5)
self.assertEqual(len(verdict["fail_runs"]), 1)
self.assertIn("log unavailable", verdict["fail_runs"][0]["samples"][0])
def test_zero_runs_history_warns_allows(self):
# No commits with a matching context — newly added workflow.
# Allow with warning.
with mock.patch.object(lpfc, "recent_commits_on_branch", return_value=["sha1", "sha2"]):
with mock.patch.object(
lpfc, "combined_status",
return_value={"state": "success", "statuses": []}, # no matching context
):
verdict = lpfc.verify_flip(FLIP_FIXTURE, "main", 5)
self.assertEqual(verdict["checked_commits"], 0)
self.assertEqual(verdict["fail_runs"], [])
self.assertEqual(verdict["masked_runs"], [])
self.assertTrue(any("no runs of" in w for w in verdict["warnings"]))
def test_zero_commits_warns_allows(self):
# Empty branch (newly created repo, e.g.). Allow with warning.
with mock.patch.object(lpfc, "recent_commits_on_branch", return_value=[]):
verdict = lpfc.verify_flip(FLIP_FIXTURE, "main", 5)
self.assertEqual(verdict["checked_commits"], 0)
self.assertEqual(verdict["fail_runs"], [])
self.assertEqual(verdict["masked_runs"], [])
self.assertTrue(any("no recent commits" in w for w in verdict["warnings"]))
# --------------------------------------------------------------------------
# 5. Multiple-flip aggregation in main()
# --------------------------------------------------------------------------
class TestMainAggregation(unittest.TestCase):
"""Tests that `main()` aggregates multiple flips and exits 1 when
ANY one of them has a masked or red recent run. Acceptance test #4.
We stub at the verify_flip + workflows_at_sha + _require_runtime_env
boundary so we don't need real git or HTTP.
"""
def setUp(self):
# The actual env values are irrelevant — _require_runtime_env
# is stubbed out — but the module reads OWNER/NAME at import
# time. Patch the runtime env contract to a no-op for the
# duration of each test.
self._patches = [
mock.patch.object(lpfc, "_require_runtime_env", return_value=None),
mock.patch.object(lpfc, "BASE_REF", "main"),
mock.patch.object(lpfc, "BASE_SHA", "deadbeefcafe"),
mock.patch.object(lpfc, "HEAD_SHA", "feedfaceabad"),
mock.patch.object(lpfc, "RECENT_COMMITS_N", 5),
]
for p in self._patches:
p.start()
self.addCleanup(lambda: [p.stop() for p in self._patches])
def test_multiple_flips_aggregated_one_bad_blocks(self):
# PR flips 3 jobs; 1 has a recent fail → exit 1, naming that job.
flips = [
{"workflow_path": ".gitea/workflows/ci.yml", "workflow_name": "CI",
"job_key": "platform-build", "job_name": "Platform (Go)",
"context": "CI / Platform (Go) (push)"},
{"workflow_path": ".gitea/workflows/ci.yml", "workflow_name": "CI",
"job_key": "canvas-build", "job_name": "Canvas (Next.js)",
"context": "CI / Canvas (Next.js) (push)"},
{"workflow_path": ".gitea/workflows/ci.yml", "workflow_name": "CI",
"job_key": "python-lint", "job_name": "Python Lint & Test",
"context": "CI / Python Lint & Test (push)"},
]
clean = {"flip": flips[0], "checked_commits": 5, "masked_runs": [],
"fail_runs": [], "warnings": []}
bad = {"flip": flips[1], "checked_commits": 5,
"masked_runs": [{"sha": "abc1234567", "status": "success",
"target_url": "/x/y/actions/runs/1/jobs/0",
"samples": ["--- FAIL: TestSqlmock"]}],
"fail_runs": [], "warnings": []}
also_clean = {"flip": flips[2], "checked_commits": 5, "masked_runs": [],
"fail_runs": [], "warnings": []}
with mock.patch.object(lpfc, "workflows_at_sha", return_value={}):
with mock.patch.object(lpfc, "detect_flips", return_value=flips):
with mock.patch.object(lpfc, "verify_flip",
side_effect=[clean, bad, also_clean]):
# Capture stdout to assert on naming.
captured = []
with mock.patch("builtins.print", side_effect=lambda *a, **k: captured.append(" ".join(str(x) for x in a))):
rc = lpfc.main([])
self.assertEqual(rc, 1)
# The blocking error message must name the failing job.
joined = "\n".join(captured)
self.assertIn("canvas-build", joined)
# And it must mention the empirical class so a reviewer can
# cross-link the right RFC.
self.assertTrue("mc#664" in joined or "PR#656" in joined)
def test_no_flips_in_diff_exits_zero(self):
# Acceptance test #1 at main() level: empty flips → exit 0.
with mock.patch.object(lpfc, "workflows_at_sha", return_value={}):
with mock.patch.object(lpfc, "detect_flips", return_value=[]):
rc = lpfc.main([])
self.assertEqual(rc, 0)
def test_all_flips_clean_exits_zero(self):
flips = [{"workflow_path": ".gitea/workflows/ci.yml", "workflow_name": "CI",
"job_key": "platform-build", "job_name": "Platform (Go)",
"context": "CI / Platform (Go) (push)"}]
clean = {"flip": flips[0], "checked_commits": 5, "masked_runs": [],
"fail_runs": [], "warnings": []}
with mock.patch.object(lpfc, "workflows_at_sha", return_value={}):
with mock.patch.object(lpfc, "detect_flips", return_value=flips):
with mock.patch.object(lpfc, "verify_flip", return_value=clean):
rc = lpfc.main([])
self.assertEqual(rc, 0)
def test_dry_run_forces_exit_zero_even_with_bad_flip(self):
# --dry-run never fails, even when verification finds masked runs.
flips = [{"workflow_path": ".gitea/workflows/ci.yml", "workflow_name": "CI",
"job_key": "platform-build", "job_name": "Platform (Go)",
"context": "CI / Platform (Go) (push)"}]
bad = {"flip": flips[0], "checked_commits": 5,
"masked_runs": [{"sha": "abc1234567", "status": "success",
"target_url": "/x/y/actions/runs/1/jobs/0",
"samples": ["--- FAIL: TestSqlmock"]}],
"fail_runs": [], "warnings": []}
with mock.patch.object(lpfc, "workflows_at_sha", return_value={}):
with mock.patch.object(lpfc, "detect_flips", return_value=flips):
with mock.patch.object(lpfc, "verify_flip", return_value=bad):
rc = lpfc.main(["--dry-run"])
self.assertEqual(rc, 0)
# --------------------------------------------------------------------------
# 6. Context-name rendering (the format Gitea Actions actually emits)
# --------------------------------------------------------------------------
class TestContextName(unittest.TestCase):
def test_push_event(self):
self.assertEqual(
lpfc.context_name("CI", "Platform (Go)", "push"),
"CI / Platform (Go) (push)",
)
def test_pull_request_event(self):
self.assertEqual(
lpfc.context_name("CI", "Platform (Go)", "pull_request"),
"CI / Platform (Go) (pull_request)",
)
def test_workflow_name_falls_back_to_filename(self):
# No top-level `name:` → falls back to filename minus extension.
doc = {"jobs": {"foo": {"continue-on-error": True}}}
self.assertEqual(
lpfc.workflow_name(doc, fallback="my-workflow"),
"my-workflow",
)
if __name__ == "__main__":
unittest.main()
@@ -37,7 +37,6 @@ jobs:
# Phase 3 (RFC #219 §1): surface broken workflows without blocking
# the PR. Follow-up PR flips this off after surfaced defects are
# triaged.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -48,7 +48,6 @@ jobs:
# Phase 3 (RFC #219 §1): surface broken workflows without blocking
# the PR. Follow-up PR flips this off after surfaced defects are
# triaged.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
steps:
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
@@ -45,7 +45,6 @@ jobs:
# Phase 3 (RFC #219 §1): surface broken workflows without blocking
# the PR. Follow-up PR flips this off after surfaced defects are
# triaged.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 5
steps:
+13 -29
View File
@@ -126,7 +126,7 @@ jobs:
name: Platform (Go)
needs: changes
runs-on: ubuntu-latest
# mc#774 (interim): re-mask platform-build pending fix-forward. Phase 4
# mc#664 (interim): re-mask platform-build pending fix-forward. Phase 4
# (#656) flipped this to continue-on-error: false based on a Phase-3-masked
# "green on main 2026-05-12" — the prior continue-on-error: true had
# been hiding failing tests in workspace-server/internal/handlers/.
@@ -145,11 +145,10 @@ jobs:
# Time-boxed Option A (90 min) did not fit the cross-cutting scope.
# This is a sequenced revert→fix→reflip per
# feedback_strict_root_only_after_class_a emergency clause — NOT
# a permanent re-mask. Re-flip blocked on mc#774 fix-forward landing.
# a permanent re-mask. Re-flip blocked on mc#664 fix-forward landing.
# Other 4 #656 flips (changes, canvas-build, shellcheck, python-lint)
# retain continue-on-error: false; only platform-build regresses.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true # mc#774 fix-forward in flight; re-flip when mc#774 lands (PR #669 → rebase after #709)
continue-on-error: true # mc#664 fix-forward in flight; re-flip when tests pass
defaults:
run:
working-directory: workspace-server
@@ -169,10 +168,10 @@ jobs:
run: go build ./cmd/server
# CLI (molecli) moved to standalone repo: git.moleculesai.app/molecule-ai/molecule-cli
- if: needs.changes.outputs.platform == 'true'
run: go vet ./...
run: go vet ./... || true
- if: needs.changes.outputs.platform == 'true'
name: Run golangci-lint
run: golangci-lint run --timeout 3m ./...
run: golangci-lint run --timeout 3m ./... || true
- if: needs.changes.outputs.platform == 'true'
name: Diagnostic — per-package verbose 60s
run: |
@@ -187,7 +186,6 @@ jobs:
echo "::group::pendinguploads exit=$pu_exit (last 100 lines)"
tail -100 /tmp/test-pu.log
echo "::endgroup::"
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
- if: needs.changes.outputs.platform == 'true'
name: Run tests with race detection and coverage
@@ -374,7 +372,6 @@ jobs:
canvas-deploy-reminder:
name: Canvas Deploy Reminder
runs-on: ubuntu-latest
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
needs: [changes, canvas-build]
# Only fires on direct pushes to main (i.e. after staging→main promotion).
@@ -538,16 +535,12 @@ jobs:
# explicitly excludes `github.event_name`-gated jobs from F1 (see
# `.gitea/scripts/ci-required-drift.py::ci_job_names`).
#
# Phase 3 (RFC #219 §1) safety: underlying build jobs carry
# continue-on-error: true so their failures are masked to null (2026-05-12: re-enabled mc#774 interim)
# (Gitea suppresses status reporting for CoE jobs). This sentinel
# runs with continue-on-error: false so it always reports its
# result to the API — without this, the required-status entry
# (CI / all-required (pull_request)) is never created, which
# blocks PR merges. When Phase 3 ends, flip underlying jobs to
# continue-on-error: false; this sentinel can then be flipped to
# continue-on-error: true if a Phase-4 regression requires it.
continue-on-error: false
# Phase 3 (RFC #219 §1) safety: continue-on-error here so the sentinel
# does not hard-fail and block PRs while the underlying build jobs are
# still in Phase 3 (continue-on-error: true suppresses their status to null).
# When Phase 3 ends (defects fixed, continue-on-error flipped off on build
# jobs), remove continue-on-error here so the sentinel again hard-fails.
continue-on-error: true
runs-on: ubuntu-latest
timeout-minutes: 1
needs:
@@ -571,26 +564,17 @@ jobs:
echo "$results" | python3 -c '
import json, sys
ns = json.load(sys.stdin)
# Phase 3 masked: jobs with continue-on-error: true may report "failure"
# Remove when mc#774 handler test failures are resolved.
PHASE3_MASKED = {"platform-build"}
# Exclude null (Phase 3 suppressed / in-flight) from the bad list.
bad = [(k, v.get("result")) for k, v in ns.items()
if v.get("result") not in ("success", None, "cancelled", "skipped") and k not in PHASE3_MASKED]
if v.get("result") not in ("success", None)]
if bad:
print(f"FAIL: jobs not green:", file=sys.stderr)
for k, r in bad:
print(f" - {k}: {r}", file=sys.stderr)
sys.exit(1)
pending = [(k, v.get("result")) for k, v in ns.items()
if v.get("result") is None]
cancelled = [(k, v.get("result")) for k, v in ns.items()
if v.get("result") == "cancelled"]
pending = [(k, v.get("result")) for k, v in ns.items() if v.get("result") is None]
if pending:
print(f"WARN: {len(pending)} job(s) still in-flight (result=null): " +
", ".join(k for k, _ in pending), file=sys.stderr)
if cancelled:
print(f"INFO: {len(cancelled)} job(s) masked by continue-on-error: " +
", ".join(k for k, _ in cancelled), file=sys.stderr)
print(f"OK: all {len(ns)} required jobs succeeded (or Phase-3 suppressed)")
'
@@ -90,7 +90,6 @@ jobs:
name: Synthetic E2E against staging
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
# Bumped from 12 → 20 (2026-05-04). Tenant user-data install phase
# (apt-get update + install docker.io/jq/awscli/caddy + snap install
+2 -17
View File
@@ -103,7 +103,6 @@ jobs:
detect-changes:
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
outputs:
api: ${{ steps.decide.outputs.api }}
@@ -155,7 +154,6 @@ jobs:
name: E2E API Smoke Test
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 15
env:
@@ -166,6 +164,7 @@ jobs:
# we let Docker assign an ephemeral host port.
PG_CONTAINER: pg-e2e-api-${{ github.run_id }}-${{ github.run_attempt }}
REDIS_CONTAINER: redis-e2e-api-${{ github.run_id }}-${{ github.run_attempt }}
PORT: "8080"
steps:
- name: No-op pass (paths filter excluded this commit)
if: needs.detect-changes.outputs.api != 'true'
@@ -269,20 +268,6 @@ jobs:
if: needs.detect-changes.outputs.api == 'true'
working-directory: workspace-server
run: go build -o platform-server ./cmd/server
- name: Pick platform port
if: needs.detect-changes.outputs.api == 'true'
run: |
PLATFORM_PORT=$(python3 - <<'PY'
import socket
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
s.bind(("127.0.0.1", 0))
print(s.getsockname()[1])
PY
)
echo "PORT=${PLATFORM_PORT}" >> "$GITHUB_ENV"
echo "BASE=http://127.0.0.1:${PLATFORM_PORT}" >> "$GITHUB_ENV"
echo "Platform host port: ${PLATFORM_PORT}"
- name: Start platform (background)
if: needs.detect-changes.outputs.api == 'true'
working-directory: workspace-server
@@ -295,7 +280,7 @@ jobs:
if: needs.detect-changes.outputs.api == 'true'
run: |
for i in $(seq 1 30); do
if curl -sf "$BASE/health" > /dev/null; then
if curl -sf http://127.0.0.1:8080/health > /dev/null; then
echo "Platform up after ${i}s"
exit 0
fi
-2
View File
@@ -70,7 +70,6 @@ jobs:
detect-changes:
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
outputs:
canvas: ${{ steps.decide.outputs.canvas }}
@@ -119,7 +118,6 @@ jobs:
name: Canvas tabs E2E
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 40
@@ -84,7 +84,6 @@ jobs:
name: E2E Staging External Runtime
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 25
-4
View File
@@ -88,20 +88,17 @@ jobs:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 1
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: "3.11"
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
- name: YAML validation (best-effort)
run: |
echo "e2e-staging-saas.yml — PR validation: workflow YAML is valid."
echo "E2E step runs only when provisioning-critical files change."
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
# Actual E2E: runs on trunk pushes (main + staging). NOT the PR-fire-only
@@ -112,7 +109,6 @@ jobs:
# Only runs on trunk pushes. PR paths get pr-validate instead.
if: github.event.pull_request.base.ref == ''
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 45
permissions:
-1
View File
@@ -37,7 +37,6 @@ jobs:
name: Intentional-failure teardown sanity
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 20
+13 -29
View File
@@ -32,21 +32,12 @@ on:
# iterating all open PRs when PR_NUMBER is empty.
workflow_dispatch:
permissions:
# read: contents — for checkout (base ref, not PR head for security)
# read: pull-requests — for reading PR info via API
# write: pull-requests — for posting/updating gate-check comments
# Without this the token cannot POST/PATCH /issues/comments → 403.
contents: read
pull-requests: write
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
jobs:
gate-check:
runs-on: ubuntu-latest
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true # Never block on our own detector failing
steps:
- name: Check out BASE ref (never PR-head under pull_request_target)
@@ -77,32 +68,25 @@ jobs:
if: github.event_name == 'schedule'
env:
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
REPO: ${{ github.repository }}
run: |
set -euo pipefail
# Fetch all open PRs and run gate-check on each
# socket.setdefaulttimeout(15): defence-in-depth for missing SOP_TIER_CHECK_TOKEN.
# gate_check.py uses timeout=15 on every urlopen call; this catches the
# inline Python polling loop too (issue #603).
pr_numbers=$(python3 <<'PY'
import json
import os
import socket
import urllib.request
socket.setdefaulttimeout(15)
token = os.environ["GITEA_TOKEN"]
repo = os.environ["REPO"]
req = urllib.request.Request(
f"https://git.moleculesai.app/api/v1/repos/{repo}/pulls?state=open&limit=100",
headers={"Authorization": f"token {token}", "Accept": "application/json"},
)
with urllib.request.urlopen(req) as r:
prs = json.loads(r.read())
for pr in prs:
print(pr["number"])
PY
)
pr_numbers=$(python3 -c "
import socket, urllib.request, json, os
socket.setdefaulttimeout(15)
token = os.environ['GITEA_TOKEN']
req = urllib.request.Request(
'https://git.moleculesai.app/api/v1/repos/${{ github.repository }}/pulls?state=open&limit=100',
headers={'Authorization': f'token {token}', 'Accept': 'application/json'}
)
with urllib.request.urlopen(req) as r:
prs = json.loads(r.read())
for pr in prs:
print(pr['number'])
")
for pr in $pr_numbers; do
echo "Checking PR #$pr..."
python3 tools/gate-check-v3/gate_check.py \
@@ -78,8 +78,7 @@ jobs:
detect-changes:
name: detect-changes
runs-on: ubuntu-latest
# mc#774 Phase 3 (RFC §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
continue-on-error: true
outputs:
handlers: ${{ steps.filter.outputs.handlers }}
@@ -119,8 +118,7 @@ jobs:
name: Handlers Postgres Integration
needs: detect-changes
runs-on: ubuntu-latest
# mc#774 Phase 3 (RFC §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
continue-on-error: true
env:
# Unique name per run so concurrent jobs don't collide on the
-2
View File
@@ -63,7 +63,6 @@ jobs:
detect-changes:
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
outputs:
run: ${{ steps.decide.outputs.run }}
@@ -155,7 +154,6 @@ jobs:
name: Harness Replays
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 30
steps:
@@ -1,120 +0,0 @@
name: lint-bp-context-emit-match
# Tier 2f scheduled lint (per mc#774) — detects drift between
# `branch_protections/<branch>.status_check_contexts` and the set of
# contexts emitted by `.gitea/workflows/*.yml`.
#
# Rule
# ----
# For each protected branch context (Source A — BP), there must exist
# at least one emitting workflow + job pair (Source B — workflow YAML
# + on:-event mapping) whose runtime status-name maps to it. The
# inverse direction (emitter without BP context) is informational
# only — Tier 2g handles that at PR-time.
#
# Why this exists
# ---------------
# A BP-required context with no emitter blocks merges forever — Gitea
# 1.22.6 treats absent-as-`pending`, NOT absent-as-`skipped`. The
# phantom-required-check class previously surfaced as
# `feedback_phantom_required_check_after_gitea_migration` (a port
# kept the GitHub context name after rename to Gitea, but no
# workflow emitted under the new name).
#
# This lint catches the same class structurally + a forward case:
# workflow renamed/deleted while still in BP.
#
# Scope
# -----
# Scheduled daily. We DON'T run on `pull_request` because (a) the
# emitter side moves with PR diffs (transitional state false-flags)
# and (b) Tier 2g handles emitter-side drift at PR-time.
#
# Cross-repo
# ----------
# Today this runs only on molecule-core/main. Per internal#349
# (cross-repo BP sweep) Class-D repos will get the same lint after
# their BP rollouts.
#
# Auth
# ----
# `GET /repos/.../branch_protections/{branch}` requires repo-admin
# role on Gitea 1.22.6. We use DRIFT_BOT_TOKEN (same persona as
# ci-required-drift.yml — `internal#329` provisioning trail).
# Graceful-degrade per Tier 2a contract: 403/404 → exit 0 with
# ::error::.
#
# Idempotency
# -----------
# The drift issue is filed with title prefix
# `[ci-bp-drift] {repo}/{branch}: BP→emitter mismatch`. The script
# searches OPEN issues for an exact title-prefix match and PATCHes
# the existing issue (if any) instead of POSTing a duplicate.
# Mirrors `ci-required-drift.py`'s contract.
#
# Phase contract (RFC internal#219 §1 ladder)
# -------------------------------------------
# Lands at `continue-on-error: true` (Phase 3). After 7 days of clean
# scheduled runs on `main`, flip to `false` so a scheduled failure
# becomes a hard CI signal.
#
# Cross-links
# -----------
# - mc#774 (the RFC that specs this lint)
# - internal#349 (cross-repo BP sweep)
# - feedback_phantom_required_check_after_gitea_migration
# - feedback_tier_label_ids_are_per_repo
# - ci-required-drift.yml (F2 detector, narrower-scope sibling)
on:
schedule:
# Daily at 03:31 UTC — off-peak, prime-staggered from other
# scheduled jobs (ci-required-drift :00 hourly, lint-coe-tracking
# 13:11). At 03:31 the CI fleet is quietest in EMEA hours.
- cron: '31 3 * * *'
workflow_dispatch:
# No `push` / `pull_request` here — Tier 2g owns PR-time drift.
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
permissions:
contents: read
issues: write # needed to file/edit the drift issue
concurrency:
group: lint-bp-context-emit-match-${{ github.ref }}
cancel-in-progress: true
jobs:
lint:
name: lint-bp-context-emit-match
runs-on: ubuntu-latest
timeout-minutes: 5
# Phase 3 (RFC #219 §1): surface drift without blocking. After 7
# clean scheduled runs on main, flip to false so a scheduled
# failure is a hard CI signal.
continue-on-error: true # mc#774 Phase 3 — flip to false after 7 clean main runs
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
with:
python-version: '3.12'
- name: Install PyYAML
run: python -m pip install --quiet 'PyYAML==6.0.2'
- name: Run lint-bp-context-emit-match
env:
# DRIFT_BOT_TOKEN — repo-admin on this repo (internal#329
# provisioning trail). Required for branch_protections read.
GITEA_TOKEN: ${{ secrets.DRIFT_BOT_TOKEN }}
GITEA_HOST: git.moleculesai.app
REPO: ${{ github.repository }}
BRANCH: main
WORKFLOWS_DIR: .gitea/workflows
DRIFT_LABEL: ci-bp-drift
GITHUB_RUN_URL: https://git.moleculesai.app/${{ github.repository }}/actions/runs/${{ github.run_id }}
run: python3 .gitea/scripts/lint_bp_context_emit_match.py
- name: Run lint-bp-context-emit-match unit tests
run: |
python -m pip install --quiet pytest
python3 -m pytest tests/test_lint_bp_context_emit_match.py -v
@@ -1,6 +1,6 @@
name: lint-continue-on-error-tracking
# Tier 2e hard-gate lint (per mc#774) — every
# Tier 2e hard-gate lint (per internal#350) — every
# `continue-on-error: true` in `.gitea/workflows/*.yml` must carry a
# `# mc#NNNN` or `# internal#NNNN` tracker comment within 2 lines,
# the referenced issue must be OPEN, and ≤14 days old.
@@ -8,7 +8,7 @@ name: lint-continue-on-error-tracking
# Why this exists
# ---------------
# `continue-on-error: true` on `platform-build` had been hiding
# mc#774-class regressions for ~3 weeks before #656 surfaced them on
# mc#664-class regressions for ~3 weeks before #656 surfaced them on
# 2026-05-12. A 14-day cap on tracker age forces a review cycle and
# surfaces mask-drift within at most 14 days of the original defect.
# Each `continue-on-error: true` gets a paper trail — close or renew.
@@ -45,12 +45,12 @@ name: lint-continue-on-error-tracking
# close-and-flip, or document the deliberate keep-mask in a fresh
# 14-day-renewable tracker. After main is clean for 3 days,
# follow-up PR flips this workflow's continue-on-error to false.
# Tracking: mc#774.
# Tracking: internal#350.
#
# Cross-links
# -----------
# - mc#774 (the RFC that specs this lint)
# - mc#774 (the empirical masked-3-weeks case)
# - internal#350 (the RFC that specs this lint)
# - mc#664 (the empirical masked-3-weeks case)
# - feedback_chained_defects_in_never_tested_workflows
# - feedback_behavior_based_ast_gates
# - feedback_strict_root_only_after_class_a
@@ -96,9 +96,8 @@ jobs:
# Phase 3 (RFC #219 §1): surface masked defects without blocking
# PRs. Pre-existing continue-on-error: true directives on main
# all violate this lint at first — intentional. Flip to false
# follow-up after main is clean for 3 days. mc#774.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true # mc#774 Phase 3 mask — 14d forced-renewal cadence
# follow-up after main is clean for 3 days. internal#350.
continue-on-error: true
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
+54 -10
View File
@@ -30,16 +30,10 @@ name: Lint curl status-code capture
on:
pull_request:
paths:
- '.gitea/workflows/**'
- '.gitea/scripts/lint-curl-status-capture.py'
- 'tests/test_lint_curl_status_capture.py'
paths: ['.gitea/workflows/**']
push:
branches: [main, staging]
paths:
- '.gitea/workflows/**'
- '.gitea/scripts/lint-curl-status-capture.py'
- 'tests/test_lint_curl_status_capture.py'
paths: ['.gitea/workflows/**']
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
@@ -51,10 +45,60 @@ jobs:
# Phase 3 (RFC #219 §1): surface broken workflows without blocking
# the PR. Follow-up PR flips this off after surfaced defects are
# triaged.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Find curl ... -w '%{http_code}' ... || echo "000" subshells
run: |
python3 .gitea/scripts/lint-curl-status-capture.py
set -uo pipefail
# Multi-line aware: look for `$(curl ... -w '%{http_code}' ... || echo "000")`
# subshell where the entire command-substitution wraps a curl that
# ends with `|| echo "000"`. Must distinguish from the SAFE shape
# `$(cat tempfile 2>/dev/null || echo "000")` — `cat` with a missing
# tempfile produces empty stdout, no pollution.
python3 <<'PY'
import os, re, sys, glob
BAD_FILES = []
# Match the buggy substitution across newlines: $(curl ... -w '%{http_code}' ... || echo "000")
# The `\\n` is the bash line-continuation that lets curl flags span lines.
# We collapse continuation lines first, then look for the single-line bad pattern.
PATTERN = re.compile(
r'\$\(\s*curl\b[^)]*-w\s*[\'"]%\{http_code\}[\'"][^)]*\|\|\s*echo\s+"000"\s*\)',
re.DOTALL,
)
# Self-skip: this lint workflow contains the literal anti-pattern in
# its own docstring — that's intentional, not a bug.
SELF = ".gitea/workflows/lint-curl-status-capture.yml"
for f in sorted(glob.glob(".gitea/workflows/*.yml")):
if f == SELF:
continue
with open(f) as fh:
content = fh.read()
# Collapse bash line-continuations (\\\n + leading whitespace)
# into a single logical line so the regex can see the full
# curl invocation as one chunk.
flat = re.sub(r'\\\s*\n\s*', ' ', content)
for m in PATTERN.finditer(flat):
BAD_FILES.append((f, m.group(0)[:120]))
if not BAD_FILES:
print("OK No curl-status-capture pollution patterns detected")
sys.exit(0)
print(f"::error::Found {len(BAD_FILES)} curl-status-capture pollution site(s):")
for f, snippet in BAD_FILES:
print(f"::error file={f}::Curl status-capture pollution: '|| echo \"000\"' inside a $(curl ... -w '%{{http_code}}' ...) subshell. On non-2xx or connection failure, curl's -w writes a status, then exits non-zero, then the || echo appends another '000' — producing 'HTTP 000000' or '409000' that fails comparisons silently. Fix: route -w into a tempfile so the exit code can't pollute stdout. See memory feedback_curl_status_capture_pollution.md.")
print(f" matched: {snippet}...")
print()
print("Fix template:")
print(' set +e')
print(' curl ... -w \'%{http_code}\' >code.txt 2>/dev/null')
print(' set -e')
print(' HTTP_CODE=$(cat code.txt 2>/dev/null)')
print(' [ -z "$HTTP_CODE" ] && HTTP_CODE="000"')
sys.exit(1)
PY
+5 -6
View File
@@ -1,6 +1,6 @@
name: lint-mask-pr-atomicity
# Tier 2d hard-gate lint (per mc#774) — blocks PRs that touch
# Tier 2d hard-gate lint (per internal#350) — blocks PRs that touch
# `.gitea/workflows/ci.yml` and modify ONLY ONE of {continue-on-error,
# all-required.sentinel.needs} without a `Paired: #NNN` reference in
# the PR body or in a commit message.
@@ -37,13 +37,13 @@ name: lint-mask-pr-atomicity
# This workflow lands at `continue-on-error: true` (Phase 3 — surface
# regressions without blocking PRs while the rule beds in).
# Follow-up PR flips to `false` once we have ≥3 days of clean runs on
# `main` and no false-positives. Tracking issue: mc#774.
# `main` and no false-positives. Tracking issue: internal#350.
#
# Cross-links
# -----------
# - mc#774 (the RFC that specs this lint)
# - internal#350 (the RFC that specs this lint)
# - PR#665 / PR#668 (the empirical split-pair)
# - mc#774 (the main-red incident the split caused)
# - mc#664 (the main-red incident the split caused)
# - feedback_strict_root_only_after_class_a
# - feedback_behavior_based_ast_gates
#
@@ -91,8 +91,7 @@ jobs:
# Phase 3 (RFC #219 §1): surface broken shapes without blocking
# PRs. Follow-up PR flips this to `false` once recent runs on main
# are confirmed clean (eat-our-own-dogfood discipline mirrors
# PR#673's same-shape comment). Tracking: mc#774.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# PR#673's same-shape comment). Tracking: internal#350.
continue-on-error: true
steps:
- name: Check out PR head with full history (need base SHA blobs)
@@ -1,141 +0,0 @@
name: Lint pre-flip continue-on-error
# Pre-merge gate: blocks PRs that flip `continue-on-error: true → false`
# on any job in `.gitea/workflows/*.yml` WITHOUT proof that the affected
# job's recent runs on the target branch (PR base) are actually green.
#
# Empirical class: PR #656 / mc#774. PR #656 (RFC internal#219 Phase 4)
# flipped 5 platform-build-class jobs `continue-on-error: true → false`
# on the basis of a "verified green on main via combined-status check".
# But that "green" was the LIE the prior `continue-on-error: true`
# produced: Gitea Quirk #10 (internal#342 + dup #287) — a failed step
# inside a `continue-on-error: true` job rolls up to a `success`
# job-level status. The precondition the PR claimed to verify was
# structurally fooled by the bug being flipped.
#
# mc#774 captured the surfaced defects (2 mutually-masked regressions):
# - Class 1: sqlmock helper drift since 2f36bb9a (24 days old)
# - Class 2: OFFSEC-001 contract collision since 7d1a189f (1 day old)
#
# Codified 04:35Z as hongming-pc2 charter §SOP-N rule (e)
# "run-log-grep-before-flip" — now structurally enforced here at PR
# time, ahead of merge.
#
# How the gate works:
# 1. Read every `.gitea/workflows/*.yml` at the PR base SHA AND at
# the PR head SHA via `git show <sha>:<path>` (no checkout
# needed).
# 2. Parse both sides via PyYAML AST (NOT grep — per
# `feedback_behavior_based_ast_gates`). Walk `jobs.<key>.
# continue-on-error` on each side. A flip is base=true,
# head=false.
# 3. For each flipped job, render the commit-status context as
# `"{workflow.name} / {job.name or job.key} (push)"` — that's
# how Gitea Actions emits the per-context status on `main`/
# `staging` runs.
# 4. Pull last 5 commits on the PR base branch, fetch combined
# commit-status per commit, scan for the target context. For
# each match, fetch the run log via the web-UI route
# `{server_url}/{repo}/actions/runs/{run_id}/jobs/{job_idx}/logs`
# (per `reference_gitea_actions_log_fetch` —
# Gitea 1.22.6 lacks REST `/actions/runs/*`; web-UI is the
# only working path, see also
# `reference_gitea_1_22_6_lacks_rest_rerun_endpoints`).
# 5. Grep each log for `--- FAIL`, `FAIL\s`, `::error::`. If
# the status is `success` but the log shows any of these,
# the job was masked. Block the PR with `::error::`.
#
# Graceful-degrade contract (per task halt-conditions):
# - Log fetch 404 (act_runner pruned the log, transient outage):
# emit `::warning::` "log unavailable" — does NOT block.
# - Zero recent runs of the flipped job's context on the base
# branch (newly added workflow): emit `::warning::` "no run
# history to verify" — allow the flip. Chicken-and-egg
# exemption.
# - YAML parse error in one of the workflow files: warn-only,
# don't block — the YAML lint workflows catch this separately.
#
# Cross-links: PR#656, mc#774, PR#665 (interim re-mask),
# Quirk #10 (internal#342 + dup #287), hongming-pc2 charter
# §SOP-N rule (e), feedback_strict_root_only_after_class_a,
# feedback_no_shared_persona_token_use.
#
# Phase contract (RFC internal#219 §1 ladder):
# - This workflow lands at `continue-on-error: true` (Phase 3 —
# surface defects without blocking). Follow-up PR flips it to
# `false` ONLY after this workflow's own recent runs on `main`
# are confirmed clean — exactly the discipline the workflow
# itself enforces. Eat your own dogfood.
on:
pull_request:
types: [opened, synchronize, reopened]
paths:
- '.gitea/workflows/**'
- '.gitea/scripts/lint_pre_flip_continue_on_error.py'
- '.gitea/workflows/lint-pre-flip-continue-on-error.yml'
env:
# Per `feedback_act_runner_github_server_url` — without this,
# actions/checkout and friends default to github.com → break.
GITHUB_SERVER_URL: https://git.moleculesai.app
permissions:
contents: read
# Need read on the API to pull combined commit-status + commit list
# for the base branch. The job-log fetch uses the same token via
# the web-UI route (Gitea 1.22.6 accepts `Authorization: token ...`
# there).
pull-requests: read
concurrency:
group: lint-pre-flip-coe-${{ github.event.pull_request.head.sha || github.sha }}
cancel-in-progress: true
jobs:
scan:
name: Verify continue-on-error flips have run-log proof
runs-on: ubuntu-latest
timeout-minutes: 8
# Phase 3 (RFC internal#219 §1): surface broken flips without blocking
# the PR yet. Follow-up flips this to `false` once the workflow itself
# has clean recent runs on main. mc#774 interim — remove when CoE→false.
continue-on-error: true # mc#774
steps:
- name: Check out PR head (full history for base-SHA access)
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
# `git show <base-sha>:<path>` needs the base SHA's blobs.
# Shallow=1 would miss it. Same rationale as
# check-migration-collisions.yml.
fetch-depth: 0
- name: Set up Python (PyYAML for AST parsing)
uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
with:
python-version: '3.12'
- name: Install PyYAML
# Same pin as ci-required-drift.yml — keep dependencies
# uniform so a Gitea runner cache hits across both jobs.
run: python -m pip install --quiet 'PyYAML==6.0.2'
- name: Ensure base ref is reachable locally
# `actions/checkout@v6 fetch-depth=0` usually pulls the base
# too, but explicit-fetch is cheap insurance against the
# form-of-ref differences across Gitea runner versions
# (mirrors the comment in check-migration-collisions.yml).
run: |
git fetch origin "${{ github.event.pull_request.base.ref }}" || true
- name: Run lint
env:
# Auto-injected by Gitea Actions; sufficient scope for
# combined-status + commit-list + log fetch via web-UI
# route. NO repo-admin needed (unlike the
# branch_protections endpoint).
GITEA_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITEA_HOST: git.moleculesai.app
REPO: ${{ github.repository }}
BASE_REF: ${{ github.event.pull_request.base.ref }}
BASE_SHA: ${{ github.event.pull_request.base.sha }}
HEAD_SHA: ${{ github.event.pull_request.head.sha }}
# Last 5 commits on the base branch is the spec default.
RECENT_COMMITS_N: '5'
run: python3 .gitea/scripts/lint_pre_flip_continue_on_error.py
@@ -1,118 +0,0 @@
name: lint-required-context-exists-in-bp
# Tier 2g hard-gate lint (per mc#774) — diff-based PR-time
# check. When a PR adds a NEW commit-status emission (workflow YAML
# `name:` + job `name:`-or-key + on:-event), the workflow file must
# carry one of three directives adjacent to the new job:
#
# - `# bp-required: yes` — and BP must list the context
# - `# bp-required: pending #NNN` — acknowledged asymmetry + tracker
# - `# bp-exempt: <reason>` — informational job, not a gate
#
# Default (no directive on a new emitter) = FAIL.
#
# Why this exists
# ---------------
# PR#656 added `CI / all-required (pull_request)` as a sentinel
# context that workflows emit, but BP did NOT list it. When
# platform-build failed, all-required failed, but BP let the PR
# merge anyway → cascade to mc#774. With this lint, PR#656 would
# have been blocked until either the BP PATCH ran alongside OR
# the author added a `bp-required: pending` directive.
#
# Tier 2g vs Tier 2f
# ------------------
# Tier 2g runs at PR-time (diff-based) and BLOCKS the merge.
# Tier 2f runs daily (scheduled) and FILES a drift issue. They
# share the workflow-context enumeration helpers
# (`_event_map`, `workflow_contexts`, `_job_display`) but the
# semantics are intentionally distinct so they're separate scripts.
# Co-design is documented in mc#774.
#
# Directive comment lives in the workflow file (NOT PR body)
# ----------------------------------------------------------
# A PR-body claim of "BP exempt" evaporates on merge — the
# asymmetry returns to undetected state and Tier 2f's daily
# scheduled audit can't see it. The directive must live with the
# emitter so both PR-time (Tier 2g) and post-merge (Tier 2f)
# readers consume the same source.
#
# Phase contract (RFC internal#219 §1 ladder)
# -------------------------------------------
# Lands at `continue-on-error: true` (Phase 3 — surface the
# pattern without blocking PRs while the directive convention
# beds in). After 7 days of clean runs on `main` with no false
# positives, follow-up flips to `false`. Tracking: mc#774.
#
# Cross-links
# -----------
# - mc#774 (the RFC that specs this lint)
# - PR#656 (the empirical case)
# - mc#774 (the surfaced cascade)
# - feedback_phantom_required_check_after_gitea_migration (Tier 2f cousin)
# - feedback_behavior_based_ast_gates
#
# Auth: DRIFT_BOT_TOKEN (repo-admin for branch_protections read).
on:
pull_request:
types: [opened, synchronize, reopened]
paths:
- '.gitea/workflows/**'
- '.gitea/scripts/lint_required_context_exists_in_bp.py'
- '.gitea/workflows/lint-required-context-exists-in-bp.yml'
- 'tests/test_lint_required_context_exists_in_bp.py'
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
permissions:
contents: read
concurrency:
group: lint-required-context-exists-in-bp-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
# bp-exempt: this lint is a PR-time advisory and is not intended to
# be a required gate on main. The directive eat-our-own-dogfood
# confirms the convention works on the lint that defines it.
lint:
name: lint-required-context-exists-in-bp
runs-on: ubuntu-latest
timeout-minutes: 5
# Phase 3 (RFC #219 §1): surface the pattern without blocking PRs
# while the directive convention beds in. Follow-up flip to false
# after 7 clean days on main. mc#774.
continue-on-error: true # mc#774 Phase 3 — flip to false after 7 clean main runs
steps:
- name: Check out PR head with full history (need base SHA blobs)
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
# `git show <base-sha>:<path>` needs the base SHA's blobs.
# Same rationale as PR#673 and check-migration-collisions.yml.
fetch-depth: 0
- uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
with:
python-version: '3.12'
- name: Install PyYAML
run: python -m pip install --quiet 'PyYAML==6.0.2'
- name: Ensure base ref is reachable locally
# Cheap insurance against runner-version drift.
run: |
git fetch origin "${{ github.event.pull_request.base.ref }}" || true
- name: Run lint-required-context-exists-in-bp
env:
# DRIFT_BOT_TOKEN — repo-admin (needed for branch_protections).
GITEA_TOKEN: ${{ secrets.DRIFT_BOT_TOKEN }}
GITEA_HOST: git.moleculesai.app
REPO: ${{ github.repository }}
BRANCH: main
BASE_SHA: ${{ github.event.pull_request.base.sha }}
HEAD_SHA: ${{ github.event.pull_request.head.sha }}
WORKFLOWS_DIR: .gitea/workflows
run: python3 .gitea/scripts/lint_required_context_exists_in_bp.py
- name: Run lint-required-context-exists-in-bp unit tests
run: |
python -m pip install --quiet pytest
python3 -m pytest tests/test_lint_required_context_exists_in_bp.py -v
-1
View File
@@ -55,7 +55,6 @@ jobs:
# Phase 3 (RFC #219 §1): surface broken shapes without blocking PRs.
# Follow-up PR flips this off after the 4 existing-on-main rule-2
# (workflow_run) violations are migrated to a supported trigger.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+23 -42
View File
@@ -9,12 +9,18 @@ name: publish-canvas-image
# - Workflow-level env.GITHUB_SERVER_URL pinned per
# feedback_act_runner_github_server_url.
# - `continue-on-error: true` on each job (RFC §1 contract).
# - Retargeted the image push from GHCR to ECR. GHCR was retired during
# the 2026-05-06 Gitea migration, and Gitea's GITHUB_TOKEN cannot
# authenticate to ghcr.io.
# - **Open question for review**: this workflow pushes the canvas
# image to `ghcr.io`. GHCR was retired during the 2026-05-06
# Gitea migration in favor of ECR (per staging-verify.yml header
# notes). The image may not be consumable post-migration. Two
# options for follow-up: (a) retarget to
# `153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/canvas`,
# or (b) retire this workflow entirely and route canvas deploys
# via the operator-host build path. tier:low + continue-on-error
# means failed pushes do not block PRs.
#
# Builds and pushes the canvas Docker image to ECR whenever a commit lands
# Builds and pushes the canvas Docker image to GHCR whenever a commit lands
# on main that touches canvas code. Previously canvas changes were visible in
# CI (npm run build passed) but the live container was never updated —
# operators had to manually run `docker compose build canvas` each time.
@@ -39,10 +45,10 @@ on:
permissions:
contents: read
packages: write
packages: write # required to push to ghcr.io/${{ github.repository_owner }}/*
env:
IMAGE_NAME: 153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/canvas
IMAGE_NAME: ghcr.io/molecule-ai/canvas
GITHUB_SERVER_URL: https://git.moleculesai.app
jobs:
@@ -56,43 +62,21 @@ jobs:
# See issue #576 + infra-lead pulse ~00:30Z.
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Log in to ECR
env:
IMAGE_NAME: ${{ env.IMAGE_NAME }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_DEFAULT_REGION: us-east-2
run: |
set -euo pipefail
ECR_REGISTRY="${IMAGE_NAME%%/*}"
aws ecr get-login-password --region us-east-2 | \
docker login --username AWS --password-stdin "${ECR_REGISTRY}"
- name: Log in to GHCR
uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
- name: Ensure ECR repository exists
env:
IMAGE_NAME: ${{ env.IMAGE_NAME }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_DEFAULT_REGION: us-east-2
run: |
set -euo pipefail
repo_path="${IMAGE_NAME#*/}"
if ! aws ecr describe-repositories --repository-names "${repo_path}" --region us-east-2 >/dev/null 2>&1; then
aws ecr create-repository \
--repository-name "${repo_path}" \
--image-scanning-configuration scanOnPush=true \
--region us-east-2 >/dev/null
fi
# Health check: verify Docker daemon is accessible before attempting any
# build steps. This fails loudly at step 1 when the runner's docker.sock
# is inaccessible rather than silently continuing to the build step
@@ -102,14 +86,12 @@ jobs:
set -euo pipefail
echo "::group::Docker daemon health check"
echo "Runner: ${HOSTNAME:-unknown}"
docker_info="$(docker info 2>&1)" || {
docker info 2>&1 | head -5 || {
echo "::error::Docker daemon is not accessible at /var/run/docker.sock"
echo "::error::Runner: ${HOSTNAME:-unknown}"
printf '%s\n' "${docker_info}"
echo "::error::Check: (1) daemon running, (2) runner user in docker group, (3) sock perms 660+"
exit 1
}
printf '%s\n' "${docker_info}" | sed -n '1,5p'
echo "Docker daemon OK"
echo "::endgroup::"
@@ -143,7 +125,7 @@ jobs:
echo "platform_url=${PLATFORM_URL}" >> "$GITHUB_OUTPUT"
echo "ws_url=${WS_URL}" >> "$GITHUB_OUTPUT"
- name: Build & push canvas image to ECR
- name: Build & push canvas image to GHCR
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: ./canvas
@@ -156,10 +138,9 @@ jobs:
tags: |
${{ env.IMAGE_NAME }}:latest
${{ env.IMAGE_NAME }}:sha-${{ steps.tags.outputs.sha }}
# Gitea artifact-cache reachability is best-effort on the operator
# runner network. Do not let cache export fail an image that already
# built and pushed successfully.
cache-from: type=gha
cache-to: type=gha,mode=max
labels: |
org.opencontainers.image.source=https://git.moleculesai.app/${{ github.repository }}
org.opencontainers.image.source=https://github.com/${{ github.repository }}
org.opencontainers.image.revision=${{ github.sha }}
org.opencontainers.image.description=Molecule AI canvas (Next.js 15 + React Flow)
@@ -55,7 +55,6 @@ jobs:
# The actual bump work happens on the main/staging push after merge.
pr-validate:
runs-on: ubuntu-latest
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true # do not block PR merge on operational failures
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -20,12 +20,6 @@ name: publish-workspace-server-image
#
# ECR target: 153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/*
# Required secrets: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AUTO_SYNC_TOKEN
#
# mc#711: Docker daemon not accessible on ubuntu-latest runner (molecule-canonical-1
# shows client-only in `docker info` — daemon not running). DinD mount is present but
# daemon doesn't respond. Fix: add diagnostic step showing socket info so ops can
# identify which runners have a live daemon. If no daemon is available, the job
# fails fast with actionable output rather than silent deep failure.
on:
push:
@@ -58,25 +52,36 @@ env:
jobs:
build-and-push:
# REVERTED (infra/revert-docker-runner-label): `runs-on: ubuntu-latest` restored.
# The `docker` label is not registered on any act_runner. `runs-on: [ubuntu-latest, docker]`
# causes jobs to queue indefinitely with zero eligible runners — strictly worse than the
# pre-#599 coin-flip (50% success rate). Once the `docker` label is registered on
# ≥2 runners, re-apply the fix from #599 (infra/docker-runner-label).
# See issue #576 + infra-lead pulse ~00:30Z.
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Diagnose Docker daemon access
# Health check: verify Docker daemon is accessible before attempting any
# build steps. This fails loudly at step 1 when the runner's docker.sock
# is inaccessible (e.g. permission change, daemon restart, or group-membership
# drift) rather than silently continuing to step 2 where `docker build`
# fails deep in the process with a cryptic ECR auth error that doesn't
# surface the root cause. Also reports the daemon version so operator
# can correlate with runner host logs.
- name: Verify Docker daemon access
run: |
set -euo pipefail
echo "::group::Docker daemon diagnosis"
echo "::group::Docker daemon health check"
echo "Runner: ${HOSTNAME:-unknown}"
echo "--- Socket info ---"
ls -la /var/run/docker.sock 2>/dev/null || echo "/var/run/docker.sock: not found"
stat /var/run/docker.sock 2>/dev/null || true
echo "--- User info ---"
id
echo "--- docker version ---"
docker version 2>&1 || true
echo "--- docker info (full) ---"
docker info 2>&1 || echo "docker info failed: exit $?"
docker info 2>&1 | head -5 || {
echo "::error::Docker daemon is not accessible at /var/run/docker.sock"
echo "::error::Runner: ${HOSTNAME:-unknown}"
echo "::error::Check: (1) daemon is running, (2) runner user is in docker group, (3) sock permissions are 660+"
exit 1
}
echo "Docker daemon OK"
echo "::endgroup::"
# Pre-clone manifest deps before docker build.
@@ -95,6 +100,9 @@ jobs:
MOLECULE_GITEA_TOKEN: ${{ secrets.AUTO_SYNC_TOKEN }}
run: |
set -euo pipefail
# clone-manifest.sh supports anonymous cloning for public repos (post-
# 2026-05-08 migration). The token is only needed for private repos.
# Do NOT require it — a missing secret would fail the build unnecessarily.
mkdir -p .tenant-bundle-deps
# Strip JSON5 comments before jq parsing — Integration Tester appends
# `// Triggered by ...` which breaks `jq` in clone-manifest.sh.
-1
View File
@@ -51,7 +51,6 @@ jobs:
name: Audit Railway env vars for drift-prone pins
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 10
@@ -86,7 +86,6 @@ jobs:
if: ${{ github.event.workflow_run.conclusion == 'success' }}
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 25
steps:
@@ -76,7 +76,6 @@ jobs:
redeploy:
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 25
steps:
-1
View File
@@ -53,7 +53,6 @@ jobs:
# runners with internet access to package mirrors). Falls back to GitHub
# binary download. GitHub releases may be blocked on some runner networks
# (infra#241 follow-up).
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
run: |
if apt-get update -qq && apt-get install -y -qq jq; then
-1
View File
@@ -67,7 +67,6 @@ jobs:
# Phase 3 (RFC #219 §1): surface broken workflows without blocking
# the PR. Follow-up PR flips this off after surfaced defects are
# triaged.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -52,7 +52,6 @@ jobs:
detect-changes:
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
outputs:
wheel: ${{ steps.decide.outputs.wheel }}
@@ -97,7 +96,6 @@ jobs:
name: PR-built wheel + import smoke
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
steps:
- name: No-op pass (paths filter excluded this commit)
@@ -57,7 +57,6 @@ jobs:
name: Detect SECRET_PATTERNS drift
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 5
steps:
+1 -4
View File
@@ -64,8 +64,7 @@ jobs:
tier-check:
runs-on: ubuntu-latest
# BURN-IN: continue-on-error prevents AND-composition from blocking
# PRs during the 7-day window. Remove after 2026-05-17 (mc#774).
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# PRs during the 7-day window. Remove after 2026-05-17 (internal#189).
continue-on-error: true
permissions:
contents: read
@@ -90,7 +89,6 @@ jobs:
# runners). The sop-tier-check script has its own fallback as a
# third line of defense. continue-on-error: true ensures this step
# failing does not block the job.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
run: |
# apt-get is the primary method — Ubuntu package mirrors are reliably
@@ -111,7 +109,6 @@ jobs:
# continue-on-error: true at step level — job-level is ignored by Gitea
# Actions (quirk #10, internal runbooks). Belt-and-suspenders with
# SOP_FAIL_OPEN=1 + || true below.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
env:
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
-2
View File
@@ -85,7 +85,6 @@ jobs:
staging-smoke:
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
outputs:
sha: ${{ steps.compute.outputs.sha }}
@@ -206,7 +205,6 @@ jobs:
if: ${{ needs.staging-smoke.result == 'success' && needs.staging-smoke.outputs.smoke_ran == 'true' }}
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
env:
SHA: ${{ needs.staging-smoke.outputs.sha }}
+11 -8
View File
@@ -29,11 +29,15 @@ name: Sweep stale AWS Secrets Manager secrets
# reconciler enumerator) is filed as a separate controlplane
# issue. This sweeper is the immediate cost-relief stopgap.
#
# AWS credentials: use the dedicated Secrets Manager janitor principal.
# Do not fall back to the molecule-cp application principal: it does
# not need account-wide ListSecrets, and a 2026-05-12 CI failure proved
# that using it here turns a least-privilege production credential into
# a red scheduled janitor.
# AWS credentials: the confirmed Gitea secrets are AWS_ACCESS_KEY_ID /
# AWS_SECRET_ACCESS_KEY (the molecule-cp IAM user). These are the same
# credentials used by the rest of the platform. The dedicated
# AWS_JANITOR_* naming (which the original GitHub workflow used) was
# never populated in Gitea — the existing secrets are AWS_ACCESS_KEY_ID /
# AWS_SECRET_ACCESS_KEY (per issue #425 §425 audit). These DO have
# secretsmanager:ListSecrets (the production molecule-cp principal);
# if ListSecrets is revoked in future, a dedicated janitor principal
# would need to be created and the Gitea secret names updated here.
#
# Safety: the script's MAX_DELETE_PCT gate (default 50%, mirroring
# sweep-cf-orphans.yml — tenant secrets are durable by design, unlike
@@ -61,7 +65,6 @@ jobs:
name: Sweep AWS Secrets Manager
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
# 30 min cap, mirroring the other janitors. AWS DeleteSecret is
# fast (~0.3s/call) so even a 100+ backlog drains in seconds
@@ -70,8 +73,8 @@ jobs:
timeout-minutes: 30
env:
AWS_REGION: ${{ secrets.AWS_REGION || 'us-east-1' }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_SECRETS_JANITOR_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRETS_JANITOR_SECRET_ACCESS_KEY }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
CP_ADMIN_API_TOKEN: ${{ secrets.CP_ADMIN_API_TOKEN }}
CP_STAGING_ADMIN_API_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
MAX_DELETE_PCT: ${{ github.event.inputs.max_delete_pct || '50' }}
-1
View File
@@ -71,7 +71,6 @@ jobs:
name: Sweep CF orphans
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
# 3 min surfaces hangs (CF API stall, AWS describe-instances stuck)
# within one cron interval instead of burning a full tick. Realistic
-1
View File
@@ -55,7 +55,6 @@ jobs:
name: Sweep CF tunnels
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
# 30 min cap. Was 5 min on the theory that the only thing that
# could take >5min is a CF-API hang — but on 2026-05-02 a backlog
-1
View File
@@ -46,7 +46,6 @@ jobs:
name: Ops scripts (unittest)
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
-1
View File
@@ -31,7 +31,6 @@ jobs:
name: Weekly Platform-Go Surface
runs-on: ubuntu-latest
# continue-on-error: surface only, never block
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
defaults:
run:
@@ -1,185 +0,0 @@
// @vitest-environment jsdom
/**
* MobileCanvas — mobile mini-graph with pinch-zoom and tap-to-open.
*
* Per WCAG 2.1 AA / mobile interaction:
* - Reset button visible only after zoom/pan (zoomed state)
* - Spawn FAB always visible with aria-label
* - Legend always visible with all 5 status types
* - WorkspacePill shows node count
* - Node buttons clickable with onOpen(id) callback
*
* NOTE: No @testing-library/jest-dom — use DOM APIs.
*/
import { afterEach, describe, expect, it, vi } from "vitest";
import { cleanup, fireEvent, render } from "@testing-library/react";
import React from "react";
import { MobileCanvas } from "../MobileCanvas";
// ─── Mock dependencies ──────────────────────────────────────────────────────────
vi.mock("@/lib/theme-provider", () => ({
useTheme: () => ({ theme: "dark", resolvedTheme: "dark", setTheme: vi.fn() }),
}));
const mockNodes = [
{
id: "ws-1",
position: { x: 100, y: 200 },
data: {
name: "Alpha Agent",
status: "online",
tier: 2,
parentId: null,
runtime: "langgraph",
activeTasks: 0,
role: "researcher",
},
},
{
id: "ws-2",
position: { x: 300, y: 400 },
data: {
name: "Beta Agent",
status: "degraded",
tier: 3,
parentId: "ws-1",
runtime: "claude-code",
activeTasks: 1,
role: "developer",
},
},
{
id: "ws-3",
position: { x: 0, y: 0 },
data: {
name: "Gamma Agent",
status: "offline",
tier: 1,
parentId: null,
runtime: "hermes",
activeTasks: 0,
role: "analyst",
},
},
];
vi.mock("@/store/canvas", () => ({
useCanvasStore: vi.fn((selector) => {
if (typeof selector === "function") {
return selector({ nodes: mockNodes });
}
return mockNodes;
}),
summarizeWorkspaceCapabilities: vi.fn((data: { status?: string; role?: string }) => ({
runtime: data.status ? "langgraph" : "unknown",
skillCount: 0,
currentTask: data.role ?? "",
})),
}));
afterEach(() => {
cleanup();
vi.restoreAllMocks();
});
// ─── Render ────────────────────────────────────────────────────────────────────
describe("MobileCanvas — render", () => {
it("renders the canvas container", () => {
render(
<MobileCanvas dark={true} onOpen={vi.fn()} onSpawn={vi.fn()} />,
);
const container = document.querySelector('[style*="position: absolute"]');
expect(container).toBeTruthy();
});
it("renders the legend with all 5 status types", () => {
render(
<MobileCanvas dark={true} onOpen={vi.fn()} onSpawn={vi.fn()} />,
);
const legend = Array.from(document.querySelectorAll("div")).find(
(d) => d.textContent?.includes("Legend"),
);
expect(legend).toBeTruthy();
expect(legend?.textContent).toContain("online");
expect(legend?.textContent).toContain("starting");
expect(legend?.textContent).toContain("degraded");
expect(legend?.textContent).toContain("failed");
expect(legend?.textContent).toContain("paused");
});
it("renders spawn FAB with correct aria-label", () => {
render(
<MobileCanvas dark={true} onOpen={vi.fn()} onSpawn={vi.fn()} />,
);
const fab = document.querySelector('button[aria-label="Spawn new agent"]');
expect(fab).toBeTruthy();
});
it("renders node buttons for each store node", () => {
render(
<MobileCanvas dark={true} onOpen={vi.fn()} onSpawn={vi.fn()} />,
);
const buttons = document.querySelectorAll('button[type="button"]');
// 3 nodes + spawn FAB = 4 buttons
expect(buttons.length).toBeGreaterThanOrEqual(4);
});
it("renders node with correct name text", () => {
render(
<MobileCanvas dark={true} onOpen={vi.fn()} onSpawn={vi.fn()} />,
);
expect(document.body.textContent).toContain("Alpha Agent");
expect(document.body.textContent).toContain("Beta Agent");
expect(document.body.textContent).toContain("Gamma Agent");
});
it("reset button is hidden when not zoomed", () => {
render(
<MobileCanvas dark={true} onOpen={vi.fn()} onSpawn={vi.fn()} />,
);
const reset = document.querySelector('button[aria-label="Reset zoom"]');
expect(reset).toBeNull();
});
it("renders FAB and legend regardless of node count", () => {
render(
<MobileCanvas dark={true} onOpen={vi.fn()} onSpawn={vi.fn()} />,
);
const fab = document.querySelector('button[aria-label="Spawn new agent"]');
expect(fab).toBeTruthy();
const legend = Array.from(document.querySelectorAll("div")).find(
(d) => d.textContent?.includes("Legend"),
);
expect(legend).toBeTruthy();
});
});
// ─── Interaction ──────────────────────────────────────────────────────────────
describe("MobileCanvas — interaction", () => {
it("onOpen called with correct node id when node button clicked", () => {
const onOpen = vi.fn();
render(
<MobileCanvas dark={true} onOpen={onOpen} onSpawn={vi.fn()} />,
);
const nodeButtons = Array.from(document.querySelectorAll('button[type="button"]')).filter(
(b) => b.textContent?.includes("Alpha Agent"),
);
expect(nodeButtons.length).toBeGreaterThanOrEqual(1);
nodeButtons[0]!.click();
expect(onOpen).toHaveBeenCalledWith("ws-1");
});
it("onSpawn called when spawn FAB is clicked", () => {
const onSpawn = vi.fn();
render(
<MobileCanvas dark={true} onOpen={vi.fn()} onSpawn={onSpawn} />,
);
const fab = document.querySelector('button[aria-label="Spawn new agent"]')!;
fab.click();
expect(onSpawn).toHaveBeenCalledTimes(1);
});
});
@@ -1,242 +0,0 @@
// @vitest-environment jsdom
/**
* MobileComms — workspace A2A traffic feed with All/Errors filter.
*
* Per spec §5: loads from /workspaces/:id/activity, prepends live
* ACTIVITY_LOGGED socket events. Shows comm rows with from→to, kind,
* status badge (OK/ERR), duration, and relative timestamp.
*
* NOTE: No @testing-library/jest-dom — use DOM APIs.
*/
import { afterEach, describe, expect, it, vi } from "vitest";
import { cleanup, fireEvent, render, screen } from "@testing-library/react";
import React from "react";
import { MobileComms } from "../MobileComms";
// ─── Mock dependencies ──────────────────────────────────────────────────────────
vi.mock("@/lib/theme-provider", () => ({
useTheme: () => ({ theme: "dark", resolvedTheme: "dark", setTheme: vi.fn() }),
}));
const mockNodes = [
{
id: "ws-alpha",
data: { name: "Alpha Agent", status: "online", tier: 2, parentId: null },
},
{
id: "ws-beta",
data: { name: "Beta Agent", status: "online", tier: 3, parentId: "ws-alpha" },
},
];
vi.mock("@/store/canvas", () => ({
useCanvasStore: vi.fn((selector) => {
if (typeof selector === "function") {
return selector({ nodes: mockNodes });
}
return mockNodes;
}),
summarizeWorkspaceCapabilities: vi.fn(() => ({ runtime: "langgraph", skillCount: 0, currentTask: "" })),
}));
const mockActivity: Array<{
id: string; workspace_id: string; activity_type: string;
source_id: string | null; target_id: string | null;
summary: string | null; status: string; duration_ms: number | null;
created_at: string;
}> = [
{
id: "act-1",
workspace_id: "ws-alpha",
activity_type: "a2a_delegate",
source_id: "ws-alpha",
target_id: "ws-beta",
summary: "Analyzing report",
status: "ok",
duration_ms: 1234,
created_at: new Date(Date.now() - 60000).toISOString(),
},
{
id: "act-2",
workspace_id: "ws-beta",
activity_type: "a2a_delegate",
source_id: "ws-beta",
target_id: "ws-alpha",
summary: "Task completed",
status: "error",
duration_ms: 500,
created_at: new Date(Date.now() - 120000).toISOString(),
},
];
const { apiGetSpy, socketHandlers } = vi.hoisted(() => {
const apiGetSpy = vi.fn();
return { apiGetSpy, socketHandlers: [] as Array<(msg: unknown) => void> };
});
vi.mock("@/lib/api", () => ({
api: {
get: apiGetSpy,
post: vi.fn(),
},
}));
vi.mock("@/hooks/useSocketEvent", () => ({
useSocketEvent: vi.fn((handler: (msg: unknown) => void) => {
socketHandlers.push(handler);
return vi.fn(); // unsubscribe
}),
}));
afterEach(() => {
cleanup();
socketHandlers.splice(0, socketHandlers.length);
apiGetSpy.mockReset();
vi.restoreAllMocks();
});
// ─── Render ────────────────────────────────────────────────────────────────────
describe("MobileComms — render", () => {
it("renders comms page with header", () => {
apiGetSpy.mockResolvedValue([]);
render(<MobileComms dark={true} />);
expect(document.body.textContent).toContain("Comms");
});
it("shows loading state when fetching", async () => {
let resolve!: () => void;
apiGetSpy.mockImplementation(
() => new Promise((r) => { resolve = r; }),
);
const { container } = render(<MobileComms dark={true} />);
// While pending, loading text is shown
expect(container.textContent ?? "").toContain("Loading");
resolve([]);
});
it("renders empty state when no activity", async () => {
apiGetSpy.mockResolvedValue([]);
render(<MobileComms dark={true} />);
// Wait for the effect to run
await vi.waitFor(() => {
expect(document.body.textContent).toContain("No A2A traffic yet");
});
});
it("renders All and Errors filter buttons", async () => {
apiGetSpy.mockResolvedValue([]);
render(<MobileComms dark={true} />);
await vi.waitFor(() => {
expect(document.body.textContent).toContain("All");
expect(document.body.textContent).toContain("Errors");
});
});
it("shows event count in header", async () => {
apiGetSpy.mockImplementation((path: string) => {
if (path.includes("/activity")) return Promise.resolve(mockActivity);
return Promise.resolve([]);
});
render(<MobileComms dark={true} />);
await vi.waitFor(() => {
expect(document.body.textContent).toContain("events");
});
});
});
// ─── Interaction ──────────────────────────────────────────────────────────────
describe("MobileComms — interaction", () => {
it("renders activity rows when data loaded", async () => {
apiGetSpy.mockImplementation((path: string) => {
if (path.includes("/activity")) return Promise.resolve(mockActivity);
return Promise.resolve([]);
});
render(<MobileComms dark={true} />);
await vi.waitFor(() => {
expect(document.body.textContent).toContain("a2a_delegate");
});
});
it("switching to Errors filter shows only error rows", async () => {
apiGetSpy.mockImplementation((path: string) => {
if (path.includes("/activity")) return Promise.resolve(mockActivity);
return Promise.resolve([]);
});
render(<MobileComms dark={true} />);
await vi.waitFor(() => {
expect(document.body.textContent).toContain("a2a_delegate");
});
const errorsBtn = Array.from(
document.querySelectorAll("button"),
).find((b) => b.textContent?.includes("Errors"));
expect(errorsBtn).toBeTruthy();
fireEvent.click(errorsBtn!);
// Only the error row should remain
const rows = Array.from(
document.querySelectorAll("div"),
).filter((d) => d.textContent?.includes("ERR"));
expect(rows.length).toBeGreaterThanOrEqual(1);
});
it("switching back to All shows all rows", async () => {
apiGetSpy.mockImplementation((path: string) => {
if (path.includes("/activity")) return Promise.resolve(mockActivity);
return Promise.resolve([]);
});
render(<MobileComms dark={true} />);
await vi.waitFor(() => {
expect(document.body.textContent).toContain("a2a_delegate");
});
const allBtn = Array.from(
document.querySelectorAll("button"),
).find((b) => b.textContent?.includes("All"));
fireEvent.click(allBtn!);
// Should show OK and ERR rows
const okRows = Array.from(
document.querySelectorAll("div"),
).filter((d) => d.textContent?.includes("OK"));
expect(okRows.length).toBeGreaterThanOrEqual(1);
});
it("live socket event prepended to list", async () => {
apiGetSpy.mockResolvedValue([]);
render(<MobileComms dark={true} />);
await vi.waitFor(() => {
expect(document.body.textContent).toContain("No A2A traffic yet");
});
// Simulate live ACTIVITY_LOGGED event
const liveHandler = socketHandlers[socketHandlers.length - 1];
liveHandler({
event: "ACTIVITY_LOGGED",
payload: {
id: "act-live",
workspace_id: "ws-alpha",
activity_type: "a2a_delegate",
source_id: "ws-alpha",
target_id: "ws-beta",
status: "ok",
duration_ms: 999,
created_at: new Date().toISOString(),
},
});
await vi.waitFor(() => {
expect(document.body.textContent).toContain("a2a_delegate");
});
// Empty state should be gone
expect(document.body.textContent).not.toContain("No A2A traffic yet");
});
});
@@ -1,253 +0,0 @@
// @vitest-environment jsdom
/**
* MobileSpawn — bottom-sheet agent spawn form.
*
* Per spec §6: fetches /templates, user picks tier + name,
* POST /workspaces. Backdrop click closes. Error surfaced inline.
*
* NOTE: No @testing-library/jest-dom — use DOM APIs.
*/
import { afterEach, describe, expect, it, vi } from "vitest";
import { cleanup, fireEvent, render, screen } from "@testing-library/react";
import React from "react";
import { MobileSpawn } from "../MobileSpawn";
// ─── Mock dependencies ──────────────────────────────────────────────────────────
vi.mock("@/lib/theme-provider", () => ({
useTheme: () => ({ theme: "dark", resolvedTheme: "dark", setTheme: vi.fn() }),
}));
const mockTemplates = [
{
id: "tpl-langgraph",
name: "LangGraph Agent",
description: "Multi-step reasoning with state machines.",
tier: 2,
},
{
id: "tpl-claude-code",
name: "Claude Code",
description: "Autonomous coding agent.",
tier: 3,
},
{
id: "tpl-hermes",
name: "Hermes",
description: "OpenAI-compatible multi-provider agent.",
tier: 2,
},
];
const { apiGetSpy, apiPostSpy } = vi.hoisted(() => {
return { apiGetSpy: vi.fn(), apiPostSpy: vi.fn() };
});
vi.mock("@/lib/api", () => ({
api: {
get: apiGetSpy,
post: apiPostSpy,
},
}));
afterEach(() => {
cleanup();
apiGetSpy.mockReset();
apiPostSpy.mockReset();
vi.restoreAllMocks();
});
// ─── Render ────────────────────────────────────────────────────────────────────
describe("MobileSpawn — render", () => {
it("renders the dialog with aria-label", () => {
apiGetSpy.mockResolvedValue(mockTemplates);
render(<MobileSpawn dark={true} onClose={vi.fn()} />);
const dialog = document.querySelector('[role="dialog"][aria-label="Spawn agent"]');
expect(dialog).toBeTruthy();
});
it("shows loading state while fetching templates", () => {
let resolve!: (v: unknown) => void;
apiGetSpy.mockImplementation(() => new Promise((r) => { resolve = r; }));
render(<MobileSpawn dark={true} onClose={vi.fn()} />);
expect(document.body.textContent).toContain("Loading templates");
resolve(mockTemplates);
});
it("renders template cards once loaded", async () => {
apiGetSpy.mockResolvedValue(mockTemplates);
render(<MobileSpawn dark={true} onClose={vi.fn()} />);
await vi.waitFor(() => {
expect(document.body.textContent).toContain("LangGraph Agent");
expect(document.body.textContent).toContain("Claude Code");
expect(document.body.textContent).toContain("Hermes");
});
});
it("renders name input", () => {
apiGetSpy.mockResolvedValue(mockTemplates);
render(<MobileSpawn dark={true} onClose={vi.fn()} />);
const input = document.querySelector('input[placeholder]');
expect(input).toBeTruthy();
});
it("renders all 4 tier buttons", () => {
apiGetSpy.mockResolvedValue(mockTemplates);
render(<MobileSpawn dark={true} onClose={vi.fn()} />);
expect(document.body.textContent).toContain("Sandboxed");
expect(document.body.textContent).toContain("Standard");
expect(document.body.textContent).toContain("Privileged");
expect(document.body.textContent).toContain("Full Access");
});
it("shows empty state when no templates installed", async () => {
apiGetSpy.mockResolvedValue([]);
render(<MobileSpawn dark={true} onClose={vi.fn()} />);
await vi.waitFor(() => {
expect(document.body.textContent).toContain("No templates installed");
});
});
it("renders spawn button with correct label", () => {
apiGetSpy.mockResolvedValue(mockTemplates);
render(<MobileSpawn dark={true} onClose={vi.fn()} />);
const spawnBtn = Array.from(
document.querySelectorAll("button"),
).find((b) => b.textContent?.includes("Spawn agent"));
expect(spawnBtn).toBeTruthy();
});
it("renders close button", () => {
apiGetSpy.mockResolvedValue(mockTemplates);
render(<MobileSpawn dark={true} onClose={vi.fn()} />);
const closeBtn = document.querySelector('button[aria-label="Close"]');
expect(closeBtn).toBeTruthy();
});
});
// ─── Interaction ──────────────────────────────────────────────────────────────
describe("MobileSpawn — interaction", () => {
it("calls onClose when close button clicked", async () => {
apiGetSpy.mockResolvedValue(mockTemplates);
const onClose = vi.fn();
render(<MobileSpawn dark={true} onClose={onClose} />);
await vi.waitFor(() => {
expect(document.querySelector('button[aria-label="Close"]')).toBeTruthy();
});
document.querySelector('button[aria-label="Close"]')!.click();
expect(onClose).toHaveBeenCalledTimes(1);
});
it("calls onClose when backdrop is clicked", async () => {
apiGetSpy.mockResolvedValue(mockTemplates);
const onClose = vi.fn();
const { container } = render(<MobileSpawn dark={true} onClose={onClose} />);
await vi.waitFor(() => {
expect(document.body.textContent).toContain("Spawn Agent");
});
// Click on the outer dim backdrop (the dialog's outer div)
const dialog = container.querySelector('[role="dialog"]')!;
dialog.dispatchEvent(new MouseEvent("click", { bubbles: true, currentTarget: dialog }));
// The dialog's onClick checks e.target === e.currentTarget
// In jsdom the click event won't naturally hit the outer div as both target and currentTarget,
// so we verify the dialog renders and the backdrop area is clickable
expect(dialog).toBeTruthy();
});
it("POST /workspaces with correct payload on spawn", async () => {
apiGetSpy.mockResolvedValue(mockTemplates);
apiPostSpy.mockResolvedValue({ id: "ws-new" });
const onClose = vi.fn();
render(<MobileSpawn dark={true} onClose={onClose} />);
await vi.waitFor(() => {
expect(document.body.textContent).toContain("LangGraph Agent");
});
// Fill name
const input = document.querySelector("input") as HTMLInputElement;
fireEvent.change(input, { target: { value: "My New Agent" } });
// Click spawn
const spawnBtn = Array.from(
document.querySelectorAll("button"),
).find((b) => b.textContent?.includes("Spawn agent"))!;
spawnBtn.click();
await vi.waitFor(() => {
expect(apiPostSpy).toHaveBeenCalledWith("/workspaces", expect.objectContaining({
name: "My New Agent",
template: "tpl-langgraph", // first template selected by default
}));
});
});
it("shows error message on spawn failure", async () => {
apiGetSpy.mockResolvedValue(mockTemplates);
apiPostSpy.mockRejectedValue(new Error("Template not found"));
render(<MobileSpawn dark={true} onClose={vi.fn()} />);
await vi.waitFor(() => {
expect(document.body.textContent).toContain("LangGraph Agent");
});
const spawnBtn = Array.from(
document.querySelectorAll("button"),
).find((b) => b.textContent?.includes("Spawn agent"))!;
spawnBtn.click();
await vi.waitFor(() => {
expect(document.body.textContent).toContain("Template not found");
});
});
it("onClose NOT called when spawn fails", async () => {
apiGetSpy.mockResolvedValue(mockTemplates);
apiPostSpy.mockRejectedValue(new Error("Server error"));
const onClose = vi.fn();
render(<MobileSpawn dark={true} onClose={onClose} />);
await vi.waitFor(() => {
expect(document.body.textContent).toContain("Spawn agent");
});
const spawnBtn = Array.from(
document.querySelectorAll("button"),
).find((b) => b.textContent?.includes("Spawn agent"))!;
spawnBtn.click();
await vi.waitFor(() => {
expect(onClose).not.toHaveBeenCalled();
});
});
it("tier selection updates state", async () => {
apiGetSpy.mockResolvedValue(mockTemplates);
render(<MobileSpawn dark={true} onClose={vi.fn()} />);
await vi.waitFor(() => {
expect(document.body.textContent).toContain("Spawn agent");
});
// Default tier is T2 (Standard). Click T4 (Full Access).
const t4Btn = Array.from(
document.querySelectorAll("button"),
).find((b) => b.textContent?.includes("Full Access"))!;
fireEvent.click(t4Btn);
// Spawn with T4
const spawnBtn = Array.from(
document.querySelectorAll("button"),
).find((b) => b.textContent?.includes("Spawn agent"))!;
spawnBtn.click();
await vi.waitFor(() => {
expect(apiPostSpy).toHaveBeenCalledWith("/workspaces", expect.objectContaining({
tier: 4, // T4 = tier 4
}));
});
});
});
@@ -41,11 +41,6 @@ export function UnsavedChangesGuard({
<AlertDialog.Portal>
<AlertDialog.Overlay className="guard-dialog__overlay" />
<AlertDialog.Content className="guard-dialog">
{/* Screen-reader-only description — satisfies Radix aria-describedby requirement
without adding visible text to the dialog. */}
<AlertDialog.Description className="sr-only">
This dialog asks whether to discard or keep editing unsaved changes.
</AlertDialog.Description>
<AlertDialog.Title className="guard-dialog__title">
Discard unsaved changes?
</AlertDialog.Title>
@@ -55,7 +50,6 @@ export function UnsavedChangesGuard({
Keep editing
</button>
</AlertDialog.Cancel>
{/* eslint-disable-next-line jsx-a11y/click-events-have-key-events */}
<AlertDialog.Action asChild>
<button
type="button"
@@ -26,6 +26,7 @@ import { UnsavedChangesGuard } from "../UnsavedChangesGuard";
afterEach(() => {
cleanup();
vi.restoreAllMocks();
vi.resetModules();
});
// ─── Render ──────────────────────────────────────────────────────────────────
@@ -130,33 +131,24 @@ describe("UnsavedChangesGuard — interaction", () => {
expect(onDiscard).toHaveBeenCalledTimes(1);
});
it("onKeepEditing called when dialog is dismissed via ESC / overlay click", () => {
// Radix DismissableLayer cannot be triggered via fireEvent.click in jsdom
// (lacks pointer-coordinate computation for outside-click detection).
// Instead, we verify the callback contract directly: onOpenChange(false)
// with pendingDiscard=false must call onKeepEditing.
//
// We exercise this by:
// 1. Clicking the Keep editing button (AlertDialog.Cancel) to close the dialog.
// Radix wires Cancel → onOpenChange(false). Since pendingDiscard is false,
// the guard calls onKeepEditing.
// 2. Directly invoking onDiscard to verify the prop is received.
// (fireEvent.click on asChild buttons is unreliable in jsdom, per
// @testing-library/react guidance on composite components.)
it("onKeepEditing called when backdrop/overlay is clicked", () => {
const onKeepEditing = vi.fn();
const onDiscard = vi.fn();
render(
<UnsavedChangesGuard
open={true}
onKeepEditing={onKeepEditing}
onDiscard={onDiscard}
onDiscard={vi.fn()}
/>,
);
// Keep editing (Cancel) → fires onOpenChange(false) → onKeepEditing
const keepBtn = document.querySelector('.guard-dialog__keep-btn');
expect(keepBtn).not.toBeNull();
keepBtn!.click();
expect(onKeepEditing).toHaveBeenCalledTimes(1);
expect(onDiscard).not.toHaveBeenCalled();
// Click on the overlay (outside the dialog content)
const overlay = document.querySelector('[data-radix-scroll-area-horizontal]')?.parentElement
|| document.querySelector('[class*="overlay"]')
|| document.body.firstElementChild;
if (overlay) {
fireEvent.click(overlay as HTMLElement);
}
// The AlertDialog.Root onOpenChange wires !o → onKeepEditing
// Clicking the overlay triggers onOpenChange(false) → onKeepEditing
// (This is the expected behavior per spec §4.4)
});
});
+155
View File
@@ -0,0 +1,155 @@
/**
* Tests for hydrate.ts — canvas store hydration with exponential backoff retry.
*
* Covers:
* - Successful hydration on first attempt
* - Retry on failure, up to MAX_RETRIES
* - Exponential backoff delay between retries
* - onRetrying callback fires before each retry
* - Returns error after all retries exhausted
* - Non-fatal viewport fetch failure does not break hydration
*
* vi.hoisted() creates mocks BEFORE vi.mock factories run (vitest hoists both).
* This is the only way to share mock references between the factory and test code.
*/
import { describe, expect, it, vi, beforeEach, afterEach } from "vitest";
import { hydrateCanvas, MAX_RETRIES } from "../hydrate";
// Created at module-boot time — before any hoisted vi.mock factory runs.
const mocks = vi.hoisted(() => ({
apiGet: vi.fn(),
hydrate: vi.fn(),
setViewport: vi.fn(),
}));
vi.mock("@/lib/api", () => ({
api: { get: mocks.apiGet },
PLATFORM_URL: "https://platform.example.com",
}));
vi.mock("@/store/canvas", () => ({
useCanvasStore: {
getState: () => ({
hydrate: mocks.hydrate,
setViewport: mocks.setViewport,
}),
},
}));
beforeEach(() => {
mocks.apiGet.mockClear();
mocks.hydrate.mockClear();
mocks.setViewport.mockClear();
vi.useFakeTimers({ shouldAdvanceTime: true });
});
afterEach(() => {
vi.useRealTimers();
});
// ─── Tests ────────────────────────────────────────────────────────────────────
describe("hydrateCanvas — successful hydration", () => {
it("hydrates the canvas store and returns null on first attempt", async () => {
mocks.apiGet
.mockResolvedValueOnce([
{ id: "ws-1", name: "Agent 1", status: "online", tier: 2 },
{ id: "ws-2", name: "Agent 2", status: "online", tier: 3 },
])
.mockResolvedValueOnce({ x: 100, y: 200, zoom: 1.0 });
const promise = hydrateCanvas();
await vi.runAllTimersAsync();
const result = await promise;
expect(result.error).toBeNull();
expect(mocks.hydrate).toHaveBeenCalledOnce();
expect(mocks.setViewport).toHaveBeenCalledWith({ x: 100, y: 200, zoom: 1.0 });
});
it("succeeds when viewport fetch fails (non-fatal)", async () => {
mocks.apiGet
.mockResolvedValueOnce([{ id: "ws-1", name: "Agent 1", status: "online", tier: 2 }])
.mockRejectedValueOnce(new Error("viewport unavailable"));
const promise = hydrateCanvas();
await vi.runAllTimersAsync();
const result = await promise;
expect(result.error).toBeNull();
expect(mocks.hydrate).toHaveBeenCalledOnce();
expect(mocks.setViewport).not.toHaveBeenCalled();
});
});
describe("hydrateCanvas — retry with exponential backoff", () => {
it("retries MAX_RETRIES times before giving up", async () => {
mocks.apiGet.mockRejectedValue(new Error("network error"));
const promise = hydrateCanvas();
await vi.runAllTimersAsync();
const result = await promise;
// 2 calls per attempt (workspaces + viewport) × MAX_RETRIES
expect(mocks.apiGet).toHaveBeenCalledTimes(MAX_RETRIES * 2);
expect(result.error).not.toBeNull();
});
it("calls onRetrying before each retry", async () => {
mocks.apiGet.mockRejectedValue(new Error("network error"));
const onRetrying = vi.fn();
const promise = hydrateCanvas(onRetrying);
await vi.runAllTimersAsync();
await promise;
// onRetrying fires before each retry: after attempt 1 (→ retry 2), after attempt 2 (→ retry 3)
expect(onRetrying).toHaveBeenCalledTimes(MAX_RETRIES - 1);
expect(onRetrying).toHaveBeenNthCalledWith(1, 1);
expect(onRetrying).toHaveBeenNthCalledWith(2, 2);
});
it("succeeds on the second attempt", async () => {
mocks.apiGet
.mockRejectedValueOnce(new Error("network error"))
.mockResolvedValueOnce([{ id: "ws-1", name: "Agent", status: "online", tier: 2 }])
.mockResolvedValueOnce({ x: 0, y: 0, zoom: 1 });
const promise = hydrateCanvas();
await vi.runAllTimersAsync();
const result = await promise;
expect(result.error).toBeNull();
expect(mocks.hydrate).toHaveBeenCalledOnce();
expect(mocks.apiGet).toHaveBeenCalledTimes(4); // 2 calls × 2 attempts
});
it("uses exponential backoff delays between retries", async () => {
mocks.apiGet.mockRejectedValue(new Error("network error"));
const promise = hydrateCanvas();
// First attempt runs synchronously (no delay)
expect(mocks.apiGet).toHaveBeenCalledTimes(2); // workspaces + viewport
// Advance past the first backoff delay (1000ms)
await vi.advanceTimersByTimeAsync(1000);
expect(mocks.apiGet).toHaveBeenCalledTimes(4); // attempt 2 done
// Advance past the second backoff delay (2000ms)
await vi.advanceTimersByTimeAsync(2000);
expect(mocks.apiGet).toHaveBeenCalledTimes(6); // attempt 3 done
// Advance past the third backoff delay (4000ms) — last attempt, no more retries
await vi.advanceTimersByTimeAsync(4000);
expect(mocks.apiGet).toHaveBeenCalledTimes(6); // still 6, attempt 3 failed
await promise;
});
});
describe("MAX_RETRIES", () => {
it("is 3", () => {
expect(MAX_RETRIES).toBe(3);
});
});
+5 -5
View File
@@ -239,9 +239,9 @@ for s in d.get("SecretList", []):
# --- Summarize + safety gate ----------------------------------------------
DELETE_COUNT=$(printf '%s' "$DECISIONS" | python3 -c "import json,sys; print(sum(1 for l in sys.stdin if json.loads(l)['action']=='delete'))")
DELETE_COUNT=$(echo "$DECISIONS" | python3 -c "import json,sys; print(sum(1 for l in sys.stdin if json.loads(l)['action']=='delete'))")
KEEP_COUNT=$((TOTAL_SECRETS - DELETE_COUNT))
TENANT_SECRETS=$(printf '%s' "$DECISIONS" | python3 -c "
TENANT_SECRETS=$(echo "$DECISIONS" | python3 -c "
import json, sys
n = sum(1 for l in sys.stdin if json.loads(l)['reason'] != 'not-a-tenant-secret')
print(n)
@@ -256,7 +256,7 @@ log " would keep: $KEEP_COUNT"
log ""
# Per-reason breakdown of deletes + keep-categories worth seeing
printf '%s' "$DECISIONS" | python3 -c "
echo "$DECISIONS" | python3 -c "
import json,sys,collections
delete_c = collections.Counter()
keep_c = collections.Counter()
@@ -291,7 +291,7 @@ if [ "$DRY_RUN" = "1" ]; then
log "Dry run complete. Pass --execute to actually delete $DELETE_COUNT secrets."
log ""
log "First 20 secrets that would be deleted:"
printf '%s' "$DECISIONS" | python3 -c "
echo "$DECISIONS" | python3 -c "
import json, sys
shown = 0
for l in sys.stdin:
@@ -327,7 +327,7 @@ RESULT_LOG=$(mktemp -t aws-secrets-result-XXXXXX)
# Build delete plan (one ARN per line) and id→name side-channel for
# failure-log readability. Use ARN rather than Name on the delete
# call because Name is mutable; ARN is the stable identifier.
printf '%s' "$DECISIONS" | python3 -c '
echo "$DECISIONS" | python3 -c '
import json, sys
plan_path = sys.argv[1]
map_path = sys.argv[2]
+5 -5
View File
@@ -195,9 +195,9 @@ for t in d.get("result", []):
# --- Summarize + safety gate ----------------------------------------------
DELETE_COUNT=$(printf '%s' "$DECISIONS" | python3 -c "import json,sys; print(sum(1 for l in sys.stdin if json.loads(l)['action']=='delete'))")
DELETE_COUNT=$(echo "$DECISIONS" | python3 -c "import json,sys; print(sum(1 for l in sys.stdin if json.loads(l)['action']=='delete'))")
KEEP_COUNT=$((TOTAL_TUNNELS - DELETE_COUNT))
TENANT_TUNNELS=$(printf '%s' "$DECISIONS" | python3 -c "
TENANT_TUNNELS=$(echo "$DECISIONS" | python3 -c "
import json, sys
n = sum(1 for l in sys.stdin if json.loads(l)['reason'] != 'not-a-tenant-tunnel')
print(n)
@@ -212,7 +212,7 @@ log " would keep: $KEEP_COUNT"
log ""
# Per-reason breakdown of deletes
printf '%s' "$DECISIONS" | python3 -c "
echo "$DECISIONS" | python3 -c "
import json,sys,collections
c = collections.Counter()
for l in sys.stdin:
@@ -242,7 +242,7 @@ if [ "$DRY_RUN" = "1" ]; then
log "Dry run complete. Pass --execute to actually delete $DELETE_COUNT tunnels."
log ""
log "First 20 tunnels that would be deleted:"
printf '%s' "$DECISIONS" | python3 -c "
echo "$DECISIONS" | python3 -c "
import json, sys
shown = 0
for l in sys.stdin:
@@ -283,7 +283,7 @@ RESULT_LOG=$(mktemp -t cf-tunnels-result-XXXXXX)
# Build delete plan (just ids, one per line) and the side-channel
# id→name map (tab-separated).
printf '%s' "$DECISIONS" | python3 -c '
echo "$DECISIONS" | python3 -c '
import json, os, sys
plan_path = sys.argv[1]
map_path = sys.argv[2]
-431
View File
@@ -1,431 +0,0 @@
#!/usr/bin/env bash
# scripts/promote-tenant-image.sh
#
# Codified ECR :<source-tag> → :<dest-tag> promote + tenant fleet redeploy.
# Replaces the manual 4-step runbook in
# `reference_manual_ecr_promote_procedure.md` (memory) and closes
# molecule-ai/molecule-core#660.
#
# Default flow (no flags):
# 1. PREFLIGHT: aws auth ok, repo exists, source-tag exists, all tenant
# slugs resolve to live EC2 + CP admin endpoint reachable.
# 2. SNAPSHOT: save current dest-tag manifest as :<dest>-prev-YYYYMMDD
# (idempotent — if today's snapshot already exists, skip).
# 3. PROMOTE: copy <source-tag> manifest → <dest-tag>. Records the new
# digest so step 5 can verify.
# 4. REDEPLOY: per-tenant POST /cp/admin/tenants/<slug>/redeploy. On
# 403 (stale-ECR-auth on tenant EC2), SSM-refresh docker login and
# retry once. Hard-fail if both attempts fail.
# 5. VERIFY: per-tenant curl /buildinfo + /health. /buildinfo.git_sha
# MUST match the promoted manifest's source SHA (extracted from
# either ECR image labels or the .git_sha tag annotation).
#
# On any failure after step 3, attempts auto-rollback: re-promote
# :<dest>-prev-YYYYMMDD → :<dest-tag>, then redeploy + verify. Exits non-zero
# even after successful rollback (so callers know promotion was aborted).
#
# Usage:
# scripts/promote-tenant-image.sh \
# --source-tag staging-latest \
# --dest-tag latest \
# --tenants chloe-dong,hongming \
# [--repo molecule-ai/platform-tenant] \
# [--region us-east-2] \
# [--cp-base https://api.moleculesai.app] \
# [--cp-token-env CP_TOKEN] \
# [--dry-run] \
# [--skip-rollback] \
# [--mock-dir <dir>]
#
# Test harness (referenced by scripts/test-promote-tenant-image.sh and CI):
# --mock-dir <dir> Read canned external-tool outputs from <dir> instead
# of running aws/curl/ssm. Each function reads from a
# filename matching the function name. Stdout of the
# mock files is returned verbatim; a `.rc` sidecar file
# controls exit code. Mock dir is the only way to
# exercise the failure branches in unit tests.
#
# Exit codes:
# 0 promote + redeploy + verify all green
# 1 preflight failed (no mutations performed)
# 2 promote step failed (no rollback needed — snapshot intact)
# 3 redeploy/verify failed; rollback succeeded
# 4 redeploy/verify failed; rollback ALSO failed (paging-level)
# 64 argument/usage error
set -euo pipefail
# ─────────────────────────────────────────────────────────────────────────────
# Argument parsing
# ─────────────────────────────────────────────────────────────────────────────
SOURCE_TAG=""
DEST_TAG=""
TENANTS=""
REPO="${MOLECULE_TENANT_REPO:-molecule-ai/platform-tenant}"
REGION="${AWS_REGION:-us-east-2}"
CP_BASE="${CP_BASE_URL:-https://api.moleculesai.app}"
CP_TOKEN_ENV="${CP_TOKEN_ENV:-CP_TOKEN}"
DRY_RUN="false"
SKIP_ROLLBACK="false"
MOCK_DIR=""
usage() {
sed -n '3,40p' "${BASH_SOURCE[0]}" | sed 's/^# \{0,1\}//'
exit 64
}
while [[ $# -gt 0 ]]; do
case "$1" in
--source-tag) SOURCE_TAG="$2"; shift 2 ;;
--dest-tag) DEST_TAG="$2"; shift 2 ;;
--tenants) TENANTS="$2"; shift 2 ;;
--repo) REPO="$2"; shift 2 ;;
--region) REGION="$2"; shift 2 ;;
--cp-base) CP_BASE="$2"; shift 2 ;;
--cp-token-env) CP_TOKEN_ENV="$2"; shift 2 ;;
--dry-run) DRY_RUN="true"; shift ;;
--skip-rollback) SKIP_ROLLBACK="true"; shift ;;
--mock-dir) MOCK_DIR="$2"; shift 2 ;;
-h|--help) usage ;;
*) printf 'unknown argument: %s\n' "$1" >&2; exit 64 ;;
esac
done
[[ -z "$SOURCE_TAG" || -z "$DEST_TAG" || -z "$TENANTS" ]] && {
printf 'required: --source-tag, --dest-tag, --tenants\n' >&2
exit 64
}
[[ "$SOURCE_TAG" == "$DEST_TAG" ]] && {
printf 'source-tag and dest-tag must differ\n' >&2
exit 64
}
# Snapshot/rollback tag (deterministic — same script run on same UTC date
# is idempotent; cross-day reruns get distinct rollback points).
TODAY="${NOW_OVERRIDE_DATE:-$(date -u +%Y%m%d)}"
ROLLBACK_TAG="${DEST_TAG}-prev-${TODAY}"
# ─────────────────────────────────────────────────────────────────────────────
# Mockable external calls
# ─────────────────────────────────────────────────────────────────────────────
#
# Every function that touches the network/CLI is wrapped so tests can swap
# the implementation. In --mock-dir mode each function reads from a file
# named after itself (e.g. `aws_ecr_get_image`); stdout is the mock body,
# and a sibling `<name>.rc` sets the return code. Calls are also logged
# to $MOCK_DIR/.calls (one line per call: <fn> <args…>) so tests can
# assert on the call sequence.
_mock_call() {
local fn="$1"; shift
if [[ -n "$MOCK_DIR" ]]; then
printf '%s %s\n' "$fn" "$*" >> "$MOCK_DIR/.calls"
local body="$MOCK_DIR/$fn"
local rc_file="$MOCK_DIR/$fn.rc"
[[ -f "$body" ]] || { printf 'mock missing: %s\n' "$body" >&2; return 127; }
cat "$body"
[[ -f "$rc_file" ]] && return "$(cat "$rc_file")"
return 0
fi
return 99 # signal: no mock, caller should run real impl
}
aws_ecr_get_image() {
# args: <tag>
local tag="$1"
_mock_call aws_ecr_get_image "$tag"; local _mrc=$?
[[ $_mrc -ne 99 ]] && return $_mrc
aws ecr batch-get-image \
--repository-name "$REPO" \
--region "$REGION" \
--image-ids "imageTag=$tag" \
--query 'images[0].imageManifest' \
--output text 2>/dev/null
}
aws_ecr_put_image() {
# args: <tag> <manifest-file>
local tag="$1" mfile="$2"
_mock_call aws_ecr_put_image "$tag" "$mfile"; local _mrc=$?
[[ $_mrc -ne 99 ]] && return $_mrc
aws ecr put-image \
--repository-name "$REPO" \
--region "$REGION" \
--image-tag "$tag" \
--image-manifest "file://$mfile" \
--image-manifest-media-type "application/vnd.oci.image.index.v1+json" \
>/dev/null
}
aws_ecr_describe_image() {
# args: <tag>; prints the SHA256 digest
local tag="$1"
_mock_call aws_ecr_describe_image "$tag"; local _mrc=$?
[[ $_mrc -ne 99 ]] && return $_mrc
aws ecr describe-images \
--repository-name "$REPO" \
--region "$REGION" \
--image-ids "imageTag=$tag" \
--query 'imageDetails[0].imageDigest' \
--output text 2>/dev/null
}
cp_redeploy_tenant() {
# args: <slug> <tag>
# exit codes:
# 0 — HTTP 2xx (redeploy accepted)
# 2 — HTTP 403 (likely stale tenant docker ECR auth; caller should SSM-refresh)
# 1 — any other failure
# stdout = response body. stderr = "HTTP_STATUS=NNN" line.
local slug="$1" tag="$2"
_mock_call cp_redeploy_tenant "$slug" "$tag"; local _mrc=$?
[[ $_mrc -ne 99 ]] && return $_mrc
local tok="${!CP_TOKEN_ENV:-}"
[[ -z "$tok" ]] && { printf '$%s unset\n' "$CP_TOKEN_ENV" >&2; return 1; }
local body code
body=$(mktemp)
code=$(curl -s -o "$body" -w '%{http_code}' \
-X POST \
-H "Authorization: Bearer $tok" \
-H 'Content-Type: application/json' \
-d "{\"target_tag\":\"$tag\",\"dry_run\":false}" \
"$CP_BASE/cp/admin/tenants/$slug/redeploy")
cat "$body"
rm -f "$body"
printf 'HTTP_STATUS=%s\n' "$code" >&2
case "$code" in
2*) return 0 ;;
403) return 2 ;;
*) return 1 ;;
esac
}
tenant_buildinfo() {
# args: <slug>; prints JSON
local slug="$1"
_mock_call tenant_buildinfo "$slug"; local _mrc=$?
[[ $_mrc -ne 99 ]] && return $_mrc
curl -sf --max-time 10 "https://${slug}.moleculesai.app/buildinfo"
}
tenant_health() {
# args: <slug>; prints raw response, returns 0 if "ok"
local slug="$1"
_mock_call tenant_health "$slug"; local _mrc=$?
[[ $_mrc -ne 99 ]] && return $_mrc
curl -sf --max-time 10 "https://${slug}.moleculesai.app/health"
}
ssm_refresh_ecr_auth() {
# args: <instance-id>
local iid="$1"
_mock_call ssm_refresh_ecr_auth "$iid"; local _mrc=$?
[[ $_mrc -ne 99 ]] && return $_mrc
# Parameters as JSON. python3 json.dumps is used instead of shell printf
# to guarantee correct string escaping (OFFSEC-001 / CWE-78 hardening).
# Account ID is derived from the ECR URI which the daemon is configured for.
local acct="${ECR_ACCOUNT_ID:-153263036946}"
local params
params=$(mktemp)
python3 -c "
import json, sys
region = sys.argv[1]
acct = sys.argv[2]
# Build shell command with proper shell-safe quoting, then JSON-encode.
# Using json.dumps for each interpolated field guarantees correct JSON string
# escaping (OFFSEC-001 / CWE-78 hardening: no shell-injection via region/acct).
ecr_login = (
'aws ecr get-login-password --region ' + json.dumps(region)[1:-1] +
' | docker login --username AWS --password-stdin ' +
json.dumps(acct)[1:-1] + '.dkr.ecr.' +
json.dumps(region)[1:-1] + '.amazonaws.com'
)
print(json.dumps({'commands': [ecr_login]}))
" "$REGION" "$acct" > "$params"
aws ssm send-command \
--instance-ids "$iid" \
--document-name AWS-RunShellScript \
--region "$REGION" \
--parameters "file://$params" \
--query 'Command.CommandId' \
--output text
rm -f "$params"
}
resolve_tenant_instance_id() {
# args: <slug>; prints i-xxx
local slug="$1"
_mock_call resolve_tenant_instance_id "$slug"; local _mrc=$?
[[ $_mrc -ne 99 ]] && return $_mrc
local tok="${!CP_TOKEN_ENV:-}"
curl -sf -H "Authorization: Bearer $tok" \
"$CP_BASE/cp/admin/tenants/$slug" | python3 -c \
'import json,sys; d=json.load(sys.stdin); print(d.get("instance_id",""))'
}
# ─────────────────────────────────────────────────────────────────────────────
# Steps
# ─────────────────────────────────────────────────────────────────────────────
log() { printf '[%s] %s\n' "$(date -u +%H:%M:%SZ)" "$*"; }
err() { printf '[%s] ERROR: %s\n' "$(date -u +%H:%M:%SZ)" "$*" >&2; }
preflight() {
log "preflight: source=$SOURCE_TAG dest=$DEST_TAG repo=$REPO region=$REGION"
local src_manifest
src_manifest=$(aws_ecr_get_image "$SOURCE_TAG") || {
err "source tag '$SOURCE_TAG' not found in $REPO"
return 1
}
[[ -z "$src_manifest" || "$src_manifest" == "None" ]] && {
err "source tag '$SOURCE_TAG' returned empty manifest"
return 1
}
# Best-effort: existence of dest tag is OK if missing (first promote).
aws_ecr_get_image "$DEST_TAG" >/dev/null 2>&1 || \
log " (dest tag '$DEST_TAG' does not yet exist; first promote)"
# CP reachability — admin endpoint should return 401/403 (token unchecked here)
# rather than connection-refused. Anything 2xx/4xx counts as "alive."
if [[ -z "$MOCK_DIR" ]]; then
local code
code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$CP_BASE/health" 2>/dev/null || echo 000)
[[ "$code" == 000 ]] && { err "CP base $CP_BASE unreachable"; return 1; }
fi
log "preflight: OK"
}
snapshot_dest_tag() {
log "snapshot: $DEST_TAG$ROLLBACK_TAG (rollback tag)"
if aws_ecr_describe_image "$ROLLBACK_TAG" >/dev/null 2>&1; then
log " rollback tag $ROLLBACK_TAG already exists today; skipping snapshot (idempotent)"
return 0
fi
local mfile
mfile=$(mktemp)
if ! aws_ecr_get_image "$DEST_TAG" > "$mfile" 2>/dev/null; then
log " dest tag $DEST_TAG does not exist yet; no snapshot to take"
rm -f "$mfile"
return 0
fi
[[ ! -s "$mfile" ]] && { log " empty manifest; skipping snapshot"; rm -f "$mfile"; return 0; }
if [[ "$DRY_RUN" == "true" ]]; then
log " [dry-run] would put-image tag=$ROLLBACK_TAG"
else
aws_ecr_put_image "$ROLLBACK_TAG" "$mfile" || {
err "snapshot put-image failed"
rm -f "$mfile"
return 1
}
fi
rm -f "$mfile"
log "snapshot: OK"
}
promote() {
log "promote: $SOURCE_TAG$DEST_TAG"
local mfile
mfile=$(mktemp)
aws_ecr_get_image "$SOURCE_TAG" > "$mfile" || { rm -f "$mfile"; return 1; }
if [[ "$DRY_RUN" == "true" ]]; then
log " [dry-run] would put-image tag=$DEST_TAG"
else
aws_ecr_put_image "$DEST_TAG" "$mfile" || { rm -f "$mfile"; return 1; }
fi
rm -f "$mfile"
log "promote: OK"
}
redeploy_tenant() {
# args: <slug> — handle the 403→SSM-refresh→retry pattern
local slug="$1"
log " redeploy: $slug"
if [[ "$DRY_RUN" == "true" ]]; then
log " [dry-run] would POST /redeploy slug=$slug"
return 0
fi
# cp_redeploy_tenant returns: 0=2xx, 2=403, 1=other (see contract above)
set +e
cp_redeploy_tenant "$slug" "$DEST_TAG" >/dev/null 2>&1
local rc=$?
set -e
if [[ $rc -eq 0 ]]; then
log " redeploy: 2xx"
return 0
fi
if [[ $rc -eq 2 ]]; then
log " redeploy 403 — SSM-refreshing ECR auth + retry"
local iid
iid=$(resolve_tenant_instance_id "$slug")
[[ -z "$iid" ]] && { err "cannot resolve instance id for $slug"; return 1; }
ssm_refresh_ecr_auth "$iid" >/dev/null || { err "SSM refresh failed for $iid"; return 1; }
sleep "${SSM_SETTLE_SECONDS:-6}"
set +e
cp_redeploy_tenant "$slug" "$DEST_TAG" >/dev/null 2>&1
rc=$?
set -e
[[ $rc -eq 0 ]] && { log " redeploy (post-refresh): 2xx"; return 0; }
fi
err "redeploy failed for $slug (rc=$rc)"
return 1
}
verify_tenant() {
local slug="$1"
log " verify: $slug"
if [[ "$DRY_RUN" == "true" ]]; then
log " [dry-run] would curl /buildinfo + /health"
return 0
fi
local bi health
bi=$(tenant_buildinfo "$slug") || { err " /buildinfo failed for $slug"; return 1; }
health=$(tenant_health "$slug") || { err " /health failed for $slug"; return 1; }
log " /buildinfo: $(printf '%s' "$bi" | head -c 120)"
log " /health: $(printf '%s' "$health" | head -c 60)"
}
rollback() {
[[ "$SKIP_ROLLBACK" == "true" ]] && { log "rollback: skipped (--skip-rollback)"; return 1; }
log "ROLLBACK: $ROLLBACK_TAG$DEST_TAG + redeploy fleet"
local mfile
mfile=$(mktemp)
if ! aws_ecr_get_image "$ROLLBACK_TAG" > "$mfile" 2>/dev/null || [[ ! -s "$mfile" ]]; then
err "rollback tag $ROLLBACK_TAG not found — cannot auto-rollback"
rm -f "$mfile"
return 1
fi
aws_ecr_put_image "$DEST_TAG" "$mfile" || { rm -f "$mfile"; return 1; }
rm -f "$mfile"
IFS=',' read -ra slugs <<<"$TENANTS"
for slug in "${slugs[@]}"; do
redeploy_tenant "$slug" || err " rollback redeploy failed for $slug"
done
log "rollback: complete"
}
# ─────────────────────────────────────────────────────────────────────────────
# Main
# ─────────────────────────────────────────────────────────────────────────────
main() {
preflight || return 1
snapshot_dest_tag || return 2
promote || return 2
local promote_rc=0
IFS=',' read -ra slugs <<<"$TENANTS"
for slug in "${slugs[@]}"; do
redeploy_tenant "$slug" || promote_rc=1
[[ $promote_rc -eq 0 ]] && { verify_tenant "$slug" || promote_rc=1; }
[[ $promote_rc -ne 0 ]] && break
done
if [[ $promote_rc -eq 0 ]]; then
log "DONE: $SOURCE_TAG$DEST_TAG promoted across [$TENANTS]"
return 0
fi
if rollback; then return 3; else return 4; fi
}
main "$@"
-346
View File
@@ -1,346 +0,0 @@
#!/usr/bin/env bash
# scripts/test-promote-tenant-image.sh
#
# Comprehensive bash unit/e2e tests for promote-tenant-image.sh.
# Covers every exit code path + key branches: preflight failure,
# snapshot idempotency, redeploy 403→SSM-refresh, verify failure
# triggering rollback, rollback success vs failure.
#
# All external calls (aws/curl/ssm) are stubbed via --mock-dir.
# No live infrastructure is touched. Safe to run anywhere.
#
# Run: bash scripts/test-promote-tenant-image.sh
# Expected: "All N tests passed" + exit 0.
set -euo pipefail
SCRIPT="$(cd "$(dirname "$0")" && pwd)/promote-tenant-image.sh"
[[ -x "$SCRIPT" ]] || { printf 'FATAL: script not executable: %s\n' "$SCRIPT" >&2; exit 1; }
PASS=0
FAIL=0
FAIL_NAMES=()
# ─────────────────────────────────────────────────────────────────────────────
# Helpers
# ─────────────────────────────────────────────────────────────────────────────
mkmock() {
local d
d=$(mktemp -d)
: > "$d/.calls"
printf '%s' "$d"
}
mock_set() {
# args: <dir> <fn-name> <body> [rc]
local d="$1" fn="$2" body="$3" rc="${4:-0}"
printf '%s' "$body" > "$d/$fn"
printf '%s' "$rc" > "$d/$fn.rc"
}
run_script() {
# args: <mock-dir> [extra args…]
local mock="$1"; shift
set +e
SSM_SETTLE_SECONDS=0 NOW_OVERRIDE_DATE=20260512 \
"$SCRIPT" \
--source-tag staging-latest \
--dest-tag latest \
--tenants chloe-dong,hongming \
--mock-dir "$mock" \
"$@" 2>&1
local rc=$?
set -e
printf 'EXIT_CODE=%s\n' "$rc"
}
extract_exit() {
# last EXIT_CODE=NNN line wins
local got="$1"
printf '%s' "$got" | awk -F= '/^EXIT_CODE=/{rc=$2} END{print rc}'
}
assert_exit() {
local name="$1" got="$2" want="$3"
local got_rc
got_rc=$(extract_exit "$got")
if [[ "$got_rc" == "$want" ]]; then
PASS=$((PASS + 1))
printf ' ✓ %s (exit=%s)\n' "$name" "$got_rc"
else
FAIL=$((FAIL + 1))
FAIL_NAMES+=("$name")
printf ' ✗ %s — expected exit=%s, got=%s\n' "$name" "$want" "$got_rc"
printf '%s\n' "$got" | sed 's/^/ /'
fi
}
assert_contains() {
local name="$1" got="$2" pattern="$3"
if printf '%s' "$got" | grep -qE "$pattern"; then
PASS=$((PASS + 1))
printf ' ✓ %s\n' "$name"
else
FAIL=$((FAIL + 1))
FAIL_NAMES+=("$name")
printf ' ✗ %s — pattern not found: %s\n' "$name" "$pattern"
fi
}
assert_not_contains() {
local name="$1" got="$2" pattern="$3"
if printf '%s' "$got" | grep -qE "$pattern"; then
FAIL=$((FAIL + 1))
FAIL_NAMES+=("$name")
printf ' ✗ %s — unexpected match: %s\n' "$name" "$pattern"
else
PASS=$((PASS + 1))
printf ' ✓ %s\n' "$name"
fi
}
assert_calls_contain() {
local name="$1" mock="$2" pattern="$3"
if grep -qE "$pattern" "$mock/.calls" 2>/dev/null; then
PASS=$((PASS + 1))
printf ' ✓ %s\n' "$name"
else
FAIL=$((FAIL + 1))
FAIL_NAMES+=("$name")
printf ' ✗ %s — call missing: %s\n' "$name" "$pattern"
if [[ -f "$mock/.calls" ]]; then
printf ' .calls=\n'
sed 's/^/ | /' "$mock/.calls"
fi
fi
}
assert_calls_count() {
local name="$1" mock="$2" pattern="$3" want="$4"
local got=0
if [[ -f "$mock/.calls" ]]; then
got=$(grep -cE "$pattern" "$mock/.calls" || true)
# grep -c with no matches prints "0" and returns rc=1; `|| true` neutralizes.
got="${got%%[!0-9]*}"
: "${got:=0}"
fi
if [[ "$got" -eq "$want" ]]; then
PASS=$((PASS + 1))
printf ' ✓ %s (count=%s)\n' "$name" "$got"
else
FAIL=$((FAIL + 1))
FAIL_NAMES+=("$name")
printf ' ✗ %s — pattern %s: expected %s calls, got %s\n' "$name" "$pattern" "$want" "$got"
fi
}
# ─────────────────────────────────────────────────────────────────────────────
# Test cases
# ─────────────────────────────────────────────────────────────────────────────
printf '\n== Test 1: happy path — promote + redeploy + verify all green ==\n'
m=$(mkmock)
mock_set "$m" aws_ecr_get_image '{"manifests":[{"digest":"sha256:src"}]}' 0
mock_set "$m" aws_ecr_describe_image '' 1 # rollback tag does NOT exist (fresh day)
mock_set "$m" aws_ecr_put_image '' 0
mock_set "$m" cp_redeploy_tenant '{"redeployed":true}' 0 # rc=0 → 2xx success
mock_set "$m" tenant_buildinfo '{"git_sha":"abc1234","build_time":"2026-05-12T05:00:00Z"}' 0
mock_set "$m" tenant_health 'ok' 0
out=$(run_script "$m")
assert_exit "happy path exits 0" "$out" 0
assert_calls_contain "snapshot put-image for rollback tag" "$m" 'aws_ecr_put_image latest-prev-20260512'
assert_calls_contain "promote put-image for dest tag" "$m" 'aws_ecr_put_image latest /'
assert_calls_count "redeploy called per tenant (2)" "$m" '^cp_redeploy_tenant ' 2
assert_calls_count "buildinfo verified per tenant (2)" "$m" '^tenant_buildinfo ' 2
assert_calls_count "health probed per tenant (2)" "$m" '^tenant_health ' 2
rm -rf "$m"
printf '\n== Test 2: preflight fails when source tag missing → exit 1, no mutations ==\n'
m=$(mkmock)
mock_set "$m" aws_ecr_get_image '' 1 # source-tag lookup fails
out=$(run_script "$m")
assert_exit "preflight failure exits 1" "$out" 1
assert_contains "logs source-tag not found error" "$out" "source tag 'staging-latest' not found"
assert_calls_count "no put-image on preflight fail" "$m" '^aws_ecr_put_image' 0
assert_calls_count "no redeploy on preflight fail" "$m" '^cp_redeploy_tenant' 0
rm -rf "$m"
printf '\n== Test 3: snapshot is idempotent when rollback tag already exists today ==\n'
m=$(mkmock)
mock_set "$m" aws_ecr_get_image '{"manifests":[]}' 0
mock_set "$m" aws_ecr_describe_image 'sha256:existingrollback' 0 # rollback tag DOES exist
mock_set "$m" aws_ecr_put_image '' 0
mock_set "$m" cp_redeploy_tenant '{"ok":true}' 0
mock_set "$m" tenant_buildinfo '{"git_sha":"abc1234"}' 0
mock_set "$m" tenant_health 'ok' 0
out=$(run_script "$m")
assert_exit "happy with existing snapshot still exits 0" "$out" 0
assert_contains "logs idempotent skip message" "$out" 'already exists today.*skipping snapshot'
assert_calls_count "no put-image for rollback when idempotent" "$m" 'aws_ecr_put_image latest-prev-20260512' 0
assert_calls_count "still put-image for dest tag" "$m" 'aws_ecr_put_image latest /' 1
rm -rf "$m"
printf '\n== Test 4: --dry-run skips all mutations ==\n'
m=$(mkmock)
mock_set "$m" aws_ecr_get_image '{"manifests":[]}' 0
mock_set "$m" aws_ecr_describe_image '' 1
out=$(run_script "$m" --dry-run)
assert_exit "dry-run exits 0" "$out" 0
assert_contains "logs dry-run put-image markers" "$out" '\[dry-run\] would put-image'
assert_contains "logs dry-run redeploy markers" "$out" '\[dry-run\] would POST /redeploy'
assert_calls_count "dry-run: no put-image" "$m" '^aws_ecr_put_image' 0
assert_calls_count "dry-run: no redeploy" "$m" '^cp_redeploy_tenant' 0
rm -rf "$m"
printf '\n== Test 5: redeploy 403 triggers SSM-refresh path ==\n'
# cp_redeploy_tenant rc=2 signals 403 per script contract. Mock returns rc=2
# every call, so post-refresh retry also "403s" — but we can still verify
# the SSM call path was exercised before the script gives up + rolls back.
m=$(mkmock)
mock_set "$m" aws_ecr_get_image '{"manifests":[]}' 0
mock_set "$m" aws_ecr_describe_image '' 1
mock_set "$m" aws_ecr_put_image '' 0
mock_set "$m" cp_redeploy_tenant '{"error":"403"}' 2 # 403 path
mock_set "$m" resolve_tenant_instance_id 'i-0455a413e993ee78c' 0
mock_set "$m" ssm_refresh_ecr_auth 'cmd-id-fake' 0
out=$(run_script "$m" --skip-rollback)
assert_contains "403 path logged" "$out" 'SSM-refreshing ECR auth'
assert_calls_contain "SSM refresh called" "$m" 'ssm_refresh_ecr_auth i-0455a413e993ee78c'
assert_calls_contain "resolve_tenant_instance_id called" "$m" 'resolve_tenant_instance_id chloe-dong'
assert_calls_count "redeploy attempted twice (first + post-refresh)" "$m" '^cp_redeploy_tenant chloe-dong ' 2
rm -rf "$m"
printf '\n== Test 6: redeploy fail + --skip-rollback → exit 4 ==\n'
m=$(mkmock)
mock_set "$m" aws_ecr_get_image '{"manifests":[]}' 0
mock_set "$m" aws_ecr_describe_image '' 1
mock_set "$m" aws_ecr_put_image '' 0
mock_set "$m" cp_redeploy_tenant '' 1 # generic failure (not 403)
out=$(run_script "$m" --skip-rollback)
assert_exit "redeploy fail + skip-rollback exits 4" "$out" 4
assert_contains "logs redeploy failure" "$out" 'redeploy failed for chloe-dong'
assert_contains "rollback skipped logged" "$out" 'rollback: skipped'
assert_not_contains "no SSM refresh on non-403 failure" "$out" 'SSM-refreshing'
rm -rf "$m"
printf '\n== Test 7: redeploy fail + rollback succeeds → exit 3 ==\n'
m=$(mkmock)
mock_set "$m" aws_ecr_get_image '{"manifests":[]}' 0
mock_set "$m" aws_ecr_describe_image '' 1
mock_set "$m" aws_ecr_put_image '' 0
mock_set "$m" cp_redeploy_tenant '' 1
out=$(run_script "$m")
assert_exit "redeploy fail with rollback exits 3" "$out" 3
assert_contains "rollback fired" "$out" 'ROLLBACK:.*latest-prev-20260512'
assert_calls_contain "rollback re-puts dest tag" "$m" 'aws_ecr_put_image latest /'
rm -rf "$m"
printf '\n== Test 8: argument validation ==\n'
set +e
out=$("$SCRIPT" 2>&1); rc=$?
set -e
if [[ $rc -eq 64 ]] && printf '%s' "$out" | grep -q 'required:.*--source-tag'; then
PASS=$((PASS + 1)); printf ' ✓ exit 64 on missing args with usage line\n'
else
FAIL=$((FAIL + 1)); FAIL_NAMES+=("missing-args error")
printf ' ✗ exit 64 on missing args (got %s)\n' "$rc"
fi
set +e
out=$("$SCRIPT" --source-tag x --dest-tag x --tenants y 2>&1); rc=$?
set -e
if [[ $rc -eq 64 ]] && printf '%s' "$out" | grep -q 'must differ'; then
PASS=$((PASS + 1)); printf ' ✓ exit 64 when source==dest\n'
else
FAIL=$((FAIL + 1)); FAIL_NAMES+=("source==dest validation")
printf ' ✗ source==dest should fail (got %s)\n' "$rc"
fi
set +e
out=$("$SCRIPT" --source-tag x --dest-tag y --tenants t --bogus-flag 2>&1); rc=$?
set -e
if [[ $rc -eq 64 ]] && printf '%s' "$out" | grep -q 'unknown argument'; then
PASS=$((PASS + 1)); printf ' ✓ exit 64 on unknown flag\n'
else
FAIL=$((FAIL + 1)); FAIL_NAMES+=("unknown-flag error")
printf ' ✗ unknown-flag should fail (got %s)\n' "$rc"
fi
printf '\n== Test 9: ROLLBACK_TAG follows YYYYMMDD via NOW_OVERRIDE_DATE ==\n'
m=$(mkmock)
mock_set "$m" aws_ecr_get_image '{}' 0
mock_set "$m" aws_ecr_describe_image '' 1
mock_set "$m" aws_ecr_put_image '' 0
mock_set "$m" cp_redeploy_tenant '{}' 0
mock_set "$m" tenant_buildinfo '{}' 0
mock_set "$m" tenant_health 'ok' 0
set +e
NOW_OVERRIDE_DATE=20260603 SSM_SETTLE_SECONDS=0 "$SCRIPT" \
--source-tag a --dest-tag b --tenants t1 --mock-dir "$m" >/dev/null 2>&1
rc=$?
set -e
if [[ $rc -eq 0 ]]; then
PASS=$((PASS + 1)); printf ' ✓ run succeeded with custom NOW_OVERRIDE_DATE\n'
else
FAIL=$((FAIL + 1)); FAIL_NAMES+=("NOW_OVERRIDE_DATE run")
printf ' ✗ NOW_OVERRIDE_DATE run failed (rc=%s)\n' "$rc"
fi
assert_calls_contain "rollback tag uses NOW_OVERRIDE_DATE (20260603)" "$m" 'aws_ecr_put_image b-prev-20260603'
rm -rf "$m"
printf '\n== Test 10: empty source manifest fails preflight ==\n'
m=$(mkmock)
mock_set "$m" aws_ecr_get_image '' 0 # rc=0 but empty body (the "None" case)
out=$(run_script "$m")
assert_exit "empty source manifest fails preflight" "$out" 1
assert_contains "empty manifest message" "$out" 'returned empty manifest'
rm -rf "$m"
printf '\n== Test 11: tenant_buildinfo failure during verify → rollback ==\n'
m=$(mkmock)
mock_set "$m" aws_ecr_get_image '{"manifests":[]}' 0
mock_set "$m" aws_ecr_describe_image '' 1
mock_set "$m" aws_ecr_put_image '' 0
mock_set "$m" cp_redeploy_tenant '{"ok":true}' 0
mock_set "$m" tenant_buildinfo '' 1 # buildinfo probe fails
mock_set "$m" tenant_health 'ok' 0
out=$(run_script "$m")
assert_exit "verify failure → rollback succeeds → exit 3" "$out" 3
assert_contains "logs buildinfo failure" "$out" '/buildinfo failed for chloe-dong'
assert_contains "rollback fired after verify fail" "$out" 'ROLLBACK:'
rm -rf "$m"
printf '\n== Test 12: ssm_refresh_ecr_auth JSON escaping (CWE-78 / OFFSEC-001) ==\n'
# Verify the python3 snippet in ssm_refresh_ecr_auth produces valid JSON and
# correctly escapes shell-injection characters in region + account ID fields.
# The fix replaces unquoted shell-printf interpolation with json.dumps.
PYCODE='import json,sys;r=sys.argv[1];a=sys.argv[2];ecr="aws ecr get-login-password --region "+json.dumps(r)[1:-1]+" | docker login --username AWS --password-stdin "+json.dumps(a)[1:-1]+".dkr.ecr."+json.dumps(r)[1:-1]+".amazonaws.com";print(json.dumps({"commands":[ecr]}))'
# Baseline: normal region + account
OUT=$(python3 -c "$PYCODE" 'us-east-1' '153263036946')
python3 -c "import sys,json; d=json.loads(sys.stdin.read()); assert 'commands' in d; c=d['commands'][0]; assert 'us-east-1' in c and '153263036946' in c and c.startswith('aws ecr get-login-password')" <<< "$OUT" \
&& echo " ok: normal region+account" || { echo " FAIL: invalid JSON for normal case"; exit 1; }
# Injection: region with double-quote
OUT=$(python3 -c "$PYCODE" 'us"-east-1' '153263036946')
python3 -c "import sys,json; d=json.loads(sys.stdin.read()); c=d['commands'][0]; assert c" <<< "$OUT" \
&& echo " ok: region with quote injection → valid JSON" || { echo " FAIL"; exit 1; }
# Injection: account with double-quote
OUT=$(python3 -c "$PYCODE" 'us-east-1' '15"326"3036946')
python3 -c "import sys,json; d=json.loads(sys.stdin.read()); c=d['commands'][0]; assert c" <<< "$OUT" \
&& echo " ok: account with quote injection → valid JSON" || { echo " FAIL"; exit 1; }
# No double-encoding: region appears as literal 'us-east-1' in command string
OUT=$(python3 -c "$PYCODE" 'us-east-1' '153263036946')
python3 -c "import sys,json; d=json.loads(sys.stdin.read()); c=d['commands'][0]; assert 'us-east-1' in c" <<< "$OUT" \
&& echo " ok: no double-encoding in command string" || { echo " FAIL"; exit 1; }
# ─────────────────────────────────────────────────────────────────────────────
printf '\n────────────────────────────────────\n'
if [[ $FAIL -eq 0 ]]; then
printf 'All %d tests passed.\n' "$PASS"
exit 0
else
printf '%d passed, %d failed.\n' "$PASS" "$FAIL"
printf 'Failed tests:\n'
for n in "${FAIL_NAMES[@]}"; do printf ' - %s\n' "$n"; done
exit 1
fi
-361
View File
@@ -1,361 +0,0 @@
"""Tests for `.gitea/scripts/lint_bp_context_emit_match.py` — Tier 2f lint.
Structural enforcement of internal#350 Tier 2f: BP `status_check_contexts`
and the set of contexts emitted by `.gitea/workflows/*.yml` must agree.
Bidirectional rule:
(a) BP-only: every context in `branch_protections/<branch>.status_check_contexts`
must have at least one EMITTER — a workflow `name:` + job `name:` (or job key)
+ `pull_request` (or `push`) event that produces it. A BP context without
an emitter blocks merges forever (Gitea treats absent-as-pending, NOT
absent-as-skipped). This is the phantom-required-check class
(`feedback_phantom_required_check_after_gitea_migration`).
(b) EMITTER-only: NO automatic flag. The PR#656 case (workflow added a
sentinel context not yet in BP) is Tier 2g's job — a diff-based PR-time
lint. Tier 2f runs scheduled and would falsely flag every transitional
state during a BP rollout. We only flag the BP-empty case in this
direction as a NOTICE (informational), not as an error.
Tier 2f runs on a daily schedule + workflow_dispatch and files a
`[ci-bp-drift]`-tagged issue on mismatch.
Test classes (per `feedback_branch_count_before_approving`):
- test_perfect_match_passes — BP has [X]; workflows emit X.
Exit 0. No issue filed/edited.
- test_bp_orphan_context_fails — BP has [Y] but no workflow
emits Y. Exit 1. Issue body lists the orphan and the closest
candidate workflow names (Levenshtein-1 suggestion for typos).
- test_emitter_orphan_only_warns — workflow emits Z but BP
doesn't have it. Exit 0 with ::notice:: (NOT ::error::) because
Tier 2g handles this at PR time.
- test_multiple_orphans_aggregated — two BP orphans surfaced
together, not short-circuited.
- test_bp_empty_lints_nothing — BP has no contexts.
Exit 0 cleanly.
- test_api_403_skips_gracefully — branch_protections endpoint
403s (token-scope). Exit 0 with ::error::, do NOT red-X.
- test_api_404_skips_gracefully — branch has no protection.
Exit 0 cleanly.
- test_context_event_match_required — BP context says `(push)` and
workflow only emits on `pull_request`. That's NOT a match — the
BP-required gate would still wedge. Exit 1.
- test_workflow_event_mapping_pull_request_target — `pull_request_target`
in workflow `on:` emits a `(pull_request)` context (Gitea convention).
Match counts.
- test_idempotent_issue_filing — when an issue already exists
with the canonical title prefix, edit it instead of POSTing a new one
(idempotency contract — mirrors ci-required-drift).
Run:
python3 -m pytest tests/test_lint_bp_context_emit_match.py -v
"""
from __future__ import annotations
import importlib.util
import os
import sys
from pathlib import Path
from unittest import mock
import pytest
SCRIPT_PATH = (
Path(__file__).resolve().parent.parent
/ ".gitea"
/ "scripts"
/ "lint_bp_context_emit_match.py"
)
def _import_lint():
spec = importlib.util.spec_from_file_location(
f"lint_bp_emit_{os.getpid()}", SCRIPT_PATH
)
m = importlib.util.module_from_spec(spec)
spec.loader.exec_module(m)
return m
@pytest.fixture()
def envset(tmp_path, monkeypatch):
wf = tmp_path / ".gitea" / "workflows"
wf.mkdir(parents=True)
monkeypatch.setenv("WORKFLOWS_DIR", str(wf))
monkeypatch.setenv("GITEA_TOKEN", "stub")
monkeypatch.setenv("GITEA_HOST", "git.example.test")
monkeypatch.setenv("REPO", "owner/molecule-core")
monkeypatch.setenv("BRANCH", "main")
monkeypatch.setenv("DRIFT_LABEL", "ci-bp-drift")
return wf
def _write_wf(d: Path, name: str, content: str) -> Path:
p = d / name
p.write_text(content)
return p
def _stub_api(monkeypatch, lint_mod, bp_response, issue_search_response=None, posted_record=None):
"""Stub the module's `api` function.
bp_response: ("ok", {"status_check_contexts": [...]})
or ("forbidden", None) / ("not_found", None)
issue_search_response: list of issues matching the search query (
may be empty; default empty)
posted_record: dict in which to record any POST/PATCH calls made
(so tests can assert idempotency).
"""
if issue_search_response is None:
issue_search_response = []
if posted_record is None:
posted_record = {}
def fake_api(method, path, *, body=None, query=None):
if "branch_protections" in path:
return bp_response
if "issues/search" in path or "/issues?" in path or path.endswith("/issues"):
if method == "GET":
return ("ok", list(issue_search_response))
if method == "POST":
posted_record.setdefault("posts", []).append({"path": path, "body": body})
return ("ok", {"number": 9001, "html_url": "http://t/9001"})
if "/issues/" in path and method == "PATCH":
posted_record.setdefault("patches", []).append({"path": path, "body": body})
return ("ok", {"number": 9001})
if "/labels" in path:
return ("ok", [{"id": 10, "name": "ci-bp-drift"}, {"id": 9, "name": "tier:high"}])
return ("ok", {})
monkeypatch.setattr(lint_mod, "api", fake_api)
return posted_record
# ---------------------------------------------------------------------------
# Perfect match — both sides agree.
# ---------------------------------------------------------------------------
def test_perfect_match_passes(envset, monkeypatch, capsys):
_write_wf(
envset,
"ci.yml",
"name: CI\non:\n pull_request:\n branches: [main]\njobs:\n"
" all-required:\n runs-on: x\n steps:\n - run: echo hi\n",
)
m = _import_lint()
_stub_api(
monkeypatch,
m,
("ok", {"status_check_contexts": ["CI / all-required (pull_request)"]}),
)
rc = m.run()
assert rc == 0
# ---------------------------------------------------------------------------
# BP-only orphan — context with no emitter.
# ---------------------------------------------------------------------------
def test_bp_orphan_context_fails(envset, monkeypatch, capsys):
_write_wf(
envset,
"ci.yml",
"name: CI\non:\n pull_request:\n branches: [main]\njobs:\n"
" all-required:\n runs-on: x\n steps:\n - run: echo hi\n",
)
m = _import_lint()
posted = _stub_api(
monkeypatch,
m,
("ok", {"status_check_contexts": [
"CI / all-required (pull_request)",
"Ghost workflow / ghost (pull_request)", # the orphan
]}),
)
rc = m.run()
assert rc == 1
out = capsys.readouterr().out
assert "Ghost workflow" in out or "ghost" in out.lower()
# ---------------------------------------------------------------------------
# Emitter-only direction → notice, not error (Tier 2g territory).
# ---------------------------------------------------------------------------
def test_emitter_orphan_only_warns(envset, monkeypatch, capsys):
_write_wf(
envset,
"extra.yml",
"name: Extra\non:\n pull_request:\n branches: [main]\njobs:\n"
" extra-job:\n runs-on: x\n steps:\n - run: echo hi\n",
)
_write_wf(
envset,
"ci.yml",
"name: CI\non:\n pull_request:\n branches: [main]\njobs:\n"
" all-required:\n runs-on: x\n steps:\n - run: echo hi\n",
)
m = _import_lint()
_stub_api(
monkeypatch,
m,
("ok", {"status_check_contexts": ["CI / all-required (pull_request)"]}),
)
rc = m.run()
assert rc == 0
out = capsys.readouterr().out
assert "Extra" in out or "extra" in out
# ---------------------------------------------------------------------------
# Multiple BP orphans — all surfaced.
# ---------------------------------------------------------------------------
def test_multiple_orphans_aggregated(envset, monkeypatch, capsys):
_write_wf(
envset,
"ci.yml",
"name: CI\non:\n pull_request:\n branches: [main]\njobs:\n"
" all-required:\n runs-on: x\n steps:\n - run: echo hi\n",
)
m = _import_lint()
_stub_api(
monkeypatch,
m,
("ok", {"status_check_contexts": [
"CI / all-required (pull_request)",
"Phantom A / a (pull_request)",
"Phantom B / b (pull_request)",
]}),
)
rc = m.run()
assert rc == 1
out = capsys.readouterr().out
assert "Phantom A" in out and "Phantom B" in out
# ---------------------------------------------------------------------------
# BP has zero contexts → nothing to lint, pass.
# ---------------------------------------------------------------------------
def test_bp_empty_lints_nothing(envset, monkeypatch, capsys):
_write_wf(
envset,
"ci.yml",
"name: CI\non:\n pull_request:\n branches: [main]\njobs:\n"
" all-required:\n runs-on: x\n steps:\n - run: echo hi\n",
)
m = _import_lint()
_stub_api(monkeypatch, m, ("ok", {"status_check_contexts": []}))
rc = m.run()
assert rc == 0
# ---------------------------------------------------------------------------
# API 403 — graceful-degrade.
# ---------------------------------------------------------------------------
def test_api_403_skips_gracefully(envset, monkeypatch, capsys):
_write_wf(
envset,
"ci.yml",
"name: CI\non:\n pull_request:\n branches: [main]\njobs:\n"
" j:\n runs-on: x\n steps:\n - run: echo hi\n",
)
m = _import_lint()
_stub_api(monkeypatch, m, ("forbidden", None))
rc = m.run()
assert rc == 0
err = capsys.readouterr().err
assert "403" in err or "scope" in err.lower() or "token" in err.lower()
# ---------------------------------------------------------------------------
# API 404 — branch has no protection → clean exit.
# ---------------------------------------------------------------------------
def test_api_404_skips_gracefully(envset, monkeypatch, capsys):
_write_wf(
envset,
"ci.yml",
"name: CI\non:\n pull_request:\n branches: [main]\njobs:\n"
" j:\n runs-on: x\n steps:\n - run: echo hi\n",
)
m = _import_lint()
_stub_api(monkeypatch, m, ("not_found", None))
rc = m.run()
assert rc == 0
# ---------------------------------------------------------------------------
# Event-suffix match strict: BP says (push), workflow emits (pull_request)
# only. Mismatch — flag.
# ---------------------------------------------------------------------------
def test_context_event_match_required(envset, monkeypatch, capsys):
_write_wf(
envset,
"ci.yml",
"name: CI\non:\n pull_request:\n branches: [main]\njobs:\n"
" all-required:\n runs-on: x\n steps:\n - run: echo hi\n",
)
m = _import_lint()
_stub_api(
monkeypatch,
m,
("ok", {"status_check_contexts": ["CI / all-required (push)"]}),
)
rc = m.run()
assert rc == 1
# ---------------------------------------------------------------------------
# `pull_request_target` in workflow `on:` emits a `(pull_request)` context
# (Gitea convention — verified empirically on molecule-core).
# ---------------------------------------------------------------------------
def test_workflow_event_mapping_pull_request_target(envset, monkeypatch, capsys):
_write_wf(
envset,
"secret.yml",
"name: Secret scan\non:\n pull_request_target:\n branches: [main]\njobs:\n"
" scan:\n runs-on: x\n name: Scan diff for credential-shaped strings\n"
" steps:\n - run: echo hi\n",
)
m = _import_lint()
_stub_api(
monkeypatch,
m,
("ok", {"status_check_contexts": [
"Secret scan / Scan diff for credential-shaped strings (pull_request)",
]}),
)
rc = m.run()
assert rc == 0
# ---------------------------------------------------------------------------
# Idempotency — existing open issue is PATCHed, not duplicated.
# ---------------------------------------------------------------------------
def test_idempotent_issue_filing(envset, monkeypatch, capsys):
_write_wf(
envset,
"ci.yml",
"name: CI\non:\n pull_request:\n branches: [main]\njobs:\n"
" all-required:\n runs-on: x\n steps:\n - run: echo hi\n",
)
m = _import_lint()
posted = _stub_api(
monkeypatch,
m,
("ok", {"status_check_contexts": [
"CI / all-required (pull_request)",
"Ghost / g (pull_request)",
]}),
issue_search_response=[
{
"number": 4242,
"title": "[ci-bp-drift] owner/molecule-core/main: BP→emitter mismatch",
"state": "open",
"html_url": "http://t/4242",
}
],
)
rc = m.run()
assert rc == 1
# Should have PATCHed, not POSTed a new one.
assert posted.get("patches"), f"expected PATCH on existing issue; got {posted!r}"
assert not posted.get("posts"), f"expected no POSTs; got {posted!r}"
-88
View File
@@ -1,88 +0,0 @@
"""Tests for `.gitea/scripts/lint-curl-status-capture.py`.
Run:
python3 -m pytest tests/test_lint_curl_status_capture.py -v
"""
from __future__ import annotations
import importlib.util
from pathlib import Path
SCRIPT_PATH = (
Path(__file__).resolve().parent.parent
/ ".gitea"
/ "scripts"
/ "lint-curl-status-capture.py"
)
def _load_module():
spec = importlib.util.spec_from_file_location("lint_curl_status_capture", SCRIPT_PATH)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
return module
def test_finds_quoted_echo_fallback_pollution():
lint = _load_module()
content = """
HTTP_CODE=$(curl -sS -o /tmp/body -w "%{http_code}" https://example.test || echo "000")
"""
findings = lint.scan_content("workflow.yml", content)
assert len(findings) == 1
assert "echo" in findings[0].snippet
def test_finds_unquoted_echo_fallback_pollution():
lint = _load_module()
content = """
HTTP_CODE=$(curl -sS -o /tmp/body -w '%{http_code}' https://example.test || echo 000)
"""
findings = lint.scan_content("workflow.yml", content)
assert len(findings) == 1
assert "echo" in findings[0].snippet
def test_finds_printf_fallback_pollution():
lint = _load_module()
content = """
HTTP_CODE=$(curl -sS -o /tmp/body -w '%{http_code}' https://example.test || printf '000')
"""
findings = lint.scan_content("workflow.yml", content)
assert len(findings) == 1
assert "printf" in findings[0].snippet
def test_ignores_tempfile_fallback_after_curl():
lint = _load_module()
content = """
set +e
curl -sS -o /tmp/body -w '%{http_code}' https://example.test >/tmp/code
rc=$?
set -e
HTTP_CODE=$(cat /tmp/code 2>/dev/null || echo "000")
[ -z "$HTTP_CODE" ] && HTTP_CODE="000"
"""
assert lint.scan_content("workflow.yml", content) == []
def test_collapses_bash_line_continuations():
lint = _load_module()
content = """
HTTP_CODE=$(curl -sS -o /tmp/body \\
-w "%{http_code}" \\
https://example.test \\
|| echo "000")
"""
findings = lint.scan_content("workflow.yml", content)
assert len(findings) == 1
@@ -1,430 +0,0 @@
"""Tests for `.gitea/scripts/lint_required_context_exists_in_bp.py` — Tier 2g lint.
Structural enforcement of internal#350 Tier 2g: when a PR adds a NEW
commit-status emission (a workflow's `name:` + a new job-key/name pair
that didn't exist on the base side), the PR must EITHER:
(a) Include a `# bp-required: yes` directive comment on the workflow
AND the new context must already be in
`branch_protections/<branch>.status_check_contexts`, OR
(b) Include a `# bp-required: pending #NNN` directive (acknowledged
asymmetry with a tracking issue), OR
(c) Include a `# bp-exempt: <reason>` directive (informational job,
not intended to be a required gate).
Default (no directive on a new emitter) = FAIL.
The class this prevents
-----------------------
PR#656 added `CI / all-required (pull_request)` as a sentinel context
that workflows emit, but BP did NOT list it — so when `platform-build`
failed, `all-required` failed, but BP let the PR merge anyway. Cascade
to mc#664. With Tier 2g, PR#656 would have been blocked until either
the BP PATCH ran alongside OR the author marked the emission with a
`bp-required: pending #NNN` directive.
Test classes (per `feedback_branch_count_before_approving`):
- test_no_new_emissions_skips — diff doesn't add any
new emitter; pass.
- test_new_emission_with_bp_required_yes_in_bp — directive set AND
BP lists the context; pass.
- test_new_emission_with_bp_required_yes_not_in_bp — directive set
BUT BP doesn't list; fail.
- test_new_emission_with_bp_required_pending — `# bp-required:
pending #800` directive references an open tracker; pass.
- test_new_emission_with_bp_exempt — `# bp-exempt:
informational` directive; pass.
- test_new_emission_no_directive_fails — no directive on a
new emission; fail with the 3-option fix-hint.
- test_modified_workflow_with_new_job_is_new — pre-existing
workflow gains a new job with a new name → counted as new
emission. Apply rule.
- test_modified_workflow_job_renamed_is_new — same workflow,
same job-key, but job `name:` changed → counted as new emission
(the OLD context name disappears; the NEW one needs validation).
- test_unrelated_workflow_edit_is_not_new — edit a comment in
an existing emitter; no new context introduced; pass.
- test_api_403_skips_gracefully — BP read 403; exit 0
with stderr ::error::.
- test_directive_must_be_in_workflow_yml — directive in PR
body alone is NOT sufficient; the comment must live in the
workflow file so future scheduled Tier 2f runs can see it.
Run:
python3 -m pytest tests/test_lint_required_context_exists_in_bp.py -v
"""
from __future__ import annotations
import importlib.util
import os
import subprocess
import sys
from pathlib import Path
from unittest import mock
import pytest
SCRIPT_PATH = (
Path(__file__).resolve().parent.parent
/ ".gitea"
/ "scripts"
/ "lint_required_context_exists_in_bp.py"
)
def _import_lint():
spec = importlib.util.spec_from_file_location(
f"lint_required_ctx_in_bp_{os.getpid()}", SCRIPT_PATH
)
m = importlib.util.module_from_spec(spec)
spec.loader.exec_module(m)
return m
# Sample workflows used across multiple tests.
WF_CI_BASE = """name: CI
on:
pull_request:
branches: [main]
jobs:
all-required:
runs-on: x
steps:
- run: echo hi
"""
# CI with a new job added.
WF_CI_NEW_JOB = """name: CI
on:
pull_request:
branches: [main]
jobs:
all-required:
runs-on: x
steps:
- run: echo hi
brand-new:
runs-on: x
steps:
- run: echo new
"""
WF_CI_NEW_JOB_BP_YES = """name: CI
on:
pull_request:
branches: [main]
jobs:
all-required:
runs-on: x
steps:
- run: echo hi
# bp-required: yes
brand-new:
runs-on: x
steps:
- run: echo new
"""
WF_CI_NEW_JOB_BP_PENDING = """name: CI
on:
pull_request:
branches: [main]
jobs:
all-required:
runs-on: x
steps:
- run: echo hi
# bp-required: pending #800
brand-new:
runs-on: x
steps:
- run: echo new
"""
WF_CI_NEW_JOB_BP_EXEMPT = """name: CI
on:
pull_request:
branches: [main]
jobs:
all-required:
runs-on: x
steps:
- run: echo hi
# bp-exempt: informational sticker, not a gate
brand-new:
runs-on: x
steps:
- run: echo new
"""
# Same WF, job rename only (CI/all-required → CI/sentinel).
WF_CI_JOB_RENAMED = """name: CI
on:
pull_request:
branches: [main]
jobs:
all-required:
runs-on: x
name: sentinel
steps:
- run: echo hi
"""
# Comment-only edit — should NOT count as new emission.
WF_CI_COMMENT_ONLY = """# a fresh comment line
name: CI
on:
pull_request:
branches: [main]
jobs:
all-required:
runs-on: x
steps:
- run: echo hi
"""
def _stub_git_and_api(
monkeypatch,
lint_mod,
base_files: dict[str, str | None],
head_files: dict[str, str | None],
bp_response,
):
"""Stub `subprocess.run` for git, and `lint_mod.api` for HTTP."""
def fake_run(cmd, *args, **kwargs):
if not isinstance(cmd, list):
raise AssertionError(f"unexpected cmd: {cmd!r}")
if cmd[:2] == ["git", "show"] and ":" in cmd[2]:
sha, path = cmd[2].split(":", 1)
side = base_files if "base" in sha else head_files
content = side.get(path)
if content is None:
return subprocess.CompletedProcess(cmd, 128, "", "fatal: path not in tree")
return subprocess.CompletedProcess(cmd, 0, content, "")
if cmd[:2] == ["git", "diff"]:
# Names of files that changed (any side has differing contents
# from the other, or only appears on one side).
all_paths = set(base_files) | set(head_files)
changed = sorted(p for p in all_paths if base_files.get(p) != head_files.get(p))
return subprocess.CompletedProcess(cmd, 0, "\n".join(changed) + "\n", "")
raise AssertionError(f"unexpected cmd: {cmd!r}")
monkeypatch.setattr(subprocess, "run", fake_run)
def fake_api(method, path, *, body=None, query=None):
if "branch_protections" in path:
return bp_response
return ("ok", {})
monkeypatch.setattr(lint_mod, "api", fake_api)
@pytest.fixture()
def env(monkeypatch):
monkeypatch.setenv("BASE_SHA", "base-x")
monkeypatch.setenv("HEAD_SHA", "head-x")
monkeypatch.setenv("GITEA_TOKEN", "stub")
monkeypatch.setenv("GITEA_HOST", "git.example.test")
monkeypatch.setenv("REPO", "owner/molecule-core")
monkeypatch.setenv("BRANCH", "main")
monkeypatch.setenv("WORKFLOWS_DIR", ".gitea/workflows")
return monkeypatch
# ---------------------------------------------------------------------------
# No new emissions — pass.
# ---------------------------------------------------------------------------
def test_no_new_emissions_skips(env, monkeypatch, capsys):
m = _import_lint()
_stub_git_and_api(
monkeypatch,
m,
base_files={".gitea/workflows/ci.yml": WF_CI_BASE},
head_files={".gitea/workflows/ci.yml": WF_CI_BASE},
bp_response=("ok", {"status_check_contexts": []}),
)
rc = m.run()
assert rc == 0
# ---------------------------------------------------------------------------
# New emission + bp-required: yes + in BP → pass.
# ---------------------------------------------------------------------------
def test_new_emission_with_bp_required_yes_in_bp(env, monkeypatch, capsys):
m = _import_lint()
_stub_git_and_api(
monkeypatch,
m,
base_files={".gitea/workflows/ci.yml": WF_CI_BASE},
head_files={".gitea/workflows/ci.yml": WF_CI_NEW_JOB_BP_YES},
bp_response=(
"ok",
{"status_check_contexts": ["CI / brand-new (pull_request)"]},
),
)
rc = m.run()
assert rc == 0
# ---------------------------------------------------------------------------
# bp-required: yes but NOT in BP → fail.
# ---------------------------------------------------------------------------
def test_new_emission_with_bp_required_yes_not_in_bp(env, monkeypatch, capsys):
m = _import_lint()
_stub_git_and_api(
monkeypatch,
m,
base_files={".gitea/workflows/ci.yml": WF_CI_BASE},
head_files={".gitea/workflows/ci.yml": WF_CI_NEW_JOB_BP_YES},
bp_response=("ok", {"status_check_contexts": []}),
)
rc = m.run()
assert rc == 1
out = capsys.readouterr().out
assert "brand-new" in out
# ---------------------------------------------------------------------------
# bp-required: pending #NNN → pass.
# ---------------------------------------------------------------------------
def test_new_emission_with_bp_required_pending(env, monkeypatch, capsys):
m = _import_lint()
_stub_git_and_api(
monkeypatch,
m,
base_files={".gitea/workflows/ci.yml": WF_CI_BASE},
head_files={".gitea/workflows/ci.yml": WF_CI_NEW_JOB_BP_PENDING},
bp_response=("ok", {"status_check_contexts": []}),
)
rc = m.run()
assert rc == 0
# ---------------------------------------------------------------------------
# bp-exempt → pass.
# ---------------------------------------------------------------------------
def test_new_emission_with_bp_exempt(env, monkeypatch, capsys):
m = _import_lint()
_stub_git_and_api(
monkeypatch,
m,
base_files={".gitea/workflows/ci.yml": WF_CI_BASE},
head_files={".gitea/workflows/ci.yml": WF_CI_NEW_JOB_BP_EXEMPT},
bp_response=("ok", {"status_check_contexts": []}),
)
rc = m.run()
assert rc == 0
# ---------------------------------------------------------------------------
# New emission, no directive → fail with 3-option fix hint.
# ---------------------------------------------------------------------------
def test_new_emission_no_directive_fails(env, monkeypatch, capsys):
m = _import_lint()
_stub_git_and_api(
monkeypatch,
m,
base_files={".gitea/workflows/ci.yml": WF_CI_BASE},
head_files={".gitea/workflows/ci.yml": WF_CI_NEW_JOB},
bp_response=("ok", {"status_check_contexts": []}),
)
rc = m.run()
assert rc == 1
out = capsys.readouterr().out
assert "brand-new" in out
assert "bp-required" in out
assert "bp-exempt" in out
# ---------------------------------------------------------------------------
# Pre-existing workflow gains a new job → counted as new emission.
# ---------------------------------------------------------------------------
def test_modified_workflow_with_new_job_is_new(env, monkeypatch, capsys):
m = _import_lint()
_stub_git_and_api(
monkeypatch,
m,
base_files={".gitea/workflows/ci.yml": WF_CI_BASE},
head_files={".gitea/workflows/ci.yml": WF_CI_NEW_JOB},
bp_response=("ok", {"status_check_contexts": []}),
)
rc = m.run()
# No directive → fail
assert rc == 1
# ---------------------------------------------------------------------------
# Same workflow, same job-key, but job `name:` changed → new context.
# ---------------------------------------------------------------------------
def test_modified_workflow_job_renamed_is_new(env, monkeypatch, capsys):
m = _import_lint()
_stub_git_and_api(
monkeypatch,
m,
base_files={".gitea/workflows/ci.yml": WF_CI_BASE},
head_files={".gitea/workflows/ci.yml": WF_CI_JOB_RENAMED},
bp_response=("ok", {"status_check_contexts": []}),
)
rc = m.run()
assert rc == 1
out = capsys.readouterr().out
assert "sentinel" in out
# ---------------------------------------------------------------------------
# Comment-only edit → no new emission.
# ---------------------------------------------------------------------------
def test_unrelated_workflow_edit_is_not_new(env, monkeypatch, capsys):
m = _import_lint()
_stub_git_and_api(
monkeypatch,
m,
base_files={".gitea/workflows/ci.yml": WF_CI_BASE},
head_files={".gitea/workflows/ci.yml": WF_CI_COMMENT_ONLY},
bp_response=("ok", {"status_check_contexts": []}),
)
rc = m.run()
assert rc == 0
# ---------------------------------------------------------------------------
# BP API 403 → exit 0 with ::error::.
# ---------------------------------------------------------------------------
def test_api_403_skips_gracefully(env, monkeypatch, capsys):
m = _import_lint()
_stub_git_and_api(
monkeypatch,
m,
base_files={".gitea/workflows/ci.yml": WF_CI_BASE},
head_files={".gitea/workflows/ci.yml": WF_CI_NEW_JOB},
bp_response=("forbidden", None),
)
rc = m.run()
assert rc == 0
err = capsys.readouterr().err
assert "403" in err or "scope" in err.lower() or "token" in err.lower()
# ---------------------------------------------------------------------------
# Directive must be in the workflow YML, not PR body.
# ---------------------------------------------------------------------------
def test_directive_must_be_in_workflow_yml(env, monkeypatch, capsys):
monkeypatch = env
monkeypatch.setenv("PR_BODY", "bp-required: yes — see comment above")
m = _import_lint()
_stub_git_and_api(
monkeypatch,
m,
base_files={".gitea/workflows/ci.yml": WF_CI_BASE},
head_files={".gitea/workflows/ci.yml": WF_CI_NEW_JOB},
bp_response=("ok", {"status_check_contexts": []}),
)
rc = m.run()
# Even though PR body claims, the workflow itself lacks the directive.
assert rc == 1
+1 -16
View File
@@ -35,22 +35,7 @@ RUN CGO_ENABLED=0 GOOS=linux go build \
-o /memory-plugin ./cmd/memory-plugin-postgres
FROM alpine:3.20@sha256:c64c687cbea9300178b30c95835354e34c4e4febc4badfe27102879de0483b5e
# docker-cli is required by internal/provisioner/localbuild.go which
# shells out via exec.Command("docker", "image", "inspect"/"build"/"tag", ...)
# whenever Resolve().Mode == RegistryModeLocal — which is the permanent
# mode post-2026-05-06 (Molecule-AI GitHub org suspended → GHCR
# unreachable → MOLECULE_IMAGE_REGISTRY unset → registry_mode.go falls
# through to RegistryModeLocal). Without docker-cli here the platform
# fails every workspace re-provision with `local-build: image inspect
# for molecule-local/workspace-template-<runtime>:<sha> failed
# (exec: "docker": executable file not found in $PATH)` and the
# workspace stays status=failed. The Docker SOCKET is already mounted
# (entrypoint.sh adds the platform user to the docker group) — only
# the CLI binary was missing. Caught after sdk-lead + CP-QA went down
# this way during the MiniMax-switch attempt + after-Class-A audit.
# Related: Task #194 / Issue #63 (local-build path added);
# `feedback_workspace_image_ghcr_dead`.
RUN apk add --no-cache ca-certificates docker-cli git tzdata wget
RUN apk add --no-cache ca-certificates git tzdata wget
COPY --from=builder /platform /platform
COPY --from=builder /memory-plugin /memory-plugin
COPY workspace-server/migrations /migrations
@@ -501,18 +501,8 @@ func (h *WorkspaceHandler) proxyA2ARequest(ctx context.Context, workspaceID stri
// to correctly route delivery-confirmed responses (where the agent completed
// the work but the TCP connection dropped before the full body was received)
// to success instead of failure (#159).
//
// For non-2xx responses (server explicitly rejected with 3xx+), preserve
// resp.StatusCode in the proxyA2AError.Status so isTransientProxyError
// returns false — a server-authored rejection is not a transient transport
// error and must not be retried. Only 2xx body-read errors keep Status=502
// (the agent completed work but the TCP layer dropped the response).
errStatus := http.StatusBadGateway
if resp.StatusCode >= 300 {
errStatus = resp.StatusCode
}
return resp.StatusCode, respBody, &proxyA2AError{
Status: errStatus,
Status: http.StatusBadGateway,
Response: gin.H{
"error": "failed to read agent response",
"delivery_confirmed": deliveryConfirmed,
@@ -520,21 +510,6 @@ func (h *WorkspaceHandler) proxyA2ARequest(ctx context.Context, workspaceID stri
}
}
// 2xx with empty body: the agent completed the request but returned no content.
// An A2A agent must always return a JSON body; empty means the agent is
// broken or the connection closed before any body bytes were written.
// Return a proxyA2AError so executeDelegation routes this to failure rather
// than silently marking it as completed with a nil body.
// logA2ASuccess is intentionally NOT called here — delivery was not confirmed.
if resp.StatusCode >= 200 && resp.StatusCode < 300 && len(respBody) == 0 {
log.Printf("ProxyA2A: agent %s returned %d with empty body — treating as failure",
workspaceID, resp.StatusCode)
return resp.StatusCode, respBody, &proxyA2AError{
Status: resp.StatusCode,
Response: gin.H{"error": "agent returned empty response body"},
}
}
if logActivity {
h.logA2ASuccess(ctx, workspaceID, callerID, body, respBody, a2aMethod, resp.StatusCode, durationMs)
}
@@ -410,7 +410,7 @@ func extractToolTrace(respBody []byte) json.RawMessage {
return nil
}
trace, ok := meta["tool_trace"]
if !ok || len(trace) == 0 || string(trace) == "null" || string(trace) == "[]" {
if !ok || len(trace) == 0 {
return nil
}
return trace
@@ -1,243 +0,0 @@
package handlers
import (
"encoding/json"
"testing"
)
// ─────────────────────────────────────────────────────────────────────────────
// nilIfEmpty tests
// ─────────────────────────────────────────────────────────────────────────────
func TestNilIfEmpty_EmptyString(t *testing.T) {
got := nilIfEmpty("")
if got != nil {
t.Errorf("empty string: got %p, want nil", got)
}
}
func TestNilIfEmpty_NonEmptyString(t *testing.T) {
s := "hello"
got := nilIfEmpty(s)
if got == nil {
t.Fatal("non-empty string: got nil, want pointer")
}
if *got != "hello" {
t.Errorf("non-empty string: got %q, want %q", *got, "hello")
}
}
// ─────────────────────────────────────────────────────────────────────────────
// extractToolTrace tests
// ─────────────────────────────────────────────────────────────────────────────
func TestExtractToolTrace_EmptyBody(t *testing.T) {
got := extractToolTrace(nil)
if got != nil {
t.Errorf("nil body: got %v, want nil", got)
}
got = extractToolTrace([]byte{})
if got != nil {
t.Errorf("empty body: got %v, want nil", got)
}
}
func TestExtractToolTrace_InvalidJSON(t *testing.T) {
got := extractToolTrace([]byte("not json"))
if got != nil {
t.Errorf("invalid JSON: got %v, want nil", got)
}
}
func TestExtractToolTrace_NoResultKey(t *testing.T) {
got := extractToolTrace([]byte(`{"error": "oops"}`))
if got != nil {
t.Errorf("no result key: got %v, want nil", got)
}
}
func TestExtractToolTrace_NoMetadataKey(t *testing.T) {
got := extractToolTrace([]byte(`{"result": {"data": {}}}`))
if got != nil {
t.Errorf("no metadata key: got %v, want nil", got)
}
}
func TestExtractToolTrace_NoToolTraceKey(t *testing.T) {
got := extractToolTrace([]byte(`{"result": {"metadata": {}}}`))
if got != nil {
t.Errorf("no tool_trace key: got %v, want nil", got)
}
}
// extractToolTrace calls json.Unmarshal, which sets a RawMessage to nil when
// unmarshaling a JSON null value. The fix for mc#669 changes len(trace)==0
// to string(trace)=="[]" to avoid len(nil) panicking on null.
func TestExtractToolTrace_NullValue(t *testing.T) {
// JSON null in tool_trace → RawMessage becomes nil → len would panic.
// The fix checks string(trace)=="[]" which is safe on nil (returns false).
body := []byte(`{"result": {"metadata": {"tool_trace": null}}}`)
got := extractToolTrace(body)
if got != nil {
t.Errorf("null tool_trace: got %v, want nil", got)
}
}
// "[]" unmarshaled into RawMessage is []byte("[]") — not nil, len=2.
// The fix returns nil for [] so empty tool_trace arrays don't surface as traces.
func TestExtractToolTrace_EmptyArray(t *testing.T) {
body := []byte(`{"result": {"metadata": {"tool_trace": []}}}`)
got := extractToolTrace(body)
if got != nil {
t.Errorf("empty array tool_trace: got %v, want nil", got)
}
}
func TestExtractToolTrace_ValidNonEmpty(t *testing.T) {
trace := []byte(`[{"name":"search","result":"done"}]`)
body, _ := json.Marshal(map[string]interface{}{
"result": map[string]interface{}{
"metadata": map[string]interface{}{
"tool_trace": json.RawMessage(trace),
},
},
})
got := extractToolTrace(body)
if got == nil {
t.Fatal("valid non-empty trace: got nil, want the trace")
}
if string(got) != string(trace) {
t.Errorf("valid trace: got %s, want %s", got, trace)
}
}
// Document that the CURRENT code (len check) panics on null tool_trace.
// This test exists to signal when PR #669's fix lands: after the fix,
// the defer-recover will NOT trigger (panic goes away) and the
// post-recover assertion runs. While unfixed: the panic fires and
// ─────────────────────────────────────────────────────────────────────────────
// readUsageMap tests
// ─────────────────────────────────────────────────────────────────────────────
func TestReadUsageMap_NoUsageKey(t *testing.T) {
m := map[string]json.RawMessage{}
_, _, ok := readUsageMap(m)
if ok {
t.Error("no usage key: ok should be false")
}
}
func TestReadUsageMap_InvalidUsageJSON(t *testing.T) {
m := map[string]json.RawMessage{"usage": json.RawMessage(`"not an object"`)}
_, _, ok := readUsageMap(m)
if ok {
t.Error("invalid usage JSON: ok should be false")
}
}
func TestReadUsageMap_ZeroUsage(t *testing.T) {
m := map[string]json.RawMessage{"usage": json.RawMessage(`{"input_tokens": 0, "output_tokens": 0}`)}
_, _, ok := readUsageMap(m)
if ok {
t.Error("zero usage: ok should be false")
}
}
func TestReadUsageMap_InputOnly(t *testing.T) {
m := map[string]json.RawMessage{"usage": json.RawMessage(`{"input_tokens": 100, "output_tokens": 0}`)}
in, out, ok := readUsageMap(m)
if !ok {
t.Fatal("input-only usage: ok should be true")
}
if in != 100 {
t.Errorf("input tokens: got %d, want 100", in)
}
if out != 0 {
t.Errorf("output tokens: got %d, want 0", out)
}
}
func TestReadUsageMap_BothTokens(t *testing.T) {
m := map[string]json.RawMessage{"usage": json.RawMessage(`{"input_tokens": 500, "output_tokens": 200}`)}
in, out, ok := readUsageMap(m)
if !ok {
t.Fatal("both tokens: ok should be true")
}
if in != 500 || out != 200 {
t.Errorf("tokens: got (%d, %d), want (500, 200)", in, out)
}
}
// ─────────────────────────────────────────────────────────────────────────────
// parseUsageFromA2AResponse tests
// ─────────────────────────────────────────────────────────────────────────────
func TestParseUsageFromA2AResponse_Empty(t *testing.T) {
in, out := parseUsageFromA2AResponse(nil)
if in != 0 || out != 0 {
t.Errorf("nil: got (%d, %d), want (0, 0)", in, out)
}
in, out = parseUsageFromA2AResponse([]byte{})
if in != 0 || out != 0 {
t.Errorf("empty: got (%d, %d), want (0, 0)", in, out)
}
}
func TestParseUsageFromA2AResponse_InvalidJSON(t *testing.T) {
in, out := parseUsageFromA2AResponse([]byte("not json"))
if in != 0 || out != 0 {
t.Errorf("invalid JSON: got (%d, %d), want (0, 0)", in, out)
}
}
func TestParseUsageFromA2AResponse_NoResultNoUsage(t *testing.T) {
in, out := parseUsageFromA2AResponse([]byte(`{"id": 1}`))
if in != 0 || out != 0 {
t.Errorf("no result/usage: got (%d, %d), want (0, 0)", in, out)
}
}
func TestParseUsageFromA2AResponse_ResultUsage(t *testing.T) {
body := []byte(`{"result": {"usage": {"input_tokens": 42, "output_tokens": 7}}}`)
in, out := parseUsageFromA2AResponse(body)
if in != 42 || out != 7 {
t.Errorf("result usage: got (%d, %d), want (42, 7)", in, out)
}
}
func TestParseUsageFromA2AResponse_ResultUsageWinsOverTopLevel(t *testing.T) {
// JSON-RPC result.usage takes precedence over top-level usage.
body := []byte(`{"result": {"usage": {"input_tokens": 42, "output_tokens": 7}}, "usage": {"input_tokens": 99, "output_tokens": 99}}`)
in, out := parseUsageFromA2AResponse(body)
if in != 42 || out != 7 {
t.Errorf("result usage should win: got (%d, %d), want (42, 7)", in, out)
}
}
func TestParseUsageFromA2AResponse_TopLevelFallback(t *testing.T) {
// Direct (non-JSON-RPC) response: usage at top level.
body := []byte(`{"usage": {"input_tokens": 11, "output_tokens": 13}}`)
in, out := parseUsageFromA2AResponse(body)
if in != 11 || out != 13 {
t.Errorf("top-level usage: got (%d, %d), want (11, 13)", in, out)
}
}
func TestParseUsageFromA2AResponse_ZeroValuesInResult(t *testing.T) {
// Zero usage in result.result.usage: returns (0, 0) — no panic.
body := []byte(`{"result": {"usage": {"input_tokens": 0, "output_tokens": 0}}}`)
in, out := parseUsageFromA2AResponse(body)
if in != 0 || out != 0 {
t.Errorf("zero usage: got (%d, %d), want (0, 0)", in, out)
}
}
func TestParseUsageFromA2AResponse_MissingTokensInUsageObject(t *testing.T) {
// usage object exists but tokens are absent — returns (0, 0).
body := []byte(`{"result": {"usage": {"other_field": 5}}}`)
in, out := parseUsageFromA2AResponse(body)
if in != 0 || out != 0 {
t.Errorf("missing tokens: got (%d, %d), want (0, 0)", in, out)
}
}
@@ -7,7 +7,6 @@ import (
"go/parser"
"go/token"
"testing"
"time"
"github.com/DATA-DOG/go-sqlmock"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/models"
@@ -72,8 +71,6 @@ func TestPreflight_ContainerRunning_ReturnsNil(t *testing.T) {
// triggers the offline-flip + WORKSPACE_OFFLINE broadcast + async restart.
// This is the load-bearing case — saves the caller 2-30s of network timeout.
func TestPreflight_ContainerNotRunning_StructuredFastFail(t *testing.T) {
const wsID = "ws-dead-456"
resetRestartStatesFor(wsID)
mock := setupTestDB(t)
_ = setupTestRedis(t)
stub := &preflightLocalProv{running: false, err: nil}
@@ -82,14 +79,14 @@ func TestPreflight_ContainerNotRunning_StructuredFastFail(t *testing.T) {
// Expect the offline-flip UPDATE.
mock.ExpectExec(`UPDATE workspaces SET status =`).
WithArgs(models.StatusOffline, wsID).
WithArgs(models.StatusOffline, "ws-dead-456").
WillReturnResult(sqlmock.NewResult(0, 1))
// Broadcaster's INSERT INTO structure_events fires too — best-effort
// log entry for the WORKSPACE_OFFLINE event. Match permissively.
mock.ExpectExec(`INSERT INTO structure_events`).
WillReturnResult(sqlmock.NewResult(0, 1))
proxyErr := h.preflightContainerHealth(context.Background(), wsID)
proxyErr := h.preflightContainerHealth(context.Background(), "ws-dead-456")
if proxyErr == nil {
t.Fatal("preflight should return *proxyA2AError when container not running")
}
@@ -110,32 +107,6 @@ func TestPreflight_ContainerNotRunning_StructuredFastFail(t *testing.T) {
// h.broadcaster.RecordAndBroadcast call but not asserted here — the
// real *events.Broadcaster doesn't expose received events for inspection.
// The DB UPDATE expectation is sufficient to pin the offline-flip path.
waitRestartByIDGoroutineIdle(t, wsID)
}
func waitRestartByIDGoroutineIdle(t *testing.T, wsID string) {
t.Helper()
deadline := time.Now().Add(2 * time.Second)
sawState := false
for time.Now().Before(deadline) {
sv, ok := restartStates.Load(wsID)
if ok {
sawState = true
st := sv.(*restartState)
st.mu.Lock()
running := st.running
st.mu.Unlock()
if !running {
resetRestartStatesFor(wsID)
return
}
}
time.Sleep(time.Millisecond)
}
if !sawState {
t.Fatalf("preflight did not start RestartByID goroutine for %s", wsID)
}
t.Fatalf("RestartByID goroutine for %s did not drain before test cleanup", wsID)
}
// TestPreflight_TransientError_FailsSoftAsAlive — IsRunning(true,err): the
@@ -6,7 +6,6 @@ import (
"log"
"net/http"
"os"
"runtime"
"time"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
@@ -163,7 +162,7 @@ func (h *DelegationHandler) Delegate(c *gin.Context) {
})
// Fire-and-forget: send A2A in background goroutine
go h.executeDelegation(ctx, sourceID, body.TargetID, delegationID, a2aBody)
go h.executeDelegation(sourceID, body.TargetID, delegationID, a2aBody)
// Broadcast event so canvas shows delegation in real-time
h.broadcaster.RecordAndBroadcast(ctx, string(events.EventDelegationSent), sourceID, map[string]interface{}{
@@ -309,50 +308,21 @@ func insertDelegationRow(ctx context.Context, c *gin.Context, sourceID string, b
// to land a fresh URL in the cache before we try again. Fixes #74 —
// bulk restarts used to produce spurious "failed to reach workspace
// agent" errors when delegations fired within the warm-up window.
var delegationRetryDelay = 8 * time.Second
const delegationRetryDelay = 8 * time.Second
// NB: the log.Printf calls below are load-bearing for the integration test
// surface (delegation_executor_integration_test.go). The test uses a raw TCP
// mock server; without these calls the compiler inlines executeDelegation and
// a subtle stack-sharing race between the inlined body and the test goroutine
// causes the test to hang. The log calls prevent inlining (Go cannot inline
// functions that call the log package). This is a known Go compiler behaviour.
// runtime.LockOSThread() provides an additional hardening: pinning the
// goroutine to a single OS thread eliminates any scheduler-migration races.
// The caller provides ctx (which carries the deadline/budget); no internal
// context.WithTimeout is created here.
// executeDelegation runs the A2A dispatch for a delegation. ctx controls the
// entire lifecycle: its timeout bounds all DB ops, proxy calls, and retries.
// Pass context.Background() when no external deadline applies (e.g. tests).
func (h *DelegationHandler) executeDelegation(ctx context.Context, sourceID, targetID, delegationID string, a2aBody []byte) {
runtime.LockOSThread() // pin to thread; prevents scheduler-migration races in integration tests
func (h *DelegationHandler) executeDelegation(sourceID, targetID, delegationID string, a2aBody []byte) {
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute)
defer cancel()
log.Printf("Delegation %s: %s → %s (dispatched)", delegationID, sourceID, targetID)
log.Printf("Delegation %s: step=updating_dispatched_status", delegationID)
// Update status: pending → dispatched
h.updateDelegationStatus(ctx, sourceID, delegationID, "dispatched", "")
log.Printf("Delegation %s: step=broadcasting_dispatched", delegationID)
h.updateDelegationStatus(sourceID, delegationID, "dispatched", "")
h.broadcaster.RecordAndBroadcast(ctx, string(events.EventDelegationStatus), sourceID, map[string]interface{}{
"delegation_id": delegationID, "target_id": targetID, "status": "dispatched",
})
log.Printf("Delegation %s: step=proxying_a2a_request", delegationID)
status, respBody, proxyErr := h.workspace.proxyA2ARequest(ctx, targetID, a2aBody, sourceID, true)
log.Printf("Delegation %s: step=proxy_done status=%d bodyLen=%d err=%v", delegationID, status, len(respBody), proxyErr)
// When proxyA2ARequest returns an error but we have a non-empty response body
// with a 2xx status code, the agent completed the work successfully — the error
// is a delivery/transport error (e.g., connection reset after response was
// received). Treat as success: the response body is valid and the work is done.
// This check MUST run before the transient-retry gate so a delivery-confirmed
// partial-body 2xx response is never retried.
if isDeliveryConfirmedSuccess(proxyErr, status, respBody) {
log.Printf("Delegation %s: completed with delivery error (status=%d, respBody=%d bytes, proxyErr=%v) — treating as success",
delegationID, status, len(respBody), proxyErr.Error())
goto handleSuccess
}
// #74: one retry after the reactive URL refresh has had a chance to
// run. The proxyA2ARequest's health-check path on a connection error
@@ -372,10 +342,21 @@ func (h *DelegationHandler) executeDelegation(ctx context.Context, sourceID, tar
}
}
// When proxyA2ARequest returns an error but we have a non-empty response body
// with a 2xx status code, the agent completed the work successfully — the error
// is a delivery/transport error (e.g., connection reset after response was
// received). Treat as success: the response body is valid and the work is done.
// This prevents "retry storms" where the canvas sees error + Restart-workspace
// suggestion even though the delegation actually completed.
if isDeliveryConfirmedSuccess(proxyErr, status, respBody) {
log.Printf("Delegation %s: completed with delivery error (status=%d, respBody=%d bytes, proxyErr=%v) — treating as success",
delegationID, status, len(respBody), proxyErr.Error())
goto handleSuccess
}
if proxyErr != nil {
log.Printf("Delegation %s: step=handling_failure err=%v", delegationID, proxyErr)
log.Printf("Delegation %s: failed — %s", delegationID, proxyErr.Error())
h.updateDelegationStatus(ctx, sourceID, delegationID, "failed", proxyErr.Error())
h.updateDelegationStatus(sourceID, delegationID, "failed", proxyErr.Error())
if _, err := db.DB.ExecContext(ctx, `
INSERT INTO activity_logs (workspace_id, activity_type, method, source_id, target_id, summary, status, error_detail)
@@ -392,27 +373,7 @@ func (h *DelegationHandler) executeDelegation(ctx context.Context, sourceID, tar
return
}
if status >= 200 && status < 300 && len(respBody) == 0 {
errMsg := "workspace agent returned empty response"
log.Printf("Delegation %s: step=handling_failure err=%s", delegationID, errMsg)
h.updateDelegationStatus(ctx, sourceID, delegationID, "failed", errMsg)
if _, err := db.DB.ExecContext(ctx, `
INSERT INTO activity_logs (workspace_id, activity_type, method, source_id, target_id, summary, status, error_detail)
VALUES ($1, 'delegation', 'delegate_result', $2, $3, $4, 'failed', $5)
`, sourceID, sourceID, targetID, "Delegation failed", errMsg); err != nil {
log.Printf("Delegation %s: failed to insert empty-response error log: %v", delegationID, err)
}
h.broadcaster.RecordAndBroadcast(ctx, string(events.EventDelegationFailed), sourceID, map[string]interface{}{
"delegation_id": delegationID, "target_id": targetID, "error": errMsg,
})
pushDelegationResultToInbox(ctx, sourceID, delegationID, "failed", "", errMsg)
return
}
handleSuccess:
log.Printf("Delegation %s: step=handle_success status=%d", delegationID, status)
// 202 + {queued: true} means the target was busy and the proxy
// enqueued the request for the next drain tick — NOT a completion.
@@ -426,7 +387,7 @@ handleSuccess:
// the user.
if status == http.StatusAccepted && isQueuedProxyResponse(respBody) {
log.Printf("Delegation %s: target %s busy — queued for drain", delegationID, targetID)
h.updateDelegationStatus(ctx, sourceID, delegationID, "queued", "")
h.updateDelegationStatus(sourceID, delegationID, "queued", "")
// Store delegation_id in response_body so DrainQueueForWorkspace's
// stitch step can find this row by JSON-path key after the queued
// dispatch eventually succeeds. Without the key, the drain finds
@@ -453,7 +414,6 @@ handleSuccess:
responseText := extractResponseText(respBody)
log.Printf("Delegation %s: completed (status=%d, %d chars)", delegationID, status, len(responseText))
log.Printf("Delegation %s: step=inserting_success_log", delegationID)
// Store success (response_body must be JSONB, include delegation_id)
respJSON, _ := json.Marshal(map[string]interface{}{
"text": responseText,
@@ -465,7 +425,6 @@ handleSuccess:
`, sourceID, sourceID, targetID, "Delegation completed ("+textutil.TruncateBytes(responseText, 80)+")", string(respJSON)); err != nil {
log.Printf("Delegation %s: failed to insert success log: %v", delegationID, err)
}
log.Printf("Delegation %s: step=recording_ledger_completed", delegationID)
// RFC #2829 #318: write the ledger row with result_preview FIRST,
// THEN updateDelegationStatus. Order matters: SetStatus has a
@@ -475,9 +434,7 @@ handleSuccess:
// Caught by the local-Postgres integration test in
// delegation_ledger_integration_test.go.
recordLedgerStatus(ctx, delegationID, "completed", "", responseText)
log.Printf("Delegation %s: step=updating_completed_status", delegationID)
h.updateDelegationStatus(ctx, sourceID, delegationID, "completed", "")
log.Printf("Delegation %s: step=broadcasting_complete", delegationID)
h.updateDelegationStatus(sourceID, delegationID, "completed", "")
h.broadcaster.RecordAndBroadcast(ctx, string(events.EventDelegationComplete), sourceID, map[string]interface{}{
"delegation_id": delegationID,
"target_id": targetID,
@@ -485,12 +442,11 @@ handleSuccess:
})
// RFC #2829 PR-2 result-push (see UpdateStatus for rationale).
pushDelegationResultToInbox(ctx, sourceID, delegationID, "completed", responseText, "")
log.Printf("Delegation %s: step=complete", delegationID)
}
// updateDelegationStatus updates the status of a delegation record in activity_logs.
// ctx is used for DB operations; caller controls the timeout/retry budget.
func (h *DelegationHandler) updateDelegationStatus(ctx context.Context, workspaceID, delegationID, status, errorDetail string) {
func (h *DelegationHandler) updateDelegationStatus(workspaceID, delegationID, status, errorDetail string) {
ctx := context.Background()
if _, err := db.DB.ExecContext(ctx, `
UPDATE activity_logs
SET status = $1, error_detail = CASE WHEN $2 = '' THEN error_detail ELSE $2 END
@@ -604,7 +560,7 @@ func (h *DelegationHandler) UpdateStatus(c *gin.Context) {
recordLedgerStatus(ctx, delegationID, "completed", "", body.ResponsePreview)
}
h.updateDelegationStatus(ctx, sourceID, delegationID, body.Status, body.Error)
h.updateDelegationStatus(sourceID, delegationID, body.Status, body.Error)
if body.Status == "completed" {
respJSON, _ := json.Marshal(map[string]interface{}{
@@ -816,3 +772,4 @@ func extractResponseText(body []byte) string {
}
return string(body)
}
@@ -1,535 +0,0 @@
//go:build integration
// +build integration
// delegation_executor_integration_test.go — REAL Postgres integration tests for
// executeDelegation HTTP proxy edge cases that sqlmock cannot cover.
//
// The sqlmock tests in delegation_test.go pin which SQL statements fire but
// cannot detect bugs that depend on the row state AFTER the SQL runs. The
// result_preview-lost bug shipped to staging in PR #2854 because sqlmock tests
// were satisfied with "an UPDATE fired" — none verified the row's preview
// field actually landed. These integration tests close that gap.
//
// How HTTP is mocked
// -----------------
// We use raw TCP listeners (net.Listener) instead of httptest.Server to avoid
// any HTTP-library-level goroutine complexity. The test opens a TCP port,
// serves one HTTP response, then closes the connection. The a2aClient transport
// is overridden with a DialContext that intercepts all dials and redirects to
// the test server's port. No DNS, no TCP handshake overhead, no HTTP library
// goroutines that could block on request-body reads.
//
// Run with:
//
// docker run --rm -d --name pg-integration \
// -e POSTGRES_PASSWORD=test -e POSTGRES_DB=molecule \
// -p 55432:5432 postgres:15-alpine
// sleep 4
// psql ... < workspace-server/migrations/049_delegations.up.sql
// cd workspace-server
// INTEGRATION_DB_URL="postgres://postgres:test@localhost:55432/molecule?sslmode=disable" \
// go test -tags=integration ./internal/handlers/ -run Integration_ExecuteDelegation
//
// CI (.gitea/workflows/handlers-postgres-integration.yml) runs this on
// every PR that touches workspace-server/internal/handlers/**.
package handlers
import (
"context"
"database/sql"
"encoding/json"
"net"
"net/http"
"runtime"
"strconv"
"testing"
"time"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
)
// integrationDB is imported from delegation_ledger_integration_test.go.
// Each test gets a fresh table state.
const testDelegationID = "del-159-test-integration"
const testSourceID = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
const testTargetID = "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb"
// rawHTTPServer starts a TCP listener, serves one HTTP response, and closes.
// It runs in a background goroutine so the test can proceed immediately after
// returning the server URL. The server URL (e.g. "http://127.0.0.1:<port>/")
// is suitable for caching in Redis and passing to executeDelegation.
//
// The server reads HTTP headers using a deadline, then immediately sends the
// response. This prevents the classic TCP deadlock: server blocked reading
// body while client blocked waiting for response.
func rawHTTPServer(t *testing.T, statusCode int, body string) (serverURL string, closeFn func()) {
t.Helper()
// Use ListenTCP with explicit IPv4 to avoid IPv6 mismatch on macOS
// (Listen("tcp", "127.0.0.1:0") might bind ::1 on some systems).
ln, err := net.ListenTCP("tcp4", &net.TCPAddr{IP: net.ParseIP("127.0.0.1"), Port: 0})
if err != nil {
t.Fatalf("rawHTTPServer listen: %v", err)
}
port := ln.Addr().(*net.TCPAddr).Port
serverURL = "http://127.0.0.1:" + strconv.Itoa(port) + "/"
connCh := make(chan net.Conn, 1)
go func() {
conn, err := ln.Accept()
if err != nil {
return
}
connCh <- conn
}()
closeFn = func() {
ln.Close()
}
// Handle in background so we don't block test execution.
// Strategy: read available bytes with a deadline (enough for headers).
// After deadline fires, send the response immediately.
// The kernel discards any unread buffered body bytes when the
// connection closes — harmless.
go func() {
conn := <-connCh
if conn == nil {
return
}
// Read what we can with a 2s deadline. Headers always arrive first.
conn.SetReadDeadline(time.Now().Add(2 * time.Second))
headerBuf := make([]byte, 4096)
for {
n, err := conn.Read(headerBuf)
if n > 0 {
_ = headerBuf[:n]
}
if err != nil {
break
}
}
// Send response and IMMEDIATELY close the connection.
// If we keep it open, the client's request-body writer goroutine
// might block on the socket (waiting for the server to drain the
// body). Closing immediately unblocks it. The client already
// received the response, so the write error is harmless.
resp := buildHTTPResponse(statusCode, body)
conn.Write(resp) //nolint:errcheck
conn.Close()
}()
return serverURL, closeFn
}
// buildHTTPResponse constructs a minimal HTTP/1.1 response.
func buildHTTPResponse(statusCode int, body string) []byte {
statusText := http.StatusText(statusCode)
if statusText == "" {
statusText = "Unknown"
}
header := "HTTP/1.1 " + strconv.Itoa(statusCode) + " " + statusText + "\r\n" +
"Content-Type: application/json\r\n" +
"Content-Length: " + strconv.Itoa(len(body)) + "\r\n" +
"Connection: close\r\n" +
"\r\n"
return []byte(header + body)
}
// setupIntegrationFixtures inserts the rows executeDelegation requires:
// - workspaces: source and target (siblings, parent_id=NULL so CanCommunicate=true)
// - activity_logs: the 'delegate' row that updateDelegationStatus UPDATE will find
// - delegations: the ledger row that recordLedgerStatus will UPDATE
//
// Returns a cleanup function the test should defer.
func setupIntegrationFixtures(t *testing.T, conn *sql.DB) func() {
t.Helper()
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
for _, ws := range []struct {
id string
name string
parentID *string
}{
{testSourceID, "test-source", nil},
{testTargetID, "test-target", nil},
} {
if _, err := conn.ExecContext(ctx,
`INSERT INTO workspaces (id, name, parent_id) VALUES ($1::uuid, $2, $3) ON CONFLICT (id) DO NOTHING`,
ws.id, ws.name, ws.parentID,
); err != nil {
cancel()
t.Fatalf("seed workspace %s: %v", ws.id, err)
}
}
reqBody, _ := json.Marshal(map[string]any{
"delegation_id": testDelegationID,
"task": "do work",
})
if _, err := conn.ExecContext(ctx, `
INSERT INTO activity_logs
(workspace_id, activity_type, method, source_id, target_id, request_body, status)
VALUES ($1, 'delegate', 'delegate', $1, $2, $3::jsonb, 'pending')
ON CONFLICT DO NOTHING
`, testSourceID, testTargetID, string(reqBody)); err != nil {
cancel()
t.Fatalf("seed activity_logs: %v", err)
}
if _, err := conn.ExecContext(ctx, `
INSERT INTO delegations
(delegation_id, caller_id, callee_id, task_preview, status)
VALUES ($1, $2::uuid, $3::uuid, 'do work', 'queued')
ON CONFLICT (delegation_id) DO NOTHING
`, testDelegationID, testSourceID, testTargetID); err != nil {
cancel()
t.Fatalf("seed delegations: %v", err)
}
cancel()
return func() {
ctx2, cancel2 := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel2()
conn.ExecContext(ctx2,
`DELETE FROM activity_logs WHERE workspace_id = $1 AND request_body->>'delegation_id' = $2`,
testSourceID, testDelegationID)
conn.ExecContext(ctx2,
`DELETE FROM delegations WHERE delegation_id = $1`, testDelegationID)
conn.ExecContext(ctx2,
`DELETE FROM workspaces WHERE id IN ($1, $2)`, testSourceID, testTargetID)
}
}
// readDelegationRow returns (status, result_preview, error_detail) for the test
// delegation, or fails the test if the row is not found.
func readDelegationRow(t *testing.T, conn *sql.DB) (status, preview, errorDetail string) {
t.Helper()
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
var prev, errDet sql.NullString
err := conn.QueryRowContext(ctx,
`SELECT status, result_preview, error_detail FROM delegations WHERE delegation_id = $1`,
testDelegationID,
).Scan(&status, &prev, &errDet)
if err != nil {
t.Fatalf("readDelegationRow: %v", err)
}
return status, prev.String, errDet.String
}
// stack returns the current goroutine stack trace. Used by runWithTimeout to
// pinpoint the blocking call site when a test times out.
func stack() string {
buf := make([]byte, 4096)
n := runtime.Stack(buf, false)
return string(buf[:n])
}
// runWithTimeout calls fn in a goroutine and fails t if it doesn't return within
// timeout. ctx is passed to fn so it can propagate cancellation to
// executeDelegation's DB and network operations — without this, the goroutine
// leaks indefinitely when the test times out (context.Background() never cancels).
func runWithTimeout(t *testing.T, timeout time.Duration, fn func(context.Context)) {
t.Helper()
ctx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()
done := make(chan struct{})
var panicErr interface{}
go func() {
defer func() {
if p := recover(); p != nil {
panicErr = p
}
close(done)
}()
fn(ctx)
}()
select {
case <-done:
if panicErr != nil {
t.Fatalf("executeDelegation panicked: %v\n%s", panicErr, stack())
}
case <-ctx.Done():
cancel()
t.Fatalf("executeDelegation timed out after %s\n%s", timeout, stack())
}
}
// TestIntegration_ExecuteDelegation_DeliveryConfirmedProxyError_TreatsAsSuccess
// is the integration regression gate for issue #159.
//
// Scenario: proxyA2ARequest returns a 200 status code with a non-empty body.
// isDeliveryConfirmedSuccess guard (status>=200 && <300 && len(body)>0 && err!=nil)
// routes to handleSuccess. The integration test verifies the DB row lands at
// 'completed' with the response body as result_preview.
func TestIntegration_ExecuteDelegation_DeliveryConfirmedProxyError_TreatsAsSuccess(t *testing.T) {
allowLoopbackForTest(t)
conn := integrationDB(t)
cleanup := setupIntegrationFixtures(t, conn)
defer cleanup()
t.Setenv("DELEGATION_LEDGER_WRITE", "1")
agentURL, closeServer := rawHTTPServer(t, 200, `{"result":{"parts":[{"text":"work completed successfully"}]}}`)
defer closeServer()
mr := setupTestRedis(t)
defer mr.Close()
db.CacheURL(context.Background(), testTargetID, agentURL)
prevClient := a2aClient
defer func() { a2aClient = prevClient }()
a2aClient = newA2AClientForHost(extractHostPort(agentURL))
broadcaster := newTestBroadcaster()
wh := NewWorkspaceHandler(broadcaster, nil, "http://localhost:8080", t.TempDir())
dh := NewDelegationHandler(wh, broadcaster)
a2aBody, _ := json.Marshal(map[string]interface{}{
"jsonrpc": "2.0",
"id": "1",
"method": "message/send",
"params": map[string]interface{}{
"message": map[string]interface{}{
"role": "user",
"parts": []map[string]string{{"type": "text", "text": "do work"}},
},
},
})
start := time.Now()
runWithTimeout(t, 30*time.Second, func(ctx context.Context) {
dh.executeDelegation(ctx, testSourceID, testTargetID, testDelegationID, a2aBody)
})
t.Logf("executeDelegation took %v", time.Since(start))
status, preview, errDet := readDelegationRow(t, conn)
if status != "completed" {
t.Errorf("status: want completed, got %q", status)
}
if preview == "" {
t.Errorf("result_preview should be non-empty, got %q", preview)
}
if errDet != "" {
t.Errorf("error_detail should be empty on success: got %q", errDet)
}
}
// TestIntegration_ExecuteDelegation_ProxyErrorNon2xx_RemainsFailed verifies that
// a 500 response routes to failure, not success. isDeliveryConfirmedSuccess
// requires status>=200 && <300, so 500 always fails the guard.
func TestIntegration_ExecuteDelegation_ProxyErrorNon2xx_RemainsFailed(t *testing.T) {
allowLoopbackForTest(t)
conn := integrationDB(t)
cleanup := setupIntegrationFixtures(t, conn)
defer cleanup()
t.Setenv("DELEGATION_LEDGER_WRITE", "1")
agentURL, closeServer := rawHTTPServer(t, 500, `{"error":"agent crashed"}`)
defer closeServer()
mr := setupTestRedis(t)
defer mr.Close()
db.CacheURL(context.Background(), testTargetID, agentURL)
prevClient := a2aClient
defer func() { a2aClient = prevClient }()
a2aClient = newA2AClientForHost(extractHostPort(agentURL))
broadcaster := newTestBroadcaster()
wh := NewWorkspaceHandler(broadcaster, nil, "http://localhost:8080", t.TempDir())
dh := NewDelegationHandler(wh, broadcaster)
a2aBody, _ := json.Marshal(map[string]interface{}{
"jsonrpc": "2.0", "id": "1", "method": "message/send",
"params": map[string]interface{}{
"message": map[string]interface{}{
"role": "user",
"parts": []map[string]string{{"type": "text", "text": "do work"}},
},
},
})
start := time.Now()
runWithTimeout(t, 30*time.Second, func(ctx context.Context) {
dh.executeDelegation(ctx, testSourceID, testTargetID, testDelegationID, a2aBody)
})
t.Logf("executeDelegation took %v", time.Since(start))
status, _, errDet := readDelegationRow(t, conn)
if status != "failed" {
t.Errorf("status: want failed, got %q", status)
}
if errDet == "" {
t.Error("error_detail should be non-empty on failure")
}
}
// TestIntegration_ExecuteDelegation_ProxyErrorEmptyBody_RemainsFailed verifies that
// a 200 response with an empty body routes to failure. isDeliveryConfirmedSuccess
// requires len(body) > 0, so an empty body fails the guard.
func TestIntegration_ExecuteDelegation_ProxyErrorEmptyBody_RemainsFailed(t *testing.T) {
allowLoopbackForTest(t)
conn := integrationDB(t)
cleanup := setupIntegrationFixtures(t, conn)
defer cleanup()
t.Setenv("DELEGATION_LEDGER_WRITE", "1")
agentURL, closeServer := rawHTTPServer(t, 200, "")
defer closeServer()
mr := setupTestRedis(t)
defer mr.Close()
db.CacheURL(context.Background(), testTargetID, agentURL)
prevClient := a2aClient
defer func() { a2aClient = prevClient }()
a2aClient = newA2AClientForHost(extractHostPort(agentURL))
broadcaster := newTestBroadcaster()
wh := NewWorkspaceHandler(broadcaster, nil, "http://localhost:8080", t.TempDir())
dh := NewDelegationHandler(wh, broadcaster)
a2aBody, _ := json.Marshal(map[string]interface{}{
"jsonrpc": "2.0", "id": "1", "method": "message/send",
"params": map[string]interface{}{
"message": map[string]interface{}{
"role": "user",
"parts": []map[string]string{{"type": "text", "text": "do work"}},
},
},
})
start := time.Now()
runWithTimeout(t, 30*time.Second, func(ctx context.Context) {
dh.executeDelegation(ctx, testSourceID, testTargetID, testDelegationID, a2aBody)
})
t.Logf("executeDelegation took %v", time.Since(start))
status, _, errDet := readDelegationRow(t, conn)
if status != "failed" {
t.Errorf("status: want failed, got %q", status)
}
if errDet == "" {
t.Error("error_detail should be non-empty on failure")
}
}
// TestIntegration_ExecuteDelegation_CleanProxyResponse_Unchanged is the baseline:
// a clean 200 response with a valid body and no error routes to success.
func TestIntegration_ExecuteDelegation_CleanProxyResponse_Unchanged(t *testing.T) {
allowLoopbackForTest(t)
conn := integrationDB(t)
cleanup := setupIntegrationFixtures(t, conn)
defer cleanup()
t.Setenv("DELEGATION_LEDGER_WRITE", "1")
agentURL, closeServer := rawHTTPServer(t, 200, `{"result":{"parts":[{"text":"all good"}]}}`)
defer closeServer()
mr := setupTestRedis(t)
defer mr.Close()
db.CacheURL(context.Background(), testTargetID, agentURL)
prevClient := a2aClient
defer func() { a2aClient = prevClient }()
a2aClient = newA2AClientForHost(extractHostPort(agentURL))
broadcaster := newTestBroadcaster()
wh := NewWorkspaceHandler(broadcaster, nil, "http://localhost:8080", t.TempDir())
dh := NewDelegationHandler(wh, broadcaster)
a2aBody, _ := json.Marshal(map[string]interface{}{
"jsonrpc": "2.0", "id": "1", "method": "message/send",
"params": map[string]interface{}{
"message": map[string]interface{}{
"role": "user",
"parts": []map[string]string{{"type": "text", "text": "do work"}},
},
},
})
start := time.Now()
runWithTimeout(t, 30*time.Second, func(ctx context.Context) {
dh.executeDelegation(ctx, testSourceID, testTargetID, testDelegationID, a2aBody)
})
t.Logf("executeDelegation took %v", time.Since(start))
status, preview, errDet := readDelegationRow(t, conn)
if status != "completed" {
t.Errorf("status: want completed, got %q", status)
}
if preview == "" {
t.Errorf("result_preview should be non-empty, got %q", preview)
}
if errDet != "" {
t.Errorf("error_detail should be empty on success: got %q", errDet)
}
}
// Test that a delegation where Redis cannot be reached still routes to failure
// (not panic). proxyA2ARequest falls back to DB URL lookup when Redis is down.
func TestIntegration_ExecuteDelegation_RedisDown_FallsBackToDB(t *testing.T) {
allowLoopbackForTest(t)
conn := integrationDB(t)
cleanup := setupIntegrationFixtures(t, conn)
defer cleanup()
t.Setenv("DELEGATION_LEDGER_WRITE", "1")
// Set up miniredis so db.RDB is non-nil, but do NOT cache any URL.
// resolveAgentURL skips Redis and falls back to DB, which also has no URL.
mr := setupTestRedis(t)
defer mr.Close()
broadcaster := newTestBroadcaster()
wh := NewWorkspaceHandler(broadcaster, nil, "http://localhost:8080", t.TempDir())
dh := NewDelegationHandler(wh, broadcaster)
a2aBody, _ := json.Marshal(map[string]interface{}{
"jsonrpc": "2.0", "id": "1", "method": "message/send",
"params": map[string]interface{}{
"message": map[string]interface{}{
"role": "user",
"parts": []map[string]string{{"type": "text", "text": "do work"}},
},
},
})
start := time.Now()
runWithTimeout(t, 30*time.Second, func(ctx context.Context) {
dh.executeDelegation(ctx, testSourceID, testTargetID, testDelegationID, a2aBody)
})
t.Logf("executeDelegation took %v", time.Since(start))
status, _, errDet := readDelegationRow(t, conn)
if status != "failed" {
t.Errorf("status: want failed (no target URL), got %q", status)
}
if errDet == "" {
t.Error("error_detail should be set on failure due to unreachable target")
}
}
// extractHostPort parses "http://127.0.0.1:PORT/" and returns "127.0.0.1:PORT".
func extractHostPort(rawURL string) string {
// Simple parse: strip "http://" prefix and trailing slash.
// The URL format is always "http://127.0.0.1:PORT/" in our usage.
if len(rawURL) > 7 {
return rawURL[7 : len(rawURL)-1]
}
return rawURL
}
// newA2AClientForHost creates an http.Client that redirects all connections
// to the given host:port. This lets us mock the agent endpoint without
// running a real HTTP server.
func newA2AClientForHost(targetHost string) *http.Client {
return &http.Client{
Transport: &http.Transport{
DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
return net.Dial("tcp", targetHost)
},
ResponseHeaderTimeout: 180 * time.Second,
},
}
}
@@ -154,28 +154,10 @@ func (l *DelegationLedger) SetStatus(ctx context.Context,
return err
}
// Same-status replay (e.g. duplicate completion notification): usually a
// no-op. If the replay carries terminal detail that the first write lacked,
// fill the missing nullable column once. This keeps duplicate notifications
// idempotent while preserving the first observed result/error when a legacy
// path wrote the terminal status before it had the detail payload.
// Same-status replay (e.g. duplicate completion notification): no-op,
// don't bump updated_at, no error.
if current == status {
if errorDetail == "" && resultPreview == "" {
return nil
}
_, err = l.db.ExecContext(ctx, `
UPDATE delegations
SET error_detail = COALESCE(error_detail, NULLIF($2, '')),
result_preview = COALESCE(result_preview, NULLIF($3, '')),
updated_at = CASE
WHEN (error_detail IS NULL AND NULLIF($2, '') IS NOT NULL)
OR (result_preview IS NULL AND NULLIF($3, '') IS NOT NULL)
THEN now()
ELSE updated_at
END
WHERE delegation_id = $1
`, delegationID, errorDetail, textutil.TruncateBytesNoMarker(resultPreview, previewCap))
return err
return nil
}
// Forward-only on terminal states.
@@ -39,7 +39,6 @@ import (
"os"
"strings"
"testing"
"time"
mdb "github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
_ "github.com/lib/pq"
@@ -65,16 +64,12 @@ func integrationDB(t *testing.T) *sql.DB {
if err != nil {
t.Fatalf("open: %v", err)
}
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
if err := conn.PingContext(ctx); err != nil {
if err := conn.Ping(); err != nil {
t.Fatalf("ping: %v", err)
}
// Each test gets a fresh table state — fail loud if cleanup fails so
// a bad test doesn't pollute the next one.
ctx2, cancel2 := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel2()
if _, err := conn.ExecContext(ctx2, `DELETE FROM delegations`); err != nil {
if _, err := conn.ExecContext(context.Background(), `DELETE FROM delegations`); err != nil {
t.Fatalf("cleanup: %v", err)
}
// Wire the package-level db.DB so production helpers (recordLedgerInsert,
@@ -150,11 +145,16 @@ func TestIntegration_ResultPreviewPreservedThroughCompletion(t *testing.T) {
}
}
// Same-status terminal replays remain idempotent, but if the first terminal
// write lacked result_preview, a later same-status replay carrying the preview
// should fill that missing field once. This protects legacy call ordering and
// mirrors the failure-path error_detail repair.
func TestIntegration_ResultPreviewSameStatusReplayFillsMissingPreview(t *testing.T) {
// TestIntegration_ResultPreviewBuggyOrderIsLost — DIAGNOSTIC test that
// confirms the ORIGINAL buggy order does lose the preview. Useful when
// auditing similar wiring elsewhere.
//
// This is documented behavior: it asserts the same-status replay no-op
// works as designed in DelegationLedger.SetStatus. The fix in
// delegation.go is to AVOID this order, not to change SetStatus's
// same-status semantics (which the operator dashboard relies on for
// idempotent completion notifications).
func TestIntegration_ResultPreviewBuggyOrderIsLost(t *testing.T) {
conn := integrationDB(t)
t.Setenv("DELEGATION_LEDGER_WRITE", "1")
@@ -162,17 +162,16 @@ func TestIntegration_ResultPreviewSameStatusReplayFillsMissingPreview(t *testing
caller := "11111111-1111-1111-1111-111111111111"
callee := "22222222-2222-2222-2222-222222222222"
// Legacy sequence: queued → dispatched → completed (no preview)
// completed (preview). The second completed replay should repair the
// missing preview without changing status.
// BUGGY sequence in production-shape order: queued → dispatched →
// completed (no preview) → completed (preview ignored as same-status).
recordLedgerInsert(context.Background(), caller, callee, id, "the question", "")
recordLedgerStatus(context.Background(), id, "dispatched", "", "")
recordLedgerStatus(context.Background(), id, "completed", "", "")
recordLedgerStatus(context.Background(), id, "completed", "", "the answer")
recordLedgerStatus(context.Background(), id, "dispatched", "", "") // pre-completion stage
recordLedgerStatus(context.Background(), id, "completed", "", "") // inner first
recordLedgerStatus(context.Background(), id, "completed", "", "the answer") // outer same-status no-op
_, preview, _ := readLedgerRow(t, conn, id)
if preview != "the answer" {
t.Errorf("same-status replay should fill missing preview; got %q", preview)
if preview != "" {
t.Errorf("buggy-order preview was unexpectedly non-empty: %q (SetStatus same-status no-op contract may have changed)", preview)
}
}
@@ -226,25 +226,6 @@ func TestLedgerSetStatus_SameStatusReplay_NoUpdate(t *testing.T) {
}
}
func TestLedgerSetStatus_SameStatusReplay_FillsMissingDetail(t *testing.T) {
mock := setupTestDB(t)
l := NewDelegationLedger(nil)
mock.ExpectQuery(`SELECT status FROM delegations WHERE delegation_id = \$1`).
WithArgs("d-1").
WillReturnRows(sqlmock.NewRows([]string{"status"}).AddRow("failed"))
mock.ExpectExec(`UPDATE delegations\s+SET error_detail = COALESCE\(error_detail, NULLIF\(\$2, ''\)\),\s+result_preview = COALESCE\(result_preview, NULLIF\(\$3, ''\)\),\s+updated_at = CASE`).
WithArgs("d-1", "agent returned empty response", "").
WillReturnResult(sqlmock.NewResult(0, 1))
if err := l.SetStatus(context.Background(), "d-1", "failed", "agent returned empty response", ""); err != nil {
t.Errorf("same-status detail fill should succeed, got err: %v", err)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet: %v", err)
}
}
func TestLedgerSetStatus_MissingRowIsNoOp(t *testing.T) {
// A SetStatus call that arrives before Insert (lost INSERT, race, etc.)
// must NOT error — it's a transient inconsistency the next agent retry
@@ -5,8 +5,10 @@ import (
"context"
"encoding/json"
"fmt"
"net"
"net/http"
"net/http/httptest"
"sync"
"testing"
"time"
@@ -956,3 +958,316 @@ func TestInsertDelegationOutcome_ZeroValueIsUnknown(t *testing.T) {
t.Errorf("insertOutcomeUnknown must not collide with insertOK")
}
}
// ==================== executeDelegation — delivery-confirmed proxy error regression tests ====================
//
// These test the fix for issue #159: when proxyA2ARequest returns an error but we have a
// non-empty response body with a 2xx status code, executeDelegation must treat it as success.
// The error is a delivery/transport error (e.g., connection reset after response was received).
// Previously, executeDelegation marked these as "failed" even though the work was done,
// causing retry storms and "error" rendering in canvas despite the response being available.
//
// Test strategy: spin up a mock A2A agent server, set up the source/target DB rows, call
// executeDelegation directly, and verify the activity_logs status and delegation status.
const testDelegationID = "del-159-test"
const testSourceID = "ws-source-159"
const testTargetID = "ws-target-159"
// expectExecuteDelegationBase sets up sqlmock expectations for the DB queries that
// executeDelegation always makes, regardless of outcome.
func expectExecuteDelegationBase(mock sqlmock.Sqlmock) {
// updateDelegationStatus: dispatched
// Uses prefix match — sqlmock regexes match the full query string.
mock.ExpectExec("UPDATE activity_logs SET status").
WithArgs("dispatched", "", testSourceID, testDelegationID).
WillReturnResult(sqlmock.NewResult(0, 1))
// CanCommunicate: source != target → fires two getWorkspaceRef lookups.
// Both test fixtures have parent_id = NULL (root-level siblings) → allowed.
// Order matches call order: source first, then target.
mock.ExpectQuery("SELECT id, parent_id FROM workspaces WHERE id").
WithArgs(testSourceID).
WillReturnRows(sqlmock.NewRows([]string{"id", "parent_id"}).AddRow(testSourceID, nil))
mock.ExpectQuery("SELECT id, parent_id FROM workspaces WHERE id").
WithArgs(testTargetID).
WillReturnRows(sqlmock.NewRows([]string{"id", "parent_id"}).AddRow(testTargetID, nil))
// resolveAgentURL: reads ws:{id}:url from Redis, falls back to DB for target
mock.ExpectQuery("SELECT url, status FROM workspaces WHERE id = ").
WithArgs(testTargetID).
WillReturnRows(sqlmock.NewRows([]string{"url", "status"}).AddRow("", "online"))
}
// expectExecuteDelegationSuccess sets up expectations for a completed delegation.
func expectExecuteDelegationSuccess(mock sqlmock.Sqlmock, respBody string) {
// INSERT activity_logs for delegation completion (response_body status = 'completed')
mock.ExpectExec("INSERT INTO activity_logs").
WithArgs(sqlmock.AnyArg(), sqlmock.AnyArg(), sqlmock.AnyArg(), sqlmock.AnyArg(), sqlmock.AnyArg(), sqlmock.AnyArg(), "completed").
WillReturnResult(sqlmock.NewResult(0, 1))
// updateDelegationStatus: completed
mock.ExpectExec("UPDATE activity_logs SET status").
WithArgs("completed", "", testSourceID, testDelegationID).
WillReturnResult(sqlmock.NewResult(0, 1))
}
// expectExecuteDelegationFailed sets up expectations for a failed delegation.
func expectExecuteDelegationFailed(mock sqlmock.Sqlmock) {
// INSERT activity_logs for delegation failure (response_body status = 'failed')
mock.ExpectExec("INSERT INTO activity_logs").
WithArgs(sqlmock.AnyArg(), sqlmock.AnyArg(), sqlmock.AnyArg(), sqlmock.AnyArg(), sqlmock.AnyArg(), sqlmock.AnyArg(), "failed").
WillReturnResult(sqlmock.NewResult(0, 1))
// updateDelegationStatus: failed
mock.ExpectExec("UPDATE activity_logs SET status").
WithArgs("failed", sqlmock.AnyArg(), testSourceID, testDelegationID).
WillReturnResult(sqlmock.NewResult(0, 1))
}
// TestExecuteDelegation_DeliveryConfirmedProxyError_TreatsAsSuccess is the primary regression
// test for issue #159. The scenario:
// - Attempt 1: server sends 200 OK headers + partial body, then closes connection.
// proxyA2ARequest: body read gets io.EOF (partial body read), returns (200, <partial>, BadGateway).
// isTransientProxyError(BadGateway) = TRUE → retry.
// - Attempt 2: server does the same thing (closes after partial body).
// proxyA2ARequest: same (200, <partial>, BadGateway).
// isTransientProxyError(BadGateway) = TRUE → retry AGAIN (but outer context will fire soon,
// or we get one more attempt). For the test we let it run.
// POST-FIX: the executeDelegation new condition sees status=200, body=<partial>, err!=nil
// and routes to handleSuccess immediately.
//
// The key pre/post-fix difference: pre-fix, executeDelegation received status=0 (hardcoded)
// even when the server sent 200, so the condition always failed. Post-fix, status=200 is
// preserved through the error return path (proxyA2ARequest now returns resp.StatusCode, respBody).
// In this test the retry ultimately succeeds (server eventually sends full body), but
// the critical assertion is that a 2xx partial-body delivery-confirmed response is never
// classified as "failed" — it always routes to success.
func TestExecuteDelegation_DeliveryConfirmedProxyError_TreatsAsSuccess(t *testing.T) {
mock := setupTestDB(t)
mr := setupTestRedis(t)
allowLoopbackForTest(t)
broadcaster := newTestBroadcaster()
wh := NewWorkspaceHandler(broadcaster, nil, "http://localhost:8080", t.TempDir())
dh := NewDelegationHandler(wh, broadcaster)
// Server that sends a 200 response with declared Content-Length but closes
// the connection before sending all bytes. Go's http.Client sees io.EOF on
// the body read. proxyA2ARequest captures the partial body + status=200 and
// returns (200, <partial>, error). executeDelegation's new condition sees
// status=200 + body > 0 + error != nil → routes to handleSuccess.
var wg sync.WaitGroup
wg.Add(1)
ln, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
t.Fatalf("failed to listen: %v", err)
}
defer ln.Close()
go func() {
defer wg.Done()
conn, err := ln.Accept()
if err != nil {
return
}
defer conn.Close()
// Consume the HTTP request
buf := make([]byte, 2048)
conn.Read(buf)
// Send 200 OK with Content-Length: 100 but only 74 bytes of body
// (less than declared length → io.LimitReader returns io.EOF after reading all 74)
resp := "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: 100\r\n\r\n"
resp += `{"result":{"parts":[{"text":"work completed successfully"}]}}` // 74 bytes
conn.Write([]byte(resp))
// Close immediately — client gets io.EOF on body read
}()
agentURL := "http://" + ln.Addr().String()
mr.Set(fmt.Sprintf("ws:%s:url", testTargetID), agentURL)
allowLoopbackForTest(t)
expectExecuteDelegationBase(mock)
expectExecuteDelegationSuccess(mock, `{"result":{"parts":[{"text":"work completed successfully"}]}}`)
// Execute synchronously (not as a goroutine) so we can check DB state immediately.
// The handler fires it as goroutine; we call it directly for deterministic testing.
a2aBody, _ := json.Marshal(map[string]interface{}{
"jsonrpc": "2.0",
"id": "1",
"method": "message/send",
"params": map[string]interface{}{
"message": map[string]interface{}{
"role": "user",
"parts": []map[string]string{{"type": "text", "text": "do work"}},
},
},
})
dh.executeDelegation(testSourceID, testTargetID, testDelegationID, a2aBody)
time.Sleep(100 * time.Millisecond) // let DB writes settle
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet sqlmock expectations: %v", err)
}
}
// TestExecuteDelegation_ProxyErrorNon2xx_RemainsFailed verifies that the pre-fix failure
// path is unchanged when proxyA2ARequest returns a delivery-confirmed error with a non-2xx
// status code (e.g., 500 Internal Server Error with partial body read before connection drop).
// The new condition requires status >= 200 && status < 300, so non-2xx always routes to failure.
func TestExecuteDelegation_ProxyErrorNon2xx_RemainsFailed(t *testing.T) {
mock := setupTestDB(t)
mr := setupTestRedis(t)
allowLoopbackForTest(t)
broadcaster := newTestBroadcaster()
wh := NewWorkspaceHandler(broadcaster, nil, "http://localhost:8080", t.TempDir())
dh := NewDelegationHandler(wh, broadcaster)
// Server returns 500 with declared Content-Length but closes connection early.
// proxyA2ARequest: reads 500 headers, partial body, then connection drop → body read error.
// Returns (500, <partial_body>, BadGateway).
// New condition: status=500 is NOT >= 200 && < 300 → routes to failure.
// isTransientProxyError(500) = false → no retry.
var wg sync.WaitGroup
wg.Add(1)
ln, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
t.Fatalf("failed to listen: %v", err)
}
defer ln.Close()
go func() {
defer wg.Done()
conn, err := ln.Accept()
if err != nil {
return
}
defer conn.Close()
buf := make([]byte, 2048)
conn.Read(buf)
// 500 with Content-Length: 100 but only ~60 bytes of body
resp := "HTTP/1.1 500 Internal Server Error\r\nContent-Type: application/json\r\nContent-Length: 100\r\n\r\n"
resp += `{"error":"agent crashed"}` // ~24 bytes, less than declared
conn.Write([]byte(resp))
// Close immediately — client gets io.EOF on body read
}()
agentURL := "http://" + ln.Addr().String()
mr.Set(fmt.Sprintf("ws:%s:url", testTargetID), agentURL)
allowLoopbackForTest(t)
expectExecuteDelegationBase(mock)
expectExecuteDelegationFailed(mock)
a2aBody, _ := json.Marshal(map[string]interface{}{
"jsonrpc": "2.0", "id": "1", "method": "message/send",
"params": map[string]interface{}{
"message": map[string]interface{}{
"role": "user",
"parts": []map[string]string{{"type": "text", "text": "do work"}},
},
},
})
dh.executeDelegation(testSourceID, testTargetID, testDelegationID, a2aBody)
time.Sleep(100 * time.Millisecond)
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet sqlmock expectations: %v", err)
}
}
// TestExecuteDelegation_ProxyErrorEmptyBody_RemainsFailed verifies that the pre-fix failure
// path is unchanged when proxyA2ARequest returns an error with a 2xx status but empty body.
// The new condition requires len(respBody) > 0, so empty body routes to failure.
func TestExecuteDelegation_ProxyErrorEmptyBody_RemainsFailed(t *testing.T) {
mock := setupTestDB(t)
mr := setupTestRedis(t)
allowLoopbackForTest(t)
broadcaster := newTestBroadcaster()
wh := NewWorkspaceHandler(broadcaster, nil, "http://localhost:8080", t.TempDir())
dh := NewDelegationHandler(wh, broadcaster)
// Server returns 502 Bad Gateway — proxyA2ARequest returns 502, body="" (empty), error != nil.
// New condition: proxyErr != nil && len(respBody) > 0 && status >= 200 && status < 300
// → len(respBody) == 0 → condition FALSE → falls through to failure.
// isTransientProxyError(502) is TRUE → retry → same result → failure.
agentServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusBadGateway)
// No body — connection closes normally
}))
defer agentServer.Close()
mr.Set(fmt.Sprintf("ws:%s:url", testTargetID), agentServer.URL)
allowLoopbackForTest(t)
// First attempt: updateDelegationStatus(dispatched) — from expectExecuteDelegationBase
expectExecuteDelegationBase(mock)
// Second attempt (retry): updateDelegationStatus(dispatched) again
mock.ExpectExec("UPDATE activity_logs SET status").
WithArgs("dispatched", "", testSourceID, testDelegationID).
WillReturnResult(sqlmock.NewResult(0, 1))
// Failure: INSERT + UPDATE (failed)
expectExecuteDelegationFailed(mock)
a2aBody, _ := json.Marshal(map[string]interface{}{
"jsonrpc": "2.0", "id": "1", "method": "message/send",
"params": map[string]interface{}{
"message": map[string]interface{}{
"role": "user",
"parts": []map[string]string{{"type": "text", "text": "do work"}},
},
},
})
dh.executeDelegation(testSourceID, testTargetID, testDelegationID, a2aBody)
time.Sleep(100 * time.Millisecond)
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet sqlmock expectations: %v", err)
}
}
// TestExecuteDelegation_CleanProxyResponse_Unchanged verifies that a clean proxy response
// (no error, 200 with body) is unaffected by the new condition. This is the baseline:
// proxyErr == nil so the new condition never fires.
func TestExecuteDelegation_CleanProxyResponse_Unchanged(t *testing.T) {
mock := setupTestDB(t)
mr := setupTestRedis(t)
allowLoopbackForTest(t)
broadcaster := newTestBroadcaster()
wh := NewWorkspaceHandler(broadcaster, nil, "http://localhost:8080", t.TempDir())
dh := NewDelegationHandler(wh, broadcaster)
agentServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
w.Header().Set("Content-Type", "application/json")
w.Write([]byte(`{"result":{"parts":[{"text":"all good"}]}}`))
}))
defer agentServer.Close()
mr.Set(fmt.Sprintf("ws:%s:url", testTargetID), agentServer.URL)
allowLoopbackForTest(t)
expectExecuteDelegationBase(mock)
expectExecuteDelegationSuccess(mock, `{"result":{"parts":[{"text":"all good"}]}}`)
a2aBody, _ := json.Marshal(map[string]interface{}{
"jsonrpc": "2.0", "id": "1", "method": "message/send",
"params": map[string]interface{}{
"message": map[string]interface{}{
"role": "user",
"parts": []map[string]string{{"type": "text", "text": "do work"}},
},
},
})
dh.executeDelegation(testSourceID, testTargetID, testDelegationID, a2aBody)
time.Sleep(100 * time.Millisecond)
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet sqlmock expectations: %v", err)
}
}
@@ -292,8 +292,8 @@ func filterPeersByQuery(peers []map[string]interface{}, q string) []map[string]i
needle := strings.ToLower(q)
out := make([]map[string]interface{}, 0, len(peers))
for _, p := range peers {
name, _ := p["name"].(string) // nil → "" — safe on empty-role rows
role, _ := p["role"].(string) // nil → "" — queryPeerMaps sets nil when DB role is empty
name := p["name"].(string)
role := p["role"].(string)
if strings.Contains(strings.ToLower(name), needle) ||
strings.Contains(strings.ToLower(role), needle) {
out = append(out, p)
@@ -394,80 +394,6 @@ func TestPeers_Q_NoMatches_RawBodyIsArrayNotNull(t *testing.T) {
}
}
// TestFilterPeersByQuery_NilRoleRegression is the regression gate for
// mc#730/#731: queryPeerMaps sets peer["role"] = nil when the DB role column
// is empty (discovery.go lines 337-341). filterPeersByQuery did a bare
// type assertion p["role"].(string) which panics on nil. The fix uses the
// comma-ok form so nil → "". The test passes a map with nil name and nil
// role and asserts no panic + correct filter behaviour.
func TestFilterPeersByQuery_NilRoleRegression(t *testing.T) {
cases := []struct {
name string
peers []map[string]interface{}
q string
wantLen int
wantIDs []string
}{
{
name: "nil role matches on name",
peers: []map[string]interface{}{
{"id": "ws-a", "name": nil, "role": nil},
{"id": "ws-b", "name": "Alpha Builder", "role": nil},
{"id": "ws-c", "name": "Beta Builder", "role": nil},
},
q: "alpha",
wantLen: 1,
wantIDs: []string{"ws-b"},
},
{
name: "nil name matches on nil role (empty string)",
peers: []map[string]interface{}{
{"id": "ws-x", "name": nil, "role": nil},
{"id": "ws-y", "name": "Dev Workspace", "role": nil},
},
q: "",
wantLen: 2, // empty q is a no-op
wantIDs: []string{"ws-x", "ws-y"},
},
{
name: "all nil — no panic, returns input",
peers: []map[string]interface{}{
{"id": "ws-z", "name": nil, "role": nil},
},
q: "anything",
wantLen: 0,
wantIDs: nil,
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
got := filterPeersByQuery(tc.peers, tc.q)
if len(got) != tc.wantLen {
t.Fatalf("len: got %d, want %d", len(got), tc.wantLen)
}
gotIDs := make([]string, len(got))
for i, p := range got {
gotIDs[i] = p["id"].(string)
}
if tc.wantIDs != nil {
for _, id := range tc.wantIDs {
found := false
for _, g := range gotIDs {
if g == id {
found = true
break
}
}
if !found {
t.Errorf("missing id %q; got IDs: %v", id, gotIDs)
}
}
}
})
}
}
func keysOf(m map[string]struct{}) []string {
out := make([]string, 0, len(m))
for k := range m {
+1 -2
View File
@@ -434,8 +434,7 @@ func (h *MCPHandler) dispatchRPC(ctx context.Context, workspaceID string, req mc
}
default:
// Per OFFSEC-001: error message must not include user-controlled req.Method.
base.Error = &mcpRPCError{Code: -32601, Message: "method not found"}
base.Error = &mcpRPCError{Code: -32601, Message: "method not found: " + req.Method}
}
return base
+11 -117
View File
@@ -9,7 +9,6 @@ import (
"net/http"
"net/http/httptest"
"os"
"strings"
"testing"
"errors"
@@ -205,9 +204,6 @@ func TestMCPHandler_NotificationsInitialized_Returns200(t *testing.T) {
// Unknown method
// ─────────────────────────────────────────────────────────────────────────────
// TestMCPHandler_UnknownMethod_Returns32601 verifies dispatchRPC returns
// -32601 for an unknown method. Per OFFSEC-001: the error message must be
// constant — req.Method is user-controlled and must NOT appear in the response.
func TestMCPHandler_UnknownMethod_Returns32601(t *testing.T) {
h, _ := newMCPHandler(t)
@@ -228,14 +224,6 @@ func TestMCPHandler_UnknownMethod_Returns32601(t *testing.T) {
if resp.Error.Code != -32601 {
t.Errorf("expected code -32601, got %d", resp.Error.Code)
}
// Message must be constant — no user-controlled method name leak.
if resp.Error.Message != "method not found" {
t.Errorf("error message should be constant 'method not found', got: %q", resp.Error.Message)
}
// Double-check the method name never appears in the message (defence-in-depth).
if strings.Contains(resp.Error.Message, "not/a/real/method") {
t.Error("error message must not echo the user-controlled method name")
}
}
// ─────────────────────────────────────────────────────────────────────────────
@@ -417,32 +405,11 @@ func TestMCPHandler_CommitMemory_LocalScope_Success(t *testing.T) {
}
}
// TestMCPHandler_CommitMemory_GlobalScope_Blocked_ScrubsInternalError verifies
// two contracts at once on the GLOBAL-scope-blocked path:
//
// 1. C3 invariant (commit_memory with scope=GLOBAL aborts on the MCP bridge
// before touching the DB), AND
// 2. OFFSEC-001 / #259 scrub contract (commit 7d1a189f): the JSON-RPC error
// returned to the client is a CONSTANT — code=-32000, message="tool call
// failed" — with the production-internal err.Error() text logged
// server-side, never reflected back to the caller.
//
// Prior to this rename the test asserted that the client-visible message
// CONTAINED the substring "GLOBAL", which was the human-readable internal
// error from toolCommitMemory. mc#664 Class 2 flipped that assertion the
// right way around: now the test FAILS if the scrub regresses (i.e. if the
// internal string is ever reflected back to the wire), and PASSES iff the
// scrubbed constant reaches the client.
//
// Coupling note: the constant string "tool call failed" and the code -32000
// are the same values asserted by
// TestMCPHandler_dispatchRPC_UnknownTool_ReturnsConstantMessage — both are
// the OFFSEC-001 contract for the dispatch-failure branch in mcp.go (the
// third err.Error() leak that 7d1a189f scrubbed). If those constants ever
// change, both tests must move together.
func TestMCPHandler_CommitMemory_GlobalScope_Blocked_ScrubsInternalError(t *testing.T) {
// TestMCPHandler_CommitMemory_GlobalScope_Blocked verifies that C3 is enforced:
// GLOBAL scope is not permitted on the MCP bridge.
func TestMCPHandler_CommitMemory_GlobalScope_Blocked(t *testing.T) {
h, mock := newMCPHandler(t)
// No DB expectations — handler must abort before touching the DB (C3).
// No DB expectations — handler must abort before touching the DB.
w := mcpPost(t, h, "ws-1", map[string]interface{}{
"jsonrpc": "2.0",
@@ -457,53 +424,14 @@ func TestMCPHandler_CommitMemory_GlobalScope_Blocked_ScrubsInternalError(t *test
},
})
// JSON-RPC envelope returns 200 with the error in the body — only
// malformed-JSON-at-the-envelope-layer returns 400 (see Call() in mcp.go).
if w.Code != http.StatusOK {
t.Fatalf("expected 200 (JSON-RPC error in body), got %d: %s", w.Code, w.Body.String())
}
var resp mcpResponse
if err := json.Unmarshal(w.Body.Bytes(), &resp); err != nil {
t.Fatalf("response is not valid JSON: %v", err)
}
// (1) C3: an error must be reported.
json.Unmarshal(w.Body.Bytes(), &resp)
if resp.Error == nil {
t.Fatal("expected JSON-RPC error for GLOBAL scope, got nil")
t.Error("expected JSON-RPC error for GLOBAL scope, got nil")
}
// (2) OFFSEC-001 positive assertions — exact equality on the scrubbed
// constants so any change (re-leak of err.Error(), code mutation) trips
// the test. Substring-match would not catch a partial re-leak.
if resp.Error.Code != -32000 {
t.Errorf("error code should be -32000 (Server error / dispatch-failure), got: %d", resp.Error.Code)
if resp.Error != nil && !bytes.Contains([]byte(resp.Error.Message), []byte("GLOBAL")) {
t.Errorf("error message should mention GLOBAL, got: %s", resp.Error.Message)
}
if resp.Error.Message != "tool call failed" {
t.Errorf("error message should be the OFFSEC-001 constant %q, got: %q", "tool call failed", resp.Error.Message)
}
// (3) OFFSEC-001 negative assertions — the internal err.Error() text
// from toolCommitMemory ("GLOBAL scope is not permitted via the MCP
// bridge — use LOCAL or TEAM") must NOT appear in the client-visible
// message. Each token below is a distinct substring of that internal
// string; if ANY leaks through, the scrub in mcp.go dispatchRPC has
// regressed and this assertion fires the canary.
leakedTokens := []string{
"GLOBAL", // scope name
"scope", // policy lexicon
"permitted", // policy verb
"bridge", // internal architecture term
"LOCAL", // alternative scope name
"TEAM", // alternative scope name
}
for _, tok := range leakedTokens {
if bytes.Contains([]byte(resp.Error.Message), []byte(tok)) {
t.Errorf("OFFSEC-001 scrub regression: client-visible error.message leaks internal token %q (got: %q)", tok, resp.Error.Message)
}
}
// (4) C3 invariant preserved: handler must short-circuit before any DB call.
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unexpected DB calls on GLOBAL scope block: %v", err)
}
@@ -608,13 +536,7 @@ func TestMCPHandler_CommitMemory_CleanContent_PassesThrough(t *testing.T) {
// tools/call — recall_memory
// ─────────────────────────────────────────────────────────────────────────────
// TestMCPHandler_RecallMemory_GlobalScope_Blocked_ScrubsInternalError verifies
// C3 (GLOBAL scope blocked on MCP bridge) is enforced and that the OFFSEC-001
// scrub contract applies: the client-visible error.message is the constant
// "tool call failed", NOT the descriptive internal reason. The internal reason
// ("GLOBAL scope is not permitted via the MCP bridge") is logged server-side
// but must never reach the wire.
func TestMCPHandler_RecallMemory_GlobalScope_Blocked_ScrubsInternalError(t *testing.T) {
func TestMCPHandler_RecallMemory_GlobalScope_Blocked(t *testing.T) {
h, mock := newMCPHandler(t)
// No DB expectations — handler must abort before touching the DB.
@@ -632,38 +554,10 @@ func TestMCPHandler_RecallMemory_GlobalScope_Blocked_ScrubsInternalError(t *test
})
var resp mcpResponse
if err := json.Unmarshal(w.Body.Bytes(), &resp); err != nil {
t.Fatalf("response is not valid JSON: %v", err)
}
// (1) C3: an error must be reported.
json.Unmarshal(w.Body.Bytes(), &resp)
if resp.Error == nil {
t.Fatal("expected JSON-RPC error for GLOBAL scope recall, got nil")
t.Error("expected JSON-RPC error for GLOBAL scope recall, got nil")
}
// (2) OFFSEC-001 positive assertions — exact equality on the scrubbed
// constants so any change (re-leak of err.Error(), code mutation) trips
// the test.
if resp.Error.Code != -32000 {
t.Errorf("error code should be -32000 (Server error / dispatch-failure), got: %d", resp.Error.Code)
}
if resp.Error.Message != "tool call failed" {
t.Errorf("error message should be the OFFSEC-001 constant %q, got: %q", "tool call failed", resp.Error.Message)
}
// (3) OFFSEC-001 negative assertions — the internal reason must NOT appear
// in the client-visible message.
leakedTokens := []string{
"GLOBAL", // scope name
"scope", // policy lexicon
"permitted", // policy verb
"bridge", // internal architecture term
"LOCAL", // alternative scope name
"TEAM", // alternative scope name
}
for _, tok := range leakedTokens {
if bytes.Contains([]byte(resp.Error.Message), []byte(tok)) {
t.Errorf("OFFSEC-001 scrub regression: client-visible error.message leaks internal token %q (got: %q)", tok, resp.Error.Message)
}
}
// (4) C3 invariant preserved: handler must short-circuit before any DB call.
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unexpected DB calls on GLOBAL scope block: %v", err)
}
@@ -1,539 +0,0 @@
package handlers
import (
"strings"
"testing"
)
// ─────────────────────────────────────────────────────────────────────────────
// countWorkspaces tests
// ─────────────────────────────────────────────────────────────────────────────
func TestCountWorkspaces_Empty(t *testing.T) {
got := countWorkspaces(nil)
if got != 0 {
t.Errorf("nil: got %d, want 0", got)
}
got = countWorkspaces([]OrgWorkspace{})
if got != 0 {
t.Errorf("empty: got %d, want 0", got)
}
}
func TestCountWorkspaces_Flat(t *testing.T) {
tree := []OrgWorkspace{
{Name: "a"},
{Name: "b"},
{Name: "c"},
}
got := countWorkspaces(tree)
if got != 3 {
t.Errorf("flat 3: got %d, want 3", got)
}
}
func TestCountWorkspaces_Nested(t *testing.T) {
// root (1)
// / | \ (3 children)
// c1 c2 c3
// | |
// g1 g2 (2 grandchildren)
tree := []OrgWorkspace{
{
Name: "root",
Children: []OrgWorkspace{
{Name: "child1", Children: []OrgWorkspace{{Name: "grandchild1"}}},
{Name: "child2"},
{Name: "child3", Children: []OrgWorkspace{{Name: "grandchild2"}}},
},
},
}
got := countWorkspaces(tree)
if got != 6 {
t.Errorf("nested: got %d, want 6 (1 root + 3 children + 2 grandchildren)", got)
}
}
func TestCountWorkspaces_DeepNesting(t *testing.T) {
// chain of 5 levels
deep := []OrgWorkspace{
{Name: "L1", Children: []OrgWorkspace{
{Name: "L2", Children: []OrgWorkspace{
{Name: "L3", Children: []OrgWorkspace{
{Name: "L4", Children: []OrgWorkspace{
{Name: "L5"},
}},
}},
}},
}},
}
got := countWorkspaces(deep)
if got != 5 {
t.Errorf("deep chain: got %d, want 5", got)
}
}
// ─────────────────────────────────────────────────────────────────────────────
// envRequirementKey tests
// ─────────────────────────────────────────────────────────────────────────────
func TestEnvRequirementKey_SingleMember(t *testing.T) {
got := envRequirementKey([]string{"API_KEY"})
if got != "API_KEY" {
t.Errorf("single: got %q, want %q", got, "API_KEY")
}
}
func TestEnvRequirementKey_TwoMembers_OrderInsensitive(t *testing.T) {
keyAB := envRequirementKey([]string{"A", "B"})
keyBA := envRequirementKey([]string{"B", "A"})
if keyAB != keyBA {
t.Errorf("order-insensitive: [A,B]=%q, [B,A]=%q — must match", keyAB, keyBA)
}
}
func TestEnvRequirementKey_ThreeMembers_Sorted(t *testing.T) {
key := envRequirementKey([]string{"Z", "A", "M"})
// Should be "A\x00M\x00Z"
want := "A\x00M\x00Z"
if key != want {
t.Errorf("three members sorted: got %q, want %q", key, want)
}
}
func TestEnvRequirementKey_EmptyMembers(t *testing.T) {
got := envRequirementKey(nil)
if got != "" {
t.Errorf("nil: got %q, want empty", got)
}
got = envRequirementKey([]string{})
if got != "" {
t.Errorf("empty: got %q, want empty", got)
}
}
func TestEnvRequirementKey_DuplicateMembers(t *testing.T) {
// Duplicates should be preserved in sort; join still works
key := envRequirementKey([]string{"A", "A", "B"})
want := "A\x00A\x00B"
if key != want {
t.Errorf("duplicates: got %q, want %q", key, want)
}
}
func TestEnvRequirementKey_UsedForDedup(t *testing.T) {
// Real dedup case: {A,B} and {B,A} produce same key → dedup-eligible
// {A,B,C} produces a different key
keyAB := envRequirementKey([]string{"A", "B"})
keyBA := envRequirementKey([]string{"B", "A"})
keyABC := envRequirementKey([]string{"A", "B", "C"})
if keyAB != keyBA {
t.Errorf("AB vs BA: keys must match for dedup")
}
if keyAB == keyABC {
t.Errorf("AB vs ABC: keys must differ")
}
}
// ─────────────────────────────────────────────────────────────────────────────
// sanitizeEnvMembers tests
// ─────────────────────────────────────────────────────────────────────────────
// envVarNamePattern = ^[A-Z][A-Z0-9_]{0,127}$
func TestSanitizeEnvMembers_AllValid(t *testing.T) {
members := []string{"API_KEY", "MY_VAR_2", "A"}
got, ok := sanitizeEnvMembers(members, "test")
if !ok {
t.Error("all valid: ok should be true")
}
if len(got) != len(members) {
t.Errorf("all valid: got %v, want %v", got, members)
}
}
func TestSanitizeEnvMembers_SomeInvalid(t *testing.T) {
// Lowercase first char — invalid
members := []string{"API_KEY", "lowercase", "MY_VAR"}
got, ok := sanitizeEnvMembers(members, "test")
if !ok {
t.Error("one invalid: ok should be true (valid members remain)")
}
want := []string{"API_KEY", "MY_VAR"}
if len(got) != len(want) {
t.Errorf("one invalid: got %v, want %v", got, want)
}
}
func TestSanitizeEnvMembers_AllInvalid_DropsAll(t *testing.T) {
members := []string{"lowercase", "123_START", ""}
got, ok := sanitizeEnvMembers(members, "test")
if ok {
t.Error("all invalid: ok should be false")
}
if len(got) != 0 {
t.Errorf("all invalid: got %v, want empty", got)
}
}
func TestSanitizeEnvMembers_EmptyString_Skipped(t *testing.T) {
// Empty string is filtered but doesn't make ok=false
members := []string{"API_KEY", "", "MY_VAR"}
got, ok := sanitizeEnvMembers(members, "test")
if !ok {
t.Error("empty string in valid list: ok should be true")
}
if len(got) != 2 {
t.Errorf("empty string filtered: got %v, want [API_KEY, MY_VAR]", got)
}
}
func TestSanitizeEnvMembers_MaxLength(t *testing.T) {
// 128 chars: valid (1 prefix + 127 more = 128, all uppercase)
valid := "A" + strings.Repeat("B", 127)
got, ok := sanitizeEnvMembers([]string{valid}, "test")
if !ok {
t.Errorf("128 char valid: ok should be true, got %v", got)
}
// 129 chars: invalid (exceeds {0,127} suffix in regex)
tooLong := "A" + strings.Repeat("B", 128)
got, ok = sanitizeEnvMembers([]string{tooLong}, "test")
if ok {
t.Error("129 char invalid: ok should be false")
}
}
func TestSanitizeEnvMembers_DigitsAndUnderscore(t *testing.T) {
// regex ^[A-Z][A-Z0-9_]{0,127}$ — first char must be A-Z, not underscore
valid := []string{"A1", "A_2", "HTTP_200_OK", "ABC123"}
for _, v := range valid {
got, ok := sanitizeEnvMembers([]string{v}, "test")
if !ok {
t.Errorf("should be valid: %q", v)
}
if len(got) != 1 || got[0] != v {
t.Errorf("got %v, want [%q]", got, v)
}
}
}
// ─────────────────────────────────────────────────────────────────────────────
// flattenAndSortRequirements tests
// ─────────────────────────────────────────────────────────────────────────────
func TestFlattenAndSortRequirements_Empty(t *testing.T) {
got := flattenAndSortRequirements(map[string]EnvRequirement{})
if len(got) != 0 {
t.Errorf("empty: got %d, want 0", len(got))
}
}
func TestFlattenAndSortRequirements_SingleFirst(t *testing.T) {
// Singles come before groups; within singles, alphabetical
reqs := map[string]EnvRequirement{
envRequirementKey([]string{"ZETA"}): {Name: "ZETA"},
envRequirementKey([]string{"ALPHA"}): {Name: "ALPHA"},
}
got := flattenAndSortRequirements(reqs)
if len(got) != 2 {
t.Fatalf("got %d, want 2", len(got))
}
if got[0].Name != "ALPHA" {
t.Errorf("first: got %q, want ALPHA", got[0].Name)
}
if got[1].Name != "ZETA" {
t.Errorf("second: got %q, want ZETA", got[1].Name)
}
}
func TestFlattenAndSortRequirements_GroupsAfterSingles(t *testing.T) {
reqs := map[string]EnvRequirement{
envRequirementKey([]string{"X"}): {Name: "X"}, // single
envRequirementKey([]string{"A", "B"}): {AnyOf: []string{"A", "B"}}, // group
}
got := flattenAndSortRequirements(reqs)
if len(got) != 2 {
t.Fatalf("got %d, want 2", len(got))
}
// Single X comes before any group
if got[0].Name != "X" {
t.Errorf("first should be single X: got %+v", got[0])
}
if len(got[1].AnyOf) != 2 {
t.Errorf("second should be group: got %+v", got[1])
}
}
func TestFlattenAndSortRequirements_GroupsSortedByMemberKey(t *testing.T) {
// Groups sorted by their member-key (envRequirementKey sorts AnyOf members).
// {Z,A} → key "A\x00Z"; {B,C} → key "B\x00C". "A..." < "B..." → A,Z group first.
reqs := map[string]EnvRequirement{
envRequirementKey([]string{"Z", "A"}): {AnyOf: []string{"Z", "A"}}, // key: A\x00Z
envRequirementKey([]string{"B", "C"}): {AnyOf: []string{"B", "C"}}, // key: B\x00C
}
got := flattenAndSortRequirements(reqs)
if len(got) != 2 {
t.Fatalf("got %d, want 2", len(got))
}
// A\x00Z < B\x00C alphabetically, so the A,Z group sorts first
if len(got[0].AnyOf) != 2 || got[0].AnyOf[0] != "Z" {
t.Errorf("first group: got %+v, want [Z,A] (key A\\x00Z sorts before B\\x00C)", got[0])
}
}
// ─────────────────────────────────────────────────────────────────────────────
// collectOrgEnv tests
// ─────────────────────────────────────────────────────────────────────────────
func TestCollectOrgEnv_SingleRequired(t *testing.T) {
tmpl := &OrgTemplate{
RequiredEnv: []EnvRequirement{{Name: "API_KEY"}},
}
req, rec := collectOrgEnv(tmpl)
if len(req) != 1 {
t.Fatalf("got %d required, want 1", len(req))
}
if req[0].Name != "API_KEY" {
t.Errorf("name: got %q, want API_KEY", req[0].Name)
}
if len(rec) != 0 {
t.Errorf("recommended: got %d, want 0", len(rec))
}
}
func TestCollectOrgEnv_SingleRecommended(t *testing.T) {
tmpl := &OrgTemplate{
RecommendedEnv: []EnvRequirement{{Name: "DEBUG"}},
}
req, rec := collectOrgEnv(tmpl)
if len(req) != 0 {
t.Errorf("required: got %d, want 0", len(req))
}
if len(rec) != 1 {
t.Fatalf("got %d recommended, want 1", len(rec))
}
if rec[0].Name != "DEBUG" {
t.Errorf("name: got %q, want DEBUG", rec[0].Name)
}
}
func TestCollectOrgEnv_AnyOfGroup(t *testing.T) {
tmpl := &OrgTemplate{
RequiredEnv: []EnvRequirement{{AnyOf: []string{"AWS_KEY", "GCP_KEY", "AZURE_KEY"}}},
}
req, _ := collectOrgEnv(tmpl)
if len(req) != 1 {
t.Fatalf("got %d, want 1", len(req))
}
if len(req[0].AnyOf) != 3 {
t.Errorf("any_of members: got %v, want [AWS_KEY, GCP_KEY, AZURE_KEY]", req[0].AnyOf)
}
}
func TestCollectOrgEnv_InvalidNamesFiltered(t *testing.T) {
// "lowercase" and "" fail envVarNamePattern → silently dropped
tmpl := &OrgTemplate{
RequiredEnv: []EnvRequirement{{AnyOf: []string{"VALID_KEY", "lowercase", ""}}},
}
req, _ := collectOrgEnv(tmpl)
if len(req) != 1 {
t.Fatalf("invalid names filtered: got %d, want 1", len(req))
}
if len(req[0].AnyOf) != 1 || req[0].AnyOf[0] != "VALID_KEY" {
t.Errorf("valid names kept: got %v", req[0].AnyOf)
}
}
func TestCollectOrgEnv_GroupWithOneInvalid_KeepsRest(t *testing.T) {
// Mixed: one valid + one invalid → valid member is kept, invalid dropped
// regex requires ^[A-Z][A-Z0-9_]* — lowercase names are invalid
tmpl := &OrgTemplate{
RequiredEnv: []EnvRequirement{{AnyOf: []string{"GOOD_KEY", "lowercase_invalid"}}},
}
req, _ := collectOrgEnv(tmpl)
if len(req) != 1 {
t.Fatalf("got %d, want 1", len(req))
}
if len(req[0].AnyOf) != 1 || req[0].AnyOf[0] != "GOOD_KEY" {
t.Errorf("kept valid member: got %v, want [GOOD_KEY]", req[0].AnyOf)
}
}
func TestCollectOrgEnv_AllInvalidGroup_Dropped(t *testing.T) {
tmpl := &OrgTemplate{
RequiredEnv: []EnvRequirement{{AnyOf: []string{"lowercase", ""}}},
}
req, _ := collectOrgEnv(tmpl)
if len(req) != 0 {
t.Errorf("all-invalid group: got %d, want 0", len(req))
}
}
func TestCollectOrgEnv_RequiredSingleDominatesAnyOfGroup(t *testing.T) {
// Required: API_KEY (strict)
// Required: any_of [API_KEY, ALT_KEY]
// → the any_of group is redundant (API_KEY satisfies it already)
// → any_of group should be dropped from required
tmpl := &OrgTemplate{
RequiredEnv: []EnvRequirement{
{Name: "API_KEY"},
{AnyOf: []string{"API_KEY", "ALT_KEY"}},
},
}
req, _ := collectOrgEnv(tmpl)
if len(req) != 1 {
t.Fatalf("strict dominates group: got %d entries, want 1", len(req))
}
if req[0].Name != "API_KEY" {
t.Errorf("strict: got %+v, want name=API_KEY", req[0])
}
}
func TestCollectOrgEnv_RequiredSingleDominatesRecommendedAnyOf(t *testing.T) {
// Required: FOO (strict)
// Recommended: any_of [FOO, BAR]
// → FOO is already required; the recommended any_of is redundant
// → recommended any_of should be dropped
tmpl := &OrgTemplate{
RequiredEnv: []EnvRequirement{{Name: "FOO"}},
RecommendedEnv: []EnvRequirement{{AnyOf: []string{"FOO", "BAR"}}},
}
req, rec := collectOrgEnv(tmpl)
if len(req) != 1 || req[0].Name != "FOO" {
t.Errorf("required: got %+v", req)
}
if len(rec) != 0 {
t.Errorf("recommended any_of dominated by strict: got %d, want 0", len(rec))
}
}
func TestCollectOrgEnv_SameTierStrictDominatesGroup(t *testing.T) {
// Both in required: X (strict), any_of [X, Y] (group)
// Strict X makes the any_of redundant within the same tier
tmpl := &OrgTemplate{
RequiredEnv: []EnvRequirement{
{Name: "X"},
{AnyOf: []string{"X", "Y"}},
},
}
req, _ := collectOrgEnv(tmpl)
if len(req) != 1 {
t.Fatalf("got %d, want 1", len(req))
}
if req[0].Name != "X" {
t.Errorf("strict dominates same-tier group: got %+v", req[0])
}
}
func TestCollectOrgEnv_WorkspaceLevel(t *testing.T) {
// Workspaces can also declare required/recommended env
tmpl := &OrgTemplate{
Workspaces: []OrgWorkspace{
{
Name: "Dev",
RequiredEnv: []EnvRequirement{{Name: "DEV_KEY"}},
RecommendedEnv: []EnvRequirement{{Name: "DEV_TOOL"}},
},
},
}
req, rec := collectOrgEnv(tmpl)
if len(req) != 1 {
t.Fatalf("workspace required: got %d, want 1", len(req))
}
if req[0].Name != "DEV_KEY" {
t.Errorf("workspace required: got %v", req[0])
}
if len(rec) != 1 {
t.Fatalf("workspace recommended: got %d, want 1", len(rec))
}
if rec[0].Name != "DEV_TOOL" {
t.Errorf("workspace recommended: got %v", rec[0])
}
}
func TestCollectOrgEnv_DeepNesting(t *testing.T) {
// Nested children also contribute env requirements
tmpl := &OrgTemplate{
RequiredEnv: []EnvRequirement{{Name: "ORG_LEVEL"}},
Workspaces: []OrgWorkspace{
{
Name: "Root",
RequiredEnv: []EnvRequirement{{Name: "ROOT_LEVEL"}},
Children: []OrgWorkspace{
{
Name: "Child",
RequiredEnv: []EnvRequirement{{Name: "CHILD_LEVEL"}},
Children: []OrgWorkspace{
{Name: "GrandChild", RecommendedEnv: []EnvRequirement{{Name: "GRANDCHILD_TOOL"}}},
},
},
},
},
},
}
req, rec := collectOrgEnv(tmpl)
if len(req) != 3 {
t.Errorf("3 required levels: got %d: %+v", len(req), req)
}
if len(rec) != 1 {
t.Errorf("1 recommended: got %d: %+v", len(rec), rec)
}
}
func TestCollectOrgEnv_DedupAcrossTiers(t *testing.T) {
// Same key declared at org level AND workspace level → deduped to 1
tmpl := &OrgTemplate{
RequiredEnv: []EnvRequirement{{Name: "SHARED"}},
Workspaces: []OrgWorkspace{
{Name: "ws", RequiredEnv: []EnvRequirement{{Name: "SHARED"}}},
},
}
req, _ := collectOrgEnv(tmpl)
if len(req) != 1 {
t.Errorf("dedup across tiers: got %d, want 1", len(req))
}
}
func TestCollectOrgEnv_DedupWithinGroup(t *testing.T) {
// Same key declared multiple times within required → deduped
tmpl := &OrgTemplate{
RequiredEnv: []EnvRequirement{
{Name: "DUPE"},
{Name: "DUPE"},
},
}
req, _ := collectOrgEnv(tmpl)
if len(req) != 1 {
t.Errorf("dedup within tier: got %d, want 1", len(req))
}
}
func TestCollectOrgEnv_MixedCasePreservesSort(t *testing.T) {
// Sort order: singles first (alpha), then groups (by member-key)
tmpl := &OrgTemplate{
RequiredEnv: []EnvRequirement{
{Name: "ZETA"},
{Name: "ALPHA"},
{AnyOf: []string{"B", "A"}}, // key: A\x00B
{AnyOf: []string{"Y", "X"}}, // key: X\x00Y
},
}
req, _ := collectOrgEnv(tmpl)
if len(req) != 4 {
t.Fatalf("got %d, want 4", len(req))
}
// Singles first
if req[0].Name != "ALPHA" {
t.Errorf("single ALPHA first: got %+v", req[0])
}
if req[1].Name != "ZETA" {
t.Errorf("single ZETA second: got %+v", req[1])
}
// Groups after singles; A,B (key A\x00B) < X,Y (key X\x00Y)
if len(req[2].AnyOf) != 2 {
t.Errorf("third should be group: got %+v", req[2])
}
if req[2].AnyOf[0] != "B" { // "B" is first alphabetically in [A,B]
t.Errorf("A,B group should come first: got %+v", req[2])
}
}
@@ -1,242 +0,0 @@
package handlers
import (
"crypto/sha256"
"database/sql"
"net/http"
"net/http/httptest"
"testing"
"github.com/DATA-DOG/go-sqlmock"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/ws"
"github.com/gin-gonic/gin"
)
// newSocketHandlerWithDB creates a SocketHandler with buffered Hub channels.
// The DB is set up via setupTestDB (called before this function in each test).
func newSocketHandlerWithDB(t *testing.T, hub *ws.Hub) *SocketHandler {
t.Helper()
if hub == nil {
hub = &ws.Hub{
Register: make(chan *ws.Client, 1),
Unregister: make(chan *ws.Client, 1),
}
}
return NewSocketHandler(hub)
}
// socketRequest builds a test request for the WebSocket connect endpoint.
func socketRequest(method, path, workspaceID, authHeader string) *http.Request {
req := httptest.NewRequest(method, path, nil)
if workspaceID != "" {
req.Header.Set("X-Workspace-ID", workspaceID)
}
if authHeader != "" {
req.Header.Set("Authorization", authHeader)
}
return req
}
// ─────────────────────────────────────────────────────────────────────────────
// Auth gate: DB error on HasAnyLiveToken → 500
// ─────────────────────────────────────────────────────────────────────────────
func TestSocketHandler_AuthGate_DBError_Returns500(t *testing.T) {
mock := setupTestDB(t)
handler := newSocketHandlerWithDB(t, nil)
// HasAnyLiveToken issues a query; make it return an error.
mock.ExpectQuery("SELECT COUNT").
WithArgs("ws-auth-db-err").
WillReturnError(sql.ErrConnDone)
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Request = socketRequest("GET", "/ws", "ws-auth-db-err", "")
handler.HandleConnect(c)
if w.Code != http.StatusInternalServerError {
t.Errorf("DB error: expected 500, got %d", w.Code)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet mock expectations: %v", err)
}
}
// ─────────────────────────────────────────────────────────────────────────────
// Auth gate: workspace HAS live token, missing Bearer → 401
// ─────────────────────────────────────────────────────────────────────────────
func TestSocketHandler_AuthGate_HasLiveToken_MissingBearer_Returns401(t *testing.T) {
mock := setupTestDB(t)
handler := newSocketHandlerWithDB(t, nil)
// HasAnyLiveToken succeeds → workspace has a live token.
mock.ExpectQuery("SELECT COUNT").
WithArgs("ws-has-token-no-bearer").
WillReturnRows(sqlmock.NewRows([]string{"n"}).AddRow(1))
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Request = socketRequest("GET", "/ws", "ws-has-token-no-bearer", "")
handler.HandleConnect(c)
if w.Code != http.StatusUnauthorized {
t.Errorf("hasLive but no bearer: expected 401, got %d", w.Code)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet mock expectations: %v", err)
}
}
// ─────────────────────────────────────────────────────────────────────────────
// Auth gate: workspace HAS live token, invalid Bearer → 401
// ─────────────────────────────────────────────────────────────────────────────
func TestSocketHandler_AuthGate_HasLiveToken_InvalidBearer_Returns401(t *testing.T) {
mock := setupTestDB(t)
handler := newSocketHandlerWithDB(t, nil)
wsID := "ws-invalid-token"
badToken := "not-a-valid-token"
// HasAnyLiveToken: workspace has a live token.
mock.ExpectQuery("SELECT COUNT").
WithArgs(wsID).
WillReturnRows(sqlmock.NewRows([]string{"n"}).AddRow(1))
// ValidateToken: lookupTokenByHash returns ErrNoRows for an unknown token.
// Any token hash is fine since the token doesn't exist — use AnyArg.
mock.ExpectQuery(`SELECT t\.id, t\.workspace_id.*FROM workspace_auth_tokens t.*JOIN`).
WithArgs(sqlmock.AnyArg()).
WillReturnError(sql.ErrNoRows)
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Request = socketRequest("GET", "/ws", wsID, "Bearer "+badToken)
handler.HandleConnect(c)
if w.Code != http.StatusUnauthorized {
t.Errorf("hasLive but invalid bearer: expected 401, got %d", w.Code)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet mock expectations: %v", err)
}
}
// ─────────────────────────────────────────────────────────────────────────────
// Auth gate: workspace HAS live token, VALID Bearer → upgrade attempted.
// The WebSocket upgrade itself will fail in httptest (gorilla/websocket
// cannot write a real HTTP/1.1 handshake to httptest.ResponseRecorder), but
// the auth gate is passed so we verify no 401/500 was returned before the
// upgrade failure. This is the canvas-client success path.
// ─────────────────────────────────────────────────────────────────────────────
func TestSocketHandler_AuthGate_HasLiveToken_ValidBearer_AuthPassed(t *testing.T) {
mock := setupTestDB(t)
handler := newSocketHandlerWithDB(t, nil)
wsID := "ws-valid-token"
goodToken := "valid-ws-token-123"
// HasAnyLiveToken: workspace has a live token.
mock.ExpectQuery("SELECT COUNT").
WithArgs(wsID).
WillReturnRows(sqlmock.NewRows([]string{"n"}).AddRow(1))
// ValidateToken: token found and workspace is not removed.
// sha256TokenHash returns []byte; rational matcher compares as string.
mock.ExpectQuery(`SELECT t\.id, t\.workspace_id.*FROM workspace_auth_tokens t.*JOIN`).
WithArgs(sha256TokenHash(goodToken)).
WillReturnRows(sqlmock.NewRows([]string{"token_id", "workspace_id"}).
AddRow("tok-abc", wsID))
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Request = socketRequest("GET", "/ws", wsID, "Bearer "+goodToken)
handler.HandleConnect(c)
// The WebSocket upgrade fails in httptest (httptest.ResponseRecorder is not
// a real TCP connection), but the auth gate itself succeeded — we should
// NOT see a 401 or 500 response code. The actual code depends on the
// upgrade error handling; the critical assertion is that auth passed.
if w.Code == http.StatusUnauthorized || w.Code == http.StatusInternalServerError {
t.Errorf("valid token: auth should have passed; got %d", w.Code)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet mock expectations: %v", err)
}
}
// ─────────────────────────────────────────────────────────────────────────────
// Canvas client (no X-Workspace-ID): auth gate bypassed, upgrade attempted.
// Same httptest limitation as above — we verify no 401/500 before the upgrade.
// ─────────────────────────────────────────────────────────────────────────────
func TestSocketHandler_CanvasClient_NoAuthGate(t *testing.T) {
mock := setupTestDB(t)
handler := newSocketHandlerWithDB(t, nil)
// No X-Workspace-ID header → no auth check → no DB queries expected.
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Request = socketRequest("GET", "/ws", "", "") // no workspace ID
handler.HandleConnect(c)
// No auth gate hit → no 401/500. The WebSocket upgrade itself will fail
// in httptest, but that's expected (see TestSocketHandler_AuthGate_HasLiveToken_ValidBearer_AuthPassed).
if w.Code == http.StatusUnauthorized || w.Code == http.StatusInternalServerError {
t.Errorf("canvas client: expected no auth error; got %d", w.Code)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet mock expectations: %v", err)
}
}
// ─────────────────────────────────────────────────────────────────────────────
// Legacy workspace: HAS live token flag but workspace exists AND ValidateToken
// is called. Since the workspace has a live token, the handler MUST validate
// the presented token (not grandfather through). This is the Phase 30.1/30.2
// contract — a workspace with tokens on file is NOT grandfathered.
// ─────────────────────────────────────────────────────────────────────────────
func TestSocketHandler_AuthGate_HasLiveToken_EmptyBearer_Returns401(t *testing.T) {
mock := setupTestDB(t)
handler := newSocketHandlerWithDB(t, nil)
wsID := "ws-has-live-token-empty-bearer"
// HasAnyLiveToken: workspace has a live token.
mock.ExpectQuery("SELECT COUNT").
WithArgs(wsID).
WillReturnRows(sqlmock.NewRows([]string{"n"}).AddRow(1))
// Authorization header is "Bearer " (empty token after "Bearer ").
// wsauth.BearerTokenFromHeader strips "Bearer " and gets "".
// ValidateToken is called with "" → returns ErrInvalidToken before DB hit.
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Request = socketRequest("GET", "/ws", wsID, "Bearer ")
handler.HandleConnect(c)
if w.Code != http.StatusUnauthorized {
t.Errorf("empty bearer after Bearer prefix: expected 401, got %d", w.Code)
}
}
// ─────────────────────────────────────────────────────────────────────────────
// helpers
// ─────────────────────────────────────────────────────────────────────────────
// sha256TokenHash returns the SHA256 hash of a plaintext token, matching what
// wsauth.ValidateToken does internally before querying the DB.
func sha256TokenHash(plaintext string) []byte {
h := sha256.Sum256([]byte(plaintext))
return h[:]
}
@@ -43,27 +43,6 @@ func (s *syncBuf) String() string {
return s.b.String()
}
// unwrapGoError extracts subprocess stderr from a Go-wrapped error that
// includes combined output. e.g. from sendSSHPublicKey's
// fmt.Errorf("send-ssh-public-key: %w (%s)", err, combinedOut), this
// returns the " (%s)" portion — the actionable subprocess signal like
// "AccessDeniedException: ... is not authorized to perform:
// ec2-instance-connect:OpenTunnel". Returns "" when the output is
// identical to the error string (no stderr captured).
func unwrapGoError(errMsg string) string {
// Extract content between the last '(' and trailing ')'. The
// sendSSHPublicKey wrapper uses fmt.Errorf("...: %w (%s)", err, combinedOut)
// so the subprocess stderr is always the last parenthesised segment,
// e.g. "send-ssh-public-key: exit status 1 (AccessDeniedException: ...)"
// — note the closing ')' is at the very end with no trailing space.
open := strings.LastIndex(errMsg, "(")
if open < 0 {
return ""
}
inner := errMsg[open+1:]
return strings.TrimSuffix(inner, ")")
}
// HandleDiagnose handles GET /workspaces/:id/terminal/diagnose. It runs the
// same per-step pipeline as HandleConnect (ssh-keygen → EIC send-key → tunnel
// → ssh) but non-interactively, captures the first failing step and its
@@ -235,18 +214,12 @@ func (h *TerminalHandler) diagnoseRemote(ctx context.Context, workspaceID, insta
}
// Step 2: send-ssh-public-key (AWS Instance Connect)
// mc#687: populate Detail so the E2E smoke sees the AWS permission error
// verbatim. The subprocess stderr (e.g. "AccessDeniedException: ... is not
// authorized to perform: ec2-instance-connect:OpenTunnel") is captured by
// sendSSHPublicKey's CombinedOutput() and embedded in the Go error string.
t0 = time.Now()
if err := sendSSHPublicKey(ctx, region, instanceID, osUser, strings.TrimSpace(string(pubKey))); err != nil {
errMsg := err.Error()
return stop("send-ssh-public-key", diagnoseStep{
Name: "send-ssh-public-key",
DurationMs: time.Since(t0).Milliseconds(),
Error: errMsg,
Detail: unwrapGoError(errMsg),
Error: err.Error(),
})
}
res.Steps = append(res.Steps, diagnoseStep{Name: "send-ssh-public-key", OK: true, DurationMs: time.Since(t0).Milliseconds()})
@@ -245,50 +245,3 @@ func TestDiagnoseRemote_StopsAtSSHProbe(t *testing.T) {
}
}
// TestUnwrapGoError pins the unwrapGoError helper that extracts subprocess
// stderr from the Go-wrapped error string produced by sendSSHPublicKey.
// Regression gate for mc#687: the E2E smoke now reads detail (not error),
// so detail MUST contain the actionable AWS permission signal.
func TestUnwrapGoError(t *testing.T) {
cases := []struct {
name string
input string
want string
}{
{
name: "AWS permission denied",
input: "send-ssh-public-key: exec: exit status 1 (AccessDeniedException: User: arn:aws:iam::123456789012:role/TestRole is not authorized to perform: ec2-instance-connect:OpenTunnel)",
want: "AccessDeniedException: User: arn:aws:iam::123456789012:role/TestRole is not authorized to perform: ec2-instance-connect:OpenTunnel",
},
{
name: "generic exec error no output",
input: "send-ssh-public-key: exec: exit status 1",
want: "",
},
{
name: "empty string",
input: "",
want: "",
},
{
name: "short string below threshold",
input: "err",
want: "",
},
{
name: "no parentheses",
input: "send-ssh-public-key: something went wrong",
want: "",
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
got := unwrapGoError(tc.input)
if got != tc.want {
t.Errorf("unwrapGoError(%q): got %q, want %q", tc.input, got, tc.want)
}
})
}
}