Compare commits

..

2 Commits

Author SHA1 Message Date
core-devops 862a90a316 fix(runtime): complete crewai/deepagents/gemini-cli removal + fail-closed on unresolvable runtime
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 2s
cascade-list-drift-gate / check (pull_request) Failing after 2s
CI / Detect changes (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
CI / Platform (Go) (pull_request) Successful in 4m18s
E2E API Smoke Test / detect-changes (pull_request) Successful in 4s
E2E Chat / detect-changes (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 4s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Has been skipped
Handlers Postgres Integration / detect-changes (pull_request) Successful in 2s
Harness Replays / detect-changes (pull_request) Successful in 3s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 3s
gate-check-v3 / gate-check (pull_request) Successful in 3s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 22s
qa-review / approved (pull_request) Failing after 2s
security-review / approved (pull_request) Successful in 3s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 3s
sop-tier-check / tier-check (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 50s
CI / Canvas (Next.js) (pull_request) Successful in 5m38s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 37s
Harness Replays / Harness Replays (pull_request) Successful in 1s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1m11s
CI / Python Lint & Test (pull_request) Successful in 6m20s
CI / all-required (pull_request) Successful in 6m21s
E2E Chat / E2E Chat (pull_request) Failing after 4m13s
audit-force-merge / audit (pull_request) Successful in 3s
Addresses core-security REQUEST-CHANGES (review 4269) on PR #1385
(RFC internal#483). The original removal was incomplete and silently
changed behavior:

1. manifest.json workspace_templates still listed crewai/deepagents/
   gemini-cli. handlers.knownRuntimes (the workspace-create admission
   allowlist) is built at boot from manifest.json via runtime_registry.go
   — NOT from provisioner.knownRuntimes — so a create request for a
   "removed" runtime was still admitted. Removed the 3 entries; the
   manifest now resolves to the 6 kept template-backed runtimes
   (claude-code, hermes, openclaw, codex, langgraph, autogen) + injected
   meta-runtimes. langgraph/autogen retained intentionally — not in the
   internal#483 removal set and still in provisioner.knownRuntimes.

2. selectImage silently fell back to DefaultImage (langgraph) for any
   named-but-unresolvable runtime: a user asking for crewai got a
   langgraph container with no signal. selectImage now returns
   (string, error) and rejects a NAMED-but-unresolvable runtime with
   ErrUnresolvableRuntime (naming the offending runtime). Start
   propagates it; the existing markProvisionFailed path already
   broadcasts WorkspaceProvisionFailed + records the message. Implements
   the CTO hardgate directive (feedback_platform_must_hardgate_base_contract
   — fail-closed, no silent degrade). The genuinely-unspecified (empty)
   runtime remains a distinct legitimate path → DefaultImage.

go build ./... and go test ./internal/provisioner/ ./internal/handlers/
./internal/imagewatch/ ./internal/registry/ ./internal/models/ all pass.
selectImage tests updated to assert the new fail-closed contract (new
expected values verified against the security finding, not made-green).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 15:36:16 -07:00
core-devops e809a2e319 chore(runtime): remove crewai/deepagents/gemini-cli from the runtime catalog
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
CI / Detect changes (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 15s
E2E API Smoke Test / detect-changes (pull_request) Successful in 5s
E2E Chat / detect-changes (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 4s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 28s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Has been skipped
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 5s
CI / Platform (Go) (pull_request) Successful in 4m59s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 49s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 3s
gate-check-v3 / gate-check (pull_request) Successful in 4s
qa-review / approved (pull_request) Failing after 2s
security-review / approved (pull_request) Failing after 3s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 2s
sop-tier-check / tier-check (pull_request) Successful in 4s
CI / Canvas (Next.js) (pull_request) Successful in 6m34s
CI / Python Lint & Test (pull_request) Successful in 6m28s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 36s
Harness Replays / Harness Replays (pull_request) Successful in 2s
CI / all-required (pull_request) Successful in 6m23s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m3s
E2E Chat / E2E Chat (pull_request) Failing after 4m58s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CTO-approved scope-narrowing to the four maintained runtimes
(claude-code, openclaw, hermes, codex) plus autogen/langgraph legacy.
RFC molecule-ai/internal#483.

- provisioner/registry.go: drop the 3 from knownRuntimes (kept
  alphabetical ordering; count pin updated 9 -> 6 in registry_test.go).
- handlers/admin_workspace_images.go: drop the 3 from AllRuntimes so
  imagewatch stops pulling now-removed template images.
- models/runtime_defaults.go: doc-comment list updated; drop the 3
  explicit drift-detection rows from runtime_defaults_test.go.
- localbuild_test.go RepoNotFound fixture switched crewai -> hermes
  (post-removal crewai is no longer a known runtime, so the test would
  hit the "unknown runtime" branch instead of the intended 404 path).
- provisioner_test.go error-classification fixture string crewai ->
  hermes (no removed-runtime names in fixtures).

Live-workspace safety gate cleared first: zero workspaces on the 3
runtimes across the production fleet (per-tenant DB query on
hongming/go-to-market/chloe-dong + CP tenant_resources provisioning
ledger covering reno-stars). Registry/pin row removal handled by a
separate controlplane migration PR, landed after this merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 15:17:26 -07:00
5 changed files with 3 additions and 129 deletions
+1 -17
View File
@@ -407,23 +407,7 @@ def process_once(*, dry_run: bool = False) -> int:
"deferring to next tick"
)
return 0
try:
merge_pull(pr_number, dry_run=dry_run)
except ApiError as exc:
# Merge API errors (405 permission denied, 422 hook block, etc.)
# are NOT transient — retrying will not help. Surface the error
# on the PR immediately so it is visible without digging into
# workflow logs, and fail the workflow so it is distinguishable
# from a successful-no-op tick.
post_comment(
pr_number,
f"merge-queue: MERGE FAILED — {exc}. "
"This is a non-transient error (permission or hook issue). "
"See SEV-1 internal#487.",
dry_run=dry_run,
)
sys.stderr.write(f"::error::PR #{pr_number} merge failed: {exc}\n")
return 2 # distinct exit code so workflow run shows failure
merge_pull(pr_number, dry_run=dry_run)
return 0
return 0
+2 -11
View File
@@ -830,18 +830,9 @@ def main(argv: list[str] | None = None) -> int:
# one membership lookup per team.
team_member_cache: dict[tuple[str, int], bool | None] = {}
def _required_teams_for(slug: str) -> list[str] | None:
"""Look up required_teams for a slug from checklist items OR N/A gates."""
if slug in items_by_slug:
return items_by_slug[slug]["required_teams"]
if slug in na_gates:
return na_gates[slug].get("required_teams", [])
return None
def probe(slug: str, users: list[str]) -> list[str]:
team_names = _required_teams_for(slug)
if team_names is None:
raise KeyError(f"slug '{slug}' not found in items or N/A gates")
item = items_by_slug[slug]
team_names: list[str] = item["required_teams"]
# Resolve names → ids. NOTE: orgs/{org}/teams/search may not be
# available — fall back to the list endpoint.
team_ids: list[int] = []
@@ -1,7 +1,6 @@
import importlib.util
import sys
from pathlib import Path
from unittest.mock import patch
SCRIPT = Path(__file__).resolve().parents[1] / "gitea-merge-queue.py"
@@ -119,54 +118,3 @@ def test_merge_decision_updates_stale_pr_before_merge():
assert decision.ready is False
assert decision.action == "update"
def test_merge_failure_returns_nonzero_and_posts_comment(monkeypatch):
"""When merge_pull raises ApiError (e.g. HTTP 405 permission denied),
process_once returns exit code 2 (non-zero) and posts a comment on the PR.
This distinguishes merge-permission errors from successful-no-op ticks."""
captured_comment = {}
def fake_post_comment(pr_number, body, *, dry_run):
captured_comment["pr_number"] = pr_number
captured_comment["body"] = body
# Replace functions directly on the module object so process_once()
# (which looks them up by name at call time) picks up the fakes.
mq.list_queued_issues = lambda: [{
"number": 42,
"created_at": "2026-05-17T00:00:00Z",
"labels": [{"name": "merge-queue"}],
"pull_request": {},
}]
mq.get_pull = lambda n: {
"state": "open",
"base": {"ref": "main", "repo_id": 1},
"head": {"sha": "headsha", "repo_id": 1},
"merge_base": "abc123def",
}
mq.get_pull_commits = lambda n: [{"sha": "headsha"}]
mq.get_branch_head = lambda branch: "abc123def"
mq.get_combined_status = lambda sha: {
"state": "success",
"statuses": [{"context": "CI / all-required (push)", "status": "success"}],
}
mq.latest_statuses_by_context = lambda s: {
"CI / all-required (pull_request)": {"status": "success"},
"sop-checklist / all-items-acked (pull_request)": {"status": "success"},
}
mq.required_contexts_green = lambda statuses, contexts: (True, [])
mq.post_comment = fake_post_comment
# Simulate merge failing with HTTP 405 (permission denied).
# The ApiError raised by api() is caught inside process_once().
merge_error = mq.ApiError(
"POST /repos/x/y/pulls/42/merge -> HTTP 405: User not allowed to merge PR"
)
with patch.object(mq, "merge_pull", side_effect=merge_error):
exit_code = mq.process_once(dry_run=False)
assert exit_code == 2, f"Expected exit code 2, got {exit_code}"
assert captured_comment["pr_number"] == 42
assert "MERGE FAILED" in captured_comment["body"]
assert "405" in captured_comment["body"]
@@ -603,51 +603,3 @@ class TestComputeNaState(unittest.TestCase):
self.assertEqual(na_directives[0][0], "sop-n/a")
self.assertEqual(na_directives[0][1], "qa-review")
self.assertIn("no surface", na_directives[0][2])
class TestProbeNaGateFallback(unittest.TestCase):
"""Regression test: probe() must handle gate names (qa-review, security-review)
from N/A gates without raising KeyError.
mc#1389: compute_na_state calls probe(gate_name, [user]) where gate_name is
a gate name like 'qa-review' — NOT a checklist item slug. The probe must
resolve the gate's required_teams from na_gates, not raise KeyError from
items_by_slug lookup.
"""
def test_probe_resolves_gate_name_from_na_gates(self):
cfg = sop.load_config(CONFIG_PATH)
items = cfg["items"]
items_by_slug = {it["slug"]: it for it in items}
na_gates = cfg.get("n/a_gates", {})
# Reconstruct the _required_teams_for helper from sop-checklist.py
def _required_teams_for(slug):
if slug in items_by_slug:
return items_by_slug[slug]["required_teams"]
if slug in na_gates:
return na_gates[slug].get("required_teams", [])
return None
# Gate names should resolve from na_gates
self.assertEqual(
_required_teams_for("qa-review"),
["qa", "security", "engineers"],
)
self.assertEqual(
_required_teams_for("security-review"),
["security", "managers", "ceo"],
)
# Checklist item slugs should still resolve from items_by_slug
self.assertEqual(
_required_teams_for("comprehensive-testing"),
["qa", "engineers"],
)
self.assertEqual(
_required_teams_for("root-cause"),
["managers", "ceo"],
)
# Unknown slug should return None (not raise KeyError)
self.assertIsNone(_required_teams_for("nonexistent-slug"))
-1
View File
@@ -162,7 +162,6 @@ jobs:
exit 1
fi
python -m twine upload \
--verbose \
--repository pypi \
--username __token__ \
--password "$PYPI_TOKEN" \