Compare commits

..

2 Commits

Author SHA1 Message Date
core-devops 862a90a316 fix(runtime): complete crewai/deepagents/gemini-cli removal + fail-closed on unresolvable runtime
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 2s
cascade-list-drift-gate / check (pull_request) Failing after 2s
CI / Detect changes (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
CI / Platform (Go) (pull_request) Successful in 4m18s
E2E API Smoke Test / detect-changes (pull_request) Successful in 4s
E2E Chat / detect-changes (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 4s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Has been skipped
Handlers Postgres Integration / detect-changes (pull_request) Successful in 2s
Harness Replays / detect-changes (pull_request) Successful in 3s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 3s
gate-check-v3 / gate-check (pull_request) Successful in 3s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 22s
qa-review / approved (pull_request) Failing after 2s
security-review / approved (pull_request) Successful in 3s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 3s
sop-tier-check / tier-check (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 50s
CI / Canvas (Next.js) (pull_request) Successful in 5m38s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 37s
Harness Replays / Harness Replays (pull_request) Successful in 1s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1m11s
CI / Python Lint & Test (pull_request) Successful in 6m20s
CI / all-required (pull_request) Successful in 6m21s
E2E Chat / E2E Chat (pull_request) Failing after 4m13s
audit-force-merge / audit (pull_request) Successful in 3s
Addresses core-security REQUEST-CHANGES (review 4269) on PR #1385
(RFC internal#483). The original removal was incomplete and silently
changed behavior:

1. manifest.json workspace_templates still listed crewai/deepagents/
   gemini-cli. handlers.knownRuntimes (the workspace-create admission
   allowlist) is built at boot from manifest.json via runtime_registry.go
   — NOT from provisioner.knownRuntimes — so a create request for a
   "removed" runtime was still admitted. Removed the 3 entries; the
   manifest now resolves to the 6 kept template-backed runtimes
   (claude-code, hermes, openclaw, codex, langgraph, autogen) + injected
   meta-runtimes. langgraph/autogen retained intentionally — not in the
   internal#483 removal set and still in provisioner.knownRuntimes.

2. selectImage silently fell back to DefaultImage (langgraph) for any
   named-but-unresolvable runtime: a user asking for crewai got a
   langgraph container with no signal. selectImage now returns
   (string, error) and rejects a NAMED-but-unresolvable runtime with
   ErrUnresolvableRuntime (naming the offending runtime). Start
   propagates it; the existing markProvisionFailed path already
   broadcasts WorkspaceProvisionFailed + records the message. Implements
   the CTO hardgate directive (feedback_platform_must_hardgate_base_contract
   — fail-closed, no silent degrade). The genuinely-unspecified (empty)
   runtime remains a distinct legitimate path → DefaultImage.

go build ./... and go test ./internal/provisioner/ ./internal/handlers/
./internal/imagewatch/ ./internal/registry/ ./internal/models/ all pass.
selectImage tests updated to assert the new fail-closed contract (new
expected values verified against the security finding, not made-green).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 15:36:16 -07:00
core-devops e809a2e319 chore(runtime): remove crewai/deepagents/gemini-cli from the runtime catalog
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
CI / Detect changes (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 15s
E2E API Smoke Test / detect-changes (pull_request) Successful in 5s
E2E Chat / detect-changes (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 4s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 28s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Has been skipped
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 5s
CI / Platform (Go) (pull_request) Successful in 4m59s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 49s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 3s
gate-check-v3 / gate-check (pull_request) Successful in 4s
qa-review / approved (pull_request) Failing after 2s
security-review / approved (pull_request) Failing after 3s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 2s
sop-tier-check / tier-check (pull_request) Successful in 4s
CI / Canvas (Next.js) (pull_request) Successful in 6m34s
CI / Python Lint & Test (pull_request) Successful in 6m28s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 36s
Harness Replays / Harness Replays (pull_request) Successful in 2s
CI / all-required (pull_request) Successful in 6m23s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m3s
E2E Chat / E2E Chat (pull_request) Failing after 4m58s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CTO-approved scope-narrowing to the four maintained runtimes
(claude-code, openclaw, hermes, codex) plus autogen/langgraph legacy.
RFC molecule-ai/internal#483.

- provisioner/registry.go: drop the 3 from knownRuntimes (kept
  alphabetical ordering; count pin updated 9 -> 6 in registry_test.go).
- handlers/admin_workspace_images.go: drop the 3 from AllRuntimes so
  imagewatch stops pulling now-removed template images.
- models/runtime_defaults.go: doc-comment list updated; drop the 3
  explicit drift-detection rows from runtime_defaults_test.go.
- localbuild_test.go RepoNotFound fixture switched crewai -> hermes
  (post-removal crewai is no longer a known runtime, so the test would
  hit the "unknown runtime" branch instead of the intended 404 path).
- provisioner_test.go error-classification fixture string crewai ->
  hermes (no removed-runtime names in fixtures).

Live-workspace safety gate cleared first: zero workspaces on the 3
runtimes across the production fleet (per-tenant DB query on
hongming/go-to-market/chloe-dong + CP tenant_resources provisioning
ledger covering reno-stars). Registry/pin row removal handled by a
separate controlplane migration PR, landed after this merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 15:17:26 -07:00
2 changed files with 15 additions and 47 deletions
+15 -46
View File
@@ -148,38 +148,15 @@ def latest_statuses_by_context(statuses: list[dict]) -> dict[str, dict]:
return latest
def _is_tier_low_pending_ok(
latest_statuses: dict[str, dict],
context: str,
pr_labels: set[str],
) -> bool:
"""Return True if tier:low PR can tolerate sop-checklist pending state.
Per sop-checklist-config.yaml tier_failure_mode, tier:low uses soft-fail:
sop-checklist posts state=pending when acks are satisfied (missing
manager/ceo acks are informational only). The queue should accept
pending instead of waiting for success.
"""
if "tier:low" not in pr_labels:
return False
if "sop-checklist" not in context:
return False
status = latest_statuses.get(context) or {}
return status_state(status) == "pending"
def required_contexts_green(
latest_statuses: dict[str, dict],
contexts: list[str],
pr_labels: set[str] | None = None,
) -> tuple[bool, list[str]]:
missing_or_bad: list[str] = []
for context in contexts:
status = latest_statuses.get(context)
state = status_state(status or {})
if state != "success":
if pr_labels and _is_tier_low_pending_ok(latest_statuses, context, pr_labels):
continue # tier:low soft-fail: accept pending sop-checklist
missing_or_bad.append(f"{context}={state or 'missing'}")
return not missing_or_bad, missing_or_bad
@@ -232,7 +209,6 @@ def evaluate_merge_readiness(
pr_status: dict,
required_contexts: list[str],
pr_has_current_base: bool,
pr_labels: set[str] | None = None,
) -> MergeDecision:
# Check push-required contexts explicitly instead of combined state.
# Combined state can be "failure" due to non-blocking jobs
@@ -252,7 +228,7 @@ def evaluate_merge_readiness(
# The required_contexts list is the authoritative gate — it includes only
# the checks that actually block merges.
latest = latest_statuses_by_context(pr_status.get("statuses") or [])
ok, missing_or_bad = required_contexts_green(latest, required_contexts, pr_labels)
ok, missing_or_bad = required_contexts_green(latest, required_contexts)
if not ok:
return MergeDecision(False, "wait", "required contexts not green: " + ", ".join(missing_or_bad))
return MergeDecision(True, "merge", "ready")
@@ -277,32 +253,27 @@ def get_combined_status(sha: str) -> dict:
_, combined = api("GET", f"/repos/{OWNER}/{NAME}/commits/{sha}/status")
if not isinstance(combined, dict):
raise ApiError(f"status for {sha} response not object")
combined_statuses: list[dict] = combined.get("statuses") or []
# Fetch full statuses list; 200 covers >99% of real-world runs.
# The list is ordered ascending by id (oldest first) — callers must
# iterate in reverse to get the newest entry per context.
# Best-effort: large repos (main with 550+ statuses) may time out.
# On timeout, fall back to the statuses[] already in the combined
# response (usually 30 entries — enough for most PRs, enough for
# main's early push-required contexts).
try:
_, all_statuses_raw = api(
_, all_statuses = api(
"GET",
f"/repos/{OWNER}/{NAME}/commits/{sha}/statuses",
query={"limit": "50"},
)
if isinstance(all_statuses_raw, list):
all_statuses: list[dict] = list(all_statuses_raw)
else:
all_statuses = []
if isinstance(all_statuses, list):
combined["statuses"] = all_statuses
except (ApiError, urllib.error.URLError, TimeoutError, OSError) as exc:
# URLError covers network-level failures (DNS, refused, timeout).
# TimeoutError and OSError cover socket-level timeouts.
sys.stderr.write(f"::warning::could not fetch full statuses list for {sha[:8]}: {exc}\n")
all_statuses = []
# Build latest per context: process combined (ascending→reverse=newest
# first), then fill gaps from all_statuses (already newest-first).
latest: dict[str, dict] = {}
for status in reversed(sorted(combined_statuses, key=lambda s: s.get("id") or 0)):
ctx = status.get("context")
if isinstance(ctx, str) and ctx not in latest:
latest[ctx] = status
for status in all_statuses:
ctx = status.get("context")
if isinstance(ctx, str) and ctx not in latest:
latest[ctx] = status
combined["statuses"] = list(latest.values())
# Fall back to the statuses[] already in the combined response.
pass
return combined
@@ -409,13 +380,11 @@ def process_once(*, dry_run: bool = False) -> int:
commits = get_pull_commits(pr_number)
current_base = pr_has_current_base(pr, commits, main_sha)
pr_status = get_combined_status(head_sha)
pr_labels = label_names(pr)
decision = evaluate_merge_readiness(
main_status=main_status,
pr_status=pr_status,
required_contexts=contexts,
pr_has_current_base=current_base,
pr_labels=pr_labels,
)
print(f"::notice::PR #{pr_number} decision={decision.action}: {decision.reason}")
-1
View File
@@ -162,7 +162,6 @@ jobs:
exit 1
fi
python -m twine upload \
--verbose \
--repository pypi \
--username __token__ \
--password "$PYPI_TOKEN" \