Compare commits

..

3 Commits

Author SHA1 Message Date
core-devops 6ca390b504 fix(ci): replace polling all-required sentinel with needs-based aggregation
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 11s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 26s
CI / Detect changes (pull_request) Successful in 30s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 13s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 34s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 33s
E2E API Smoke Test / detect-changes (pull_request) Successful in 36s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 16s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 39s
qa-review / approved (pull_request) Failing after 23s
security-review / approved (pull_request) Failing after 19s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 6s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 7s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 7s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m20s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m54s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m40s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m39s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 2m0s
CI / Python Lint & Test (pull_request) Successful in 7m35s
CI / Platform (Go) (pull_request) Failing after 8m21s
CI / Canvas (Next.js) (pull_request) Successful in 10m22s
CI / Canvas Deploy Reminder (pull_request) Successful in 3s
CI / all-required (pull_request) Failing after 2s
audit-force-merge / audit (pull_request) Has been skipped
sop-checklist / all-items-acked (pull_request) Successful in 35s
gate-check-v3 / gate-check (pull_request) Successful in 1m11s
sop-tier-check / tier-check (pull_request) Successful in 50s
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Successful in 3m49s
all-required used a 45-minute Python polling loop against commit statuses.
This times out on PRs because it waits for "CI / Canvas Deploy Reminder
(pull_request)" which exits 0 without emitting a commit status — leaving
the polling sentinel permanently pending and blocking branch protection.

Fix: add `needs:` for all required jobs + `if: always()` so the sentinel
runs (and emits pass/fail) even when upstream jobs fail or skip.  Reduces
timeout from 45m to 1m.  canvas-deploy-reminder is included in needs —
its body is already a no-op for non-main-push events, so including it
does not block PRs while ensuring the sentinel has a concrete result
to wait on for main pushes.

Fixes: molecule-core#1083

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 22:49:03 +00:00
devops-engineer 8fced20267 fix: limit CP template config transport
Block internal-flavored paths / Block forbidden paths (push) Successful in 31s
CI / Detect changes (push) Successful in 58s
CI / Shellcheck (E2E scripts) (push) Successful in 42s
E2E API Smoke Test / detect-changes (push) Successful in 36s
Harness Replays / detect-changes (push) Successful in 19s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 45s
Handlers Postgres Integration / detect-changes (push) Successful in 49s
E2E Staging SaaS (full lifecycle) / pr-validate (push) Successful in 1m4s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 27s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 57s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (push) Successful in 5m22s
Harness Replays / Harness Replays (push) Successful in 29s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 36s
CI / Python Lint & Test (push) Successful in 7m45s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 3m39s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 3m40s
Sweep stale Cloudflare Tunnels / Sweep CF tunnels (push) Successful in 38s
publish-workspace-server-image / build-and-push (push) Successful in 11m58s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 6m57s
CI / Canvas (Next.js) (push) Successful in 17m14s
CI / Canvas Deploy Reminder (push) Successful in 7s
CI / Platform (Go) (push) Failing after 17m57s
publish-workspace-server-image / Production auto-deploy (push) Failing after 2m37s
CI / all-required (push) Failing after 17m57s
Staging SaaS smoke (every 30 min) / Staging SaaS smoke (push) Successful in 4m57s
main-red-watchdog / watchdog (push) Successful in 57s
gate-check-v3 / gate-check (push) Successful in 24s
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 2s
Sweep stale Cloudflare DNS records / Sweep CF orphans (push) Successful in 11s
ci-required-drift / drift (push) Successful in 1m0s
status-reaper / reap (push) Has started running
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Successful in 5m4s
gitea-merge-queue / queue (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
2026-05-14 15:37:44 -07:00
devops-engineer 7b3e3fc189 ci: fix handlers instruction test compile
publish-workspace-server-image / Production auto-deploy (push) Blocked by required conditions
Block internal-flavored paths / Block forbidden paths (push) Successful in 10s
CI / Detect changes (push) Successful in 18s
CI / Shellcheck (E2E scripts) (push) Successful in 18s
Harness Replays / detect-changes (push) Successful in 10s
E2E API Smoke Test / detect-changes (push) Successful in 19s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 21s
Handlers Postgres Integration / detect-changes (push) Successful in 20s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 10s
gitea-merge-queue / queue (push) Successful in 13s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 19s
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 31s
Harness Replays / Harness Replays (push) Successful in 9s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 12s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 2m48s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 3m40s
Staging SaaS smoke (every 30 min) / Staging SaaS smoke (push) Successful in 5m19s
CI / Python Lint & Test (push) Successful in 7m34s
CI / Platform (Go) (push) Has been cancelled
CI / all-required (push) Has been cancelled
CI / Canvas Deploy Reminder (push) Has been cancelled
status-reaper / reap (push) Successful in 3m34s
Handlers Postgres Integration / Handlers Postgres Integration (push) Has been cancelled
publish-workspace-server-image / build-and-push (push) Has been cancelled
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Successful in 7m34s
CI / Canvas (Next.js) (push) Failing after 9m4s
2026-05-14 15:25:09 -07:00
3 changed files with 60 additions and 98 deletions
+45 -98
View File
@@ -400,9 +400,9 @@ jobs:
canvas-deploy-reminder:
name: Canvas Deploy Reminder
runs-on: ubuntu-latest
# This job must run on PRs because all-required needs it. The step exits
# 0 when it is not a main push, giving branch protection a green no-op
# instead of a skipped/missing required dependency.
# This job must run on every CI trigger (including PRs) because all-required
# needs it as a dependency. The step body exits 0 when it is not a main-push,
# giving the aggregator a concrete success instead of a skipped/missing result.
needs: canvas-build
steps:
- name: Write deploy reminder to step summary
@@ -545,104 +545,51 @@ jobs:
# red silently merged through. See internal#286 for the three concrete
# tonight-of-2026-05-11 incidents that prompted the emergency bump.
#
# This job deliberately has no `needs:`. Gitea 1.22/act_runner can mark a
# job-level `if: always()` + `needs:` sentinel as skipped before upstream
# jobs settle, leaving branch protection with a permanent pending
# `CI / all-required` context. Instead, this independent sentinel polls the
# required commit-status contexts for this SHA and fails if any fail, skip,
# or never emit.
#
# canvas-deploy-reminder is intentionally NOT included in all-required.needs.
# It is an informational main-push reminder, not a PR quality gate. Keeping
# it in this dependency list lets a skipped reminder skip the required
# sentinel before the `always()` guard can emit a branch-protection status.
# Uses `needs:` so Gitea waits for all upstream jobs before this sentinel
# emits. `if: always()` ensures the sentinel runs (and reports pass/fail)
# even when an upstream job failed or was skipped. canvas-deploy-reminder
# is intentionally included — it exits 0 on non-main-push events so it
# never blocks PRs, and excluding it would leave the sentinel permanently
# pending on main pushes where reminder is a no-op.
#
needs:
- changes
- platform-build
- canvas-build
- shellcheck
- python-lint
- canvas-deploy-reminder
if: ${{ always() }}
continue-on-error: false
runs-on: ubuntu-latest
timeout-minutes: 45
timeout-minutes: 1
steps:
- name: Wait for required CI contexts
env:
GITEA_TOKEN: ${{ secrets.GITHUB_TOKEN }}
API_ROOT: ${{ github.server_url }}/api/v1
REPOSITORY: ${{ github.repository }}
COMMIT_SHA: ${{ github.sha }}
EVENT_NAME: ${{ github.event_name }}
- name: Verify all required jobs succeeded
run: |
set -euo pipefail
python3 - <<'PY'
import json
import os
import sys
import time
import urllib.error
import urllib.request
token = os.environ["GITEA_TOKEN"]
api_root = os.environ["API_ROOT"].rstrip("/")
repo = os.environ["REPOSITORY"]
sha = os.environ["COMMIT_SHA"]
event = os.environ["EVENT_NAME"]
required = [
f"CI / Detect changes ({event})",
f"CI / Platform (Go) ({event})",
f"CI / Canvas (Next.js) ({event})",
f"CI / Shellcheck (E2E scripts) ({event})",
f"CI / Python Lint & Test ({event})",
]
terminal_bad = {"failure", "error"}
deadline = time.time() + 40 * 60
last_summary = None
def fetch_statuses():
statuses = []
for page in range(1, 6):
url = f"{api_root}/repos/{repo}/commits/{sha}/statuses?page={page}&limit=100"
req = urllib.request.Request(url, headers={"Authorization": f"token {token}"})
with urllib.request.urlopen(req, timeout=10) as resp:
chunk = json.load(resp)
if not chunk:
break
statuses.extend(chunk)
latest = {}
for item in statuses:
ctx = item.get("context")
if not ctx:
continue
prev = latest.get(ctx)
if prev is None or (item.get("updated_at") or item.get("created_at") or "") >= (prev.get("updated_at") or prev.get("created_at") or ""):
latest[ctx] = item
return latest
while True:
try:
latest = fetch_statuses()
except (TimeoutError, OSError, urllib.error.URLError) as exc:
if time.time() >= deadline:
print(f"FAIL: status polling did not recover before deadline: {exc}", file=sys.stderr)
sys.exit(1)
print(f"WARN: status poll failed, retrying: {exc}", flush=True)
time.sleep(15)
continue
states = {ctx: (latest.get(ctx) or {}).get("status") or (latest.get(ctx) or {}).get("state") or "missing" for ctx in required}
summary = ", ".join(f"{ctx}={state}" for ctx, state in states.items())
if summary != last_summary:
print(summary, flush=True)
last_summary = summary
bad = {ctx: state for ctx, state in states.items() if state in terminal_bad}
if bad:
print("FAIL: required CI context failed:", file=sys.stderr)
for ctx, state in bad.items():
desc = (latest.get(ctx) or {}).get("description") or ""
print(f" - {ctx}: {state} {desc}", file=sys.stderr)
sys.exit(1)
if all(state == "success" for state in states.values()):
print(f"OK: all {len(required)} required CI contexts succeeded")
sys.exit(0)
if time.time() >= deadline:
print("FAIL: timed out waiting for required CI contexts:", file=sys.stderr)
for ctx, state in states.items():
print(f" - {ctx}: {state}", file=sys.stderr)
sys.exit(1)
time.sleep(15)
PY
FAILED=0
for job in changes platform-build canvas-build shellcheck python-lint canvas-deploy-reminder; do
result="$(gh api repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/jobs --jq '.jobs[] | select(.name == env.JOB) | .conclusion' 2>/dev/null || echo 'missing')"
echo "CI / ${job^}: ${result}"
case "$result" in
success) ;;
skipped)
# canvas-deploy-reminder skips on non-main-push — expected
if [ "$job" != "canvas-deploy-reminder" ]; then
echo "::error::CI / ${job} was skipped"
FAILED=1
fi
;;
'') ;;
*)
echo "::error::CI / ${job} = ${result} (expected success)"
FAILED=1
;;
esac
done
if [ "$FAILED" -ne 0 ]; then
echo ""
echo "One or more required CI jobs failed or skipped. Fix before merging."
exit 1
fi
echo "All required CI jobs passed."
@@ -248,6 +248,11 @@ func (p *CPProvisioner) Start(ctx context.Context, cfg WorkspaceConfig) (string,
const cpConfigFilesMaxBytes = 12 << 10
func isCPTemplateConfigFile(name string) bool {
name = filepath.ToSlash(filepath.Clean(name))
return name == "config.yaml" || strings.HasPrefix(name, "prompts/")
}
func collectCPConfigFiles(cfg WorkspaceConfig) (map[string]string, error) {
files := make(map[string]string)
total := 0
@@ -301,6 +306,9 @@ func collectCPConfigFiles(cfg WorkspaceConfig) (map[string]string, error) {
if err != nil {
return err
}
if !isCPTemplateConfigFile(rel) {
return nil
}
data, err := os.ReadFile(path)
if err != nil {
return err
@@ -1,6 +1,7 @@
package provisioner
import (
"bytes"
"context"
"encoding/base64"
"encoding/json"
@@ -221,6 +222,9 @@ func TestStart_SendsTemplateAndGeneratedConfigFiles(t *testing.T) {
if err := os.WriteFile(filepath.Join(tmpl, "config.yaml"), []byte("name: template\n"), 0o600); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(tmpl, "adapter.py"), bytes.Repeat([]byte("x"), cpConfigFilesMaxBytes), 0o600); err != nil {
t.Fatal(err)
}
if err := os.Mkdir(filepath.Join(tmpl, "prompts"), 0o700); err != nil {
t.Fatal(err)
}
@@ -261,6 +265,9 @@ func TestStart_SendsTemplateAndGeneratedConfigFiles(t *testing.T) {
if got := body.ConfigFiles["prompts/system.md"]; got != wantPrompt {
t.Errorf("prompt payload = %q, want %q", got, wantPrompt)
}
if _, ok := body.ConfigFiles["adapter.py"]; ok {
t.Error("non-config template file adapter.py must not be sent to CP")
}
}
// TestStart_Non201ReturnsStructuredError — when CP returns 401 with a