fix(workflows): cancel-in-progress: true for all scheduled workflows (molecule-core#1357) #1359

Closed
core-devops wants to merge 3 commits from ci/scheduled-cancel-in-progress-1357 into main

Summary

Flip `cancel-in-progress: false` → `cancel-in-progress: true` on all 15 scheduled workflows. Scheduled runs with `cancel-in-progress: false` allow queued runs to accumulate across cron cycles, saturating the 8-runner pool and starving PR `pull_request_target` jobs (issue #1357).

What changed

| Workflow | Change |
|---|---|
| ci-required-drift.yml | cancel-in-progress: false → true |
| continuous-synth-e2e.yml | cancel-in-progress: false → true |
| e2e-peer-visibility.yml | cancel-in-progress: false → true |
| e2e-staging-canvas.yml | cancel-in-progress: false → true |
| e2e-staging-external.yml | cancel-in-progress: false → true |
| e2e-staging-saas.yml | cancel-in-progress: false → true |
| e2e-staging-sanity.yml | cancel-in-progress: false → true |
| gitea-merge-queue.yml | cancel-in-progress: false → true |
| main-red-watchdog.yml | cancel-in-progress: false → true |
| railway-pin-audit.yml | cancel-in-progress: false → true |
| staging-smoke.yml | cancel-in-progress: false → true |
| status-reaper.yml | cancel-in-progress: false → true |
| sweep-cf-orphans.yml | cancel-in-progress: false → true |
| sweep-cf-tunnels.yml | cancel-in-progress: false → true |
| sweep-stale-e2e-orgs.yml | cancel-in-progress: false → true |
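
Each change listed above amounts to a one-line edit inside the workflow's `concurrency` block. A minimal sketch of the shape after the flip, assuming GitHub Actions-style workflow syntax; the workflow name, cron expression, group name, and runner label are hypothetical and not copied from the repository:

```yaml
# Hypothetical scheduled workflow; only the cancel-in-progress value reflects this PR.
name: example-scheduled
on:
  schedule:
    - cron: "*/15 * * * *"            # assumed cadence, for illustration only
concurrency:
  group: example-scheduled            # assumed group name
  cancel-in-progress: true            # was: false
jobs:
  check:
    runs-on: ubuntu-latest            # assumed runner label
    steps:
      - run: echo "scheduled check"
```

With `true`, a newly triggered run supersedes an earlier run in the same concurrency group instead of waiting behind it, which is the behavior this PR relies on to stop queue buildup.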

Test plan

  • [x] YAML lint passes (`lint-workflow-yaml.py`: 55 workflow files checked, no fatal Gitea-1.22.6-hostile shapes)
  • [ ] Verify on next scheduled cron run: queued runs for each workflow clear, new runs execute
  • [ ] Verify PR jobs can acquire runner slots during scheduled run windows

Related

  • Fixes molecule-core#1357
  • Related: molecule-core#1352 (runner freeze — same runner pool saturation class)
  • Related: molecule-ai/internal#472 (P0 — operator host disk full — underlying infra cause)
core-devops added 3 commits 2026-05-16 14:53:53 +00:00
ci(workflows): consolidate issue_comment subscribers — sop-checklist + review-refire (issue #1280)
gate-check-v3 / gate-check (pull_request) Waiting to run
sop-checklist / all-items-acked (pull_request) Waiting to run
66f3d0b0f6
Merge review-refire-comments.yml logic into sop-checklist.yml as the
`review-refire` job. Before: 2 workflows subscribed to issue_comment,
causing Gitea to queue 2 runner-assigned runs per comment
(~650 no-op runs/day, ~1,300 runner-slot-occupancy-hours/day).
After: 1 workflow, 1 issue_comment subscription, ~50% reduction.

Changes:
- sop-checklist.yml: add `review-refire` job with if: guard for
  /qa-recheck, /security-recheck, /refire-tier-check commands
- review-refire-comments.yml: deprecate, convert to no-op stub
  (will be deleted in follow-up PR after sop-checklist.yml lands)

Sequencing: review-refire-comments.yml kept as stub during transition
to avoid refire gap. Will be deleted after consolidation is confirmed.
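
The consolidated job described above might look roughly like the following. This is a sketch only: the trigger types, the exact `if:` expression, the runner label, and the step body are assumptions, and only the job name and the three command strings come from the commit message.

```yaml
# Hypothetical excerpt of sop-checklist.yml; expressions and steps are assumed.
on:
  issue_comment:
    types: [created]
jobs:
  review-refire:
    # Run only when the comment contains one of the refire commands.
    if: >-
      contains(github.event.comment.body, '/qa-recheck') ||
      contains(github.event.comment.body, '/security-recheck') ||
      contains(github.event.comment.body, '/refire-tier-check')
    runs-on: ubuntu-latest            # assumed runner label
    steps:
      - run: echo "refire requested"  # placeholder for the real refire logic
```

Keeping the guard at the job level means a single `issue_comment` subscription can serve both the checklist and the refire path, which is the consolidation the commit describes.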

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix(workflows): remove duplicate YAML keys + restore COMMENT_AUTHOR + add bp-required
E2E Chat / E2E Chat (pull_request) Blocked by required conditions
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 26s
CI / Detect changes (pull_request) Successful in 23s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 40s
E2E API Smoke Test / detect-changes (pull_request) Successful in 22s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 24s
E2E Chat / detect-changes (pull_request) Successful in 34s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 24s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 32s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 21s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Failing after 1m10s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Failing after 1m15s
lint-required-no-paths / lint-required-no-paths (pull_request) Failing after 53s
gate-check-v3 / gate-check (pull_request) Successful in 41s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 42s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 41s
qa-review / approved (pull_request) Failing after 31s
security-review / approved (pull_request) Failing after 33s
sop-checklist / all-items-acked (pull_request) Successful in 34s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 2m49s
sop-tier-check / tier-check (pull_request) Successful in 28s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 2m4s
CI / Python Lint & Test (pull_request) Successful in 8m36s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 19s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 17s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 19s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 19s
CI / Canvas (Next.js) (pull_request) Failing after 15m50s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Failing after 15m40s
CI / Platform (Go) (pull_request) Failing after 21m42s
eb055253ff
Three fixes consolidated onto infra-sre's clean rebase of PR #1333:

1. review-refire-comments.yml: remove duplicate `runs-on:`/`steps:` YAML
   merge-conflict artifact. Python yaml parser keeps the LAST key, so the
   deprecated stub (exit 0) was silently replaced by the old refire logic.
   The file is supposed to be a pure no-op stub pending deletion.

2. sop-checklist.yml: restore COMMENT_AUTHOR=${{ github.event.comment.user.login }}
   to all three refire env blocks (qa-review, security-review,
   sop-tier-check). The scripts use it for status descriptions; without
   it, descriptions show "unknown" for the caller.

3. e2e-peer-visibility.yml: add `# bp-required: pending #1296` to both
   pr-validate and peer-visibility jobs. Satisfies the
   lint-required-context-exists-in-bp convention for the intentionally
   RED e2e-peer-visibility gate.
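
The duplicate-key artifact from fix 1 can be pictured as follows. This is a hypothetical reconstruction rather than the actual file contents; the job name, runner label, and step bodies are placeholders.

```yaml
# Hypothetical shape of the merge-conflict artifact in review-refire-comments.yml.
jobs:
  noop:
    runs-on: ubuntu-latest
    steps:
      - run: exit 0                   # intended no-op stub
    runs-on: ubuntu-latest            # duplicate key left behind by the merge
    steps:                            # duplicate key: a last-key-wins parser keeps this one
      - run: echo "old refire logic would run here"
```

Because a last-key-wins parser silently discards the first `steps:` mapping, the stub disappears without any parse error. Removing the second `runs-on:`/`steps:` pair restores the no-op behavior.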

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix(workflows): cancel-in-progress: true for all scheduled workflows
Block internal-flavored paths / Block forbidden paths (pull_request) Waiting to run
CI / Detect changes (pull_request) Waiting to run
CI / Platform (Go) (pull_request) Waiting to run
CI / Canvas (Next.js) (pull_request) Waiting to run
CI / Shellcheck (E2E scripts) (pull_request) Waiting to run
CI / Python Lint & Test (pull_request) Waiting to run
CI / all-required (pull_request) Waiting to run
E2E API Smoke Test / detect-changes (pull_request) Waiting to run
E2E Chat / detect-changes (pull_request) Waiting to run
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Waiting to run
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Waiting to run
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Waiting to run
Handlers Postgres Integration / detect-changes (pull_request) Waiting to run
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Waiting to run
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Waiting to run
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Waiting to run
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Waiting to run
lint-required-no-paths / lint-required-no-paths (pull_request) Waiting to run
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Waiting to run
Runtime PR-Built Compatibility / detect-changes (pull_request) Waiting to run
Secret scan / Scan diff for credential-shaped strings (pull_request) Waiting to run
gate-check-v3 / gate-check (pull_request) Waiting to run
qa-review / approved (pull_request) Waiting to run
security-review / approved (pull_request) Waiting to run
sop-checklist / all-items-acked (pull_request) Waiting to run
sop-tier-check / tier-check (pull_request) Waiting to run
audit-force-merge / audit (pull_request) Waiting to run
CI / Canvas Deploy Reminder (pull_request) Has been cancelled
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Has been cancelled
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Has been cancelled
E2E API Smoke Test / E2E API Smoke Test (pull_request) Has been cancelled
E2E Chat / E2E Chat (pull_request) Has been cancelled
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Has been cancelled
3949788916
Scheduled workflows with cancel-in-progress: false allow queued runs to
accumulate across cron cycles, saturating the runner pool and starving
PR pull_request_target jobs (issue #1357).

Flip cancel-in-progress to true on all 15 scheduled workflows that had
cancel-in-progress: false. A new scheduled run now cancels any previously
queued run for the same concurrency group, preventing queue buildup.

Includes: ci-required-drift, continuous-synth-e2e, e2e-peer-visibility,
e2e-staging-canvas, e2e-staging-external, e2e-staging-saas,
e2e-staging-sanity, gitea-merge-queue, main-red-watchdog,
railway-pin-audit, staging-smoke, status-reaper, sweep-cf-orphans,
sweep-cf-tunnels, sweep-stale-e2e-orgs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
core-devops closed this pull request 2026-05-16 14:56:54 +00:00

Reference: molecule-ai/molecule-core#1359