ci(publish): retarget ship-the-fix workflows to dedicated publish lane (internal#394/#399) [BLOCKED on lane] #1315

Closed
hongming wants to merge 1 commit from feat/publish-lane-runs-on-394 into main
Owner

Summary

  • Retargets ONLY the 6 post-merge ship jobs (publish-runtime publish+cascade, publish-canvas build-and-push, publish-workspace-server build-and-push+deploy-production, redeploy-tenants redeploy) to the reserved publish lane so a merged fix ships immediately instead of queuing behind PR-CI (internal#399).
  • PR-validation jobs deliberately NOT moved (publish-runtime-autobump pr-validate stays on the general pool).

⚠ HARD MERGE BLOCK

Gitea 1.22.6: a bare runs-on: <label> with NO runner advertising it queues FOREVER (the trap that got the docker label reverted). This PR MUST NOT merge until BOTH:

  1. operator-config#63 merged + propagated, AND
  2. publish-lane-ensure.sh run under explicit Hongming GO, lane verified registered (≥2 publish runners).

Merging earlier wedges every release + production deploy.
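
The merge gate above comes down to a count of runners advertising the `publish` label. A minimal sketch of that check follows. The runner record shape (`{"name": ..., "labels": [...]}`) and the function name `publish_lane_ready` are hypothetical and used only for illustration; they are not Gitea's API and not the contents of publish-lane-ensure.sh.

```python
from typing import Any, Iterable, Mapping


def publish_lane_ready(
    runners: Iterable[Mapping[str, Any]],
    label: str = "publish",
    minimum: int = 2,
) -> bool:
    """Return True when at least `minimum` runners advertise `label`.

    ASSUMPTION: each runner is a mapping with a "labels" list of strings.
    This record shape is illustrative, not Gitea's actual response format.
    """
    advertising = [r for r in runners if label in r.get("labels", ())]
    return len(advertising) >= minimum


if __name__ == "__main__":
    # Hypothetical fleet: one general-pool runner, one publish runner.
    fleet = [
        {"name": "general-1", "labels": ["ubuntu-latest"]},
        {"name": "publish-1", "labels": ["publish"]},
    ]
    # Only one publish runner is registered, so the gate stays closed.
    print("safe to merge:", publish_lane_ready(fleet))
```

A check like this only covers condition 2. Condition 1 (operator-config#63 merged + propagated) still has to be confirmed separately.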

Test plan

  • [x] All 4 workflows valid YAML; exactly 6 ship jobs retargeted; PR-validate confirmed unchanged
  • [ ] Non-author five-axis review
  • [ ] operator-config#63 landed + lane instantiated under GO
  • [ ] After lane live: dry-run a publish (workflow_dispatch) and confirm it lands on a publish runner, not the general pool

Refs internal#394, #399, #305; pairs with operator-config#63.

hongming added 1 commit 2026-05-16 07:39:13 +00:00
ci(publish): retarget ship-the-fix workflows to the dedicated publish lane (internal#394/#399)
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 27s
cascade-list-drift-gate / check (pull_request) Successful in 28s
CI / Detect changes (pull_request) Successful in 1m17s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 29s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m59s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m47s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m59s
qa-review / approved (pull_request) Failing after 43s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 2m18s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 4m47s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 2m14s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 4m19s
sop-checklist / all-items-acked (pull_request) Successful in 38s
security-review / approved (pull_request) Failing after 40s
sop-tier-check / tier-check (pull_request) Successful in 37s
CI / Python Lint & Test (pull_request) Successful in 9m9s
CI / Canvas (Next.js) (pull_request) Failing after 11m5s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Failing after 11m10s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 33s
CI / Platform (Go) (pull_request) Failing after 21m28s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 26s
CI / Shellcheck (E2E scripts) (pull_request) Has been cancelled
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 25s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 6m29s
Secret scan / Scan diff for credential-shaped strings (pull_request) Has been cancelled
gate-check-v3 / gate-check (pull_request) Has been cancelled
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Has been cancelled
audit-force-merge / audit (pull_request) Has been skipped
813f706a71
BLOCKED — do NOT merge until the publish lane is live (see below).

All 107 molecule-core jobs use runs-on: ubuntu-latest, so the 4
ship-the-fix workflows queue in the same FIFO as every PR-CI job.
internal#399: on a fix merge, main push image build/push/deploy waits
behind the PR backlog. This retargets ONLY the post-merge ship jobs
to the reserved `publish` lane (operator-config#63):

- publish-runtime.yml: publish + cascade
- publish-canvas-image.yml: build-and-push
- publish-workspace-server-image.yml: build-and-push + deploy-production
- redeploy-tenants-on-main.yml: redeploy

Deliberately NOT changed: publish-runtime-autobump.yml's pr-validate
job (it runs on PRs — that is PR-CI, stays on the general pool). Only
push/tag/workflow_dispatch ship jobs move.

HARD SEQUENCING PRECONDITION: Gitea 1.22.6 schedules a bare
`runs-on: <label>` job onto a runner advertising that label, and if
NONE exists the job queues INDEFINITELY (the exact trap that got the
`docker` label reverted — see the inline note retained in
publish-canvas-image.yml). Therefore this PR MUST NOT merge until
≥2 `publish`-labelled runners are registered:
  1. operator-config#63 merged + propagated, AND
  2. publish-lane-ensure.sh run under explicit Hongming GO
     (ALLOW_FLEET_MUTATION=1), lane verified registered in Gitea.
Merging earlier would wedge every release + production deploy.

Validated: all 4 workflows parse as valid YAML; exactly the 6 ship
jobs retargeted; PR-validation job confirmed unchanged.
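
The "exactly the 6 ship jobs" claim above can be re-checked mechanically. A minimal sketch, assuming the workflows have already been parsed into plain dicts by a YAML parser of your choice, and assuming the job names listed in this message are the literal job keys in each file. The helper names `jobs_on_label` and `retarget_errors` are hypothetical.

```python
from typing import Any, List, Mapping, Set, Tuple

JobRef = Tuple[str, str]  # (workflow filename, job id)

# The six ship jobs named in this commit message.
# ASSUMPTION: these names are the literal job keys in the YAML files.
SHIP_JOBS: Set[JobRef] = {
    ("publish-runtime.yml", "publish"),
    ("publish-runtime.yml", "cascade"),
    ("publish-canvas-image.yml", "build-and-push"),
    ("publish-workspace-server-image.yml", "build-and-push"),
    ("publish-workspace-server-image.yml", "deploy-production"),
    ("redeploy-tenants-on-main.yml", "redeploy"),
}


def jobs_on_label(
    workflows: Mapping[str, Mapping[str, Any]], label: str = "publish"
) -> Set[JobRef]:
    """Collect every (file, job) whose runs-on is exactly `label`."""
    found: Set[JobRef] = set()
    for filename, workflow in workflows.items():
        for job_id, job in workflow.get("jobs", {}).items():
            if job.get("runs-on") == label:
                found.add((filename, job_id))
    return found


def retarget_errors(
    workflows: Mapping[str, Mapping[str, Any]], label: str = "publish"
) -> List[str]:
    """Return one message per job that deviates from the expected six."""
    on_lane = jobs_on_label(workflows, label)
    errors = [f"not retargeted: {f}:{j}" for f, j in sorted(SHIP_JOBS - on_lane)]
    errors += [
        f"unexpectedly on {label}: {f}:{j}" for f, j in sorted(on_lane - SHIP_JOBS)
    ]
    return errors
```

An empty list means the six ship jobs, and only those, run on the lane. A `pr-validate` job that drifted onto `publish` would show up as an "unexpectedly on publish" entry.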

Refs internal#394, #399, #305; pairs with operator-config#63.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Member

[core-qa-agent] N/A — CI workflow only

Retargets publish workflows to a dedicated lane (internal#394/#399). Pure CI infrastructure change across 4 workflow YAML files. No platform Go code, no Python workspace, no Canvas. CI gate validates.

Member

[core-devops-agent] CI review — runs-on: publish lane changes

Reviewed all 4 workflow files in this PR:

Changes:

  • publish-canvas-image.yml, publish-runtime.yml, publish-workspace-server-image.yml, redeploy-tenants-on-main.yml — all post-merge ship jobs changed from runs-on: ubuntu-latest to runs-on: publish.

Concerns (non-blocking):

  1. Indefinite queue risk (Gitea 1.22.6): A bare runs-on: publish with no registered runners queues forever — the same trap that caused the reverted docker label (#576). The hard precondition comment in each job is correct and explicit. This PR is blocked until ≥2 publish-lane runners are registered (operator-config#63).

  2. continue-on-error: true masks lane-not-found failures: If the publish lane doesn't exist at merge time, continue-on-error: true will suppress the failure and the image won't be built/published. Consider whether this is the right Phase 3 behavior, or whether publish failures should be surfaced (fail-open vs fail-closed for production release).

  3. No change to PR-validation jobs: Confirmed that publish-runtime-autobump pr-validate and other PR-CI jobs remain on ubuntu-latest — correct per the PR description.

LGTM with the hard precondition noted. The inline comments are thorough and the rationale is clear. Merge order: (1) register ≥2 publish-lane runners, (2) verify runs-on: publish jobs run on main, (3) merge this PR.
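
For concern 2, the jobs worth a second look are those that both target the lane and set `continue-on-error: true`. A minimal sketch for listing them, assuming a workflow already parsed into a dict and assuming the flag is set at job level; step-level flags and expression strings such as `${{ ... }}` are not inspected. The function name `fail_open_lane_jobs` is hypothetical.

```python
from typing import Any, List, Mapping


def fail_open_lane_jobs(
    workflow: Mapping[str, Any], label: str = "publish"
) -> List[str]:
    """List job ids that run on `label` and set job-level continue-on-error: true.

    ASSUMPTION: `workflow` is the dict a YAML parser would produce, so a
    literal `true` arrives as the Python bool True.
    """
    return sorted(
        job_id
        for job_id, job in workflow.get("jobs", {}).items()
        if job.get("runs-on") == label and job.get("continue-on-error") is True
    )


if __name__ == "__main__":
    # Hypothetical parsed workflow used only to exercise the function.
    example = {
        "jobs": {
            "build-and-push": {"runs-on": "publish", "continue-on-error": True},
            "deploy-production": {"runs-on": "publish"},
        }
    }
    print(fail_open_lane_jobs(example))
```

Any job this returns is fail-open on the production release path, which is the trade-off concern 2 asks the author to decide on.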

Member

[core-security-agent] N/A — CI ops. 4 workflow files retarget to dedicated 'publish' runner lane (internal#394/#399). Explicit note: 'Do NOT merge before the lane exists' (Gitea 1.22.6 bare runs-on queues indefinitely without registered runner). No production code. No security surface.

Author
Owner

Superseded by #1376 (merged 2026-05-16T19:47:27Z, retargeted publish/deploy ship jobs to the dedicated publish lane per internal#462). Closing as duplicate; no functional gap.

Ref: https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1376

hongming closed this pull request 2026-05-19 01:11:16 +00:00
Some required checks failed

Pull request closed

Reference: molecule-ai/molecule-core#1315