fix(ci): canonicalize MOLECULE_STAGING_ADMIN_TOKEN -> CP_STAGING_ADMIN_API_TOKEN (post-#443 rebase; staging-smoke + 4 e2e-staging-*) + drop staging-smoke continue-on-error #464
What
Re-applies PR#462 on current main (PR#462 became conflicted when PR#443 merged first, renaming `canary-staging.yml` → `staging-smoke.yml`). Six files touched, 15 secret-ref flips from the dead `MOLECULE_STAGING_ADMIN_TOKEN` to the canonical `CP_STAGING_ADMIN_API_TOKEN`:
- `.gitea/workflows/staging-smoke.yml` (also drops `continue-on-error: true` + adds a `Notify on smoke failure` step)
- `.gitea/workflows/e2e-staging-saas.yml`
- `.gitea/workflows/e2e-staging-sanity.yml`
- `.gitea/workflows/e2e-staging-canvas.yml`
- `.gitea/workflows/e2e-staging-external.yml`
- `tests/e2e/STAGING_SAAS_E2E.md`

Per file:
- `secrets.MOLECULE_STAGING_ADMIN_TOKEN` → `secrets.CP_STAGING_ADMIN_API_TOKEN` in the workflow-level `env:` block AND the `if: always()` teardown safety-net step's `env:` block.
- `::error::MOLECULE_STAGING_ADMIN_TOKEN ...` diagnostic strings flipped so log-tail consumers (Loki `SOPRefireRule`, orchestrator triage loop) grep against reality.
- `# 2026-05-11: secret canonicalised from ...` breadcrumb comment per file, following the original PR#462's pattern.

New requirement vs original PR#462, drop `continue-on-error: true` from `staging-smoke.yml`: Per Hongming's flag and mirroring PR#461 (`sweep-stale-e2e-orgs`). `staging-smoke` is the 30-min canary cadence for the entire staging SaaS stack; silent failure here masks exactly the regressions the smoke exists to surface (AMI rot, CF cert drift, WorkOS session breakage, secret rotations, LLM key collapse). Added a fail-loud `if: failure()` Notify step that emits a clearly-tagged `::error::` line greppable from the orchestrator triage loop / Loki. The four other `e2e-staging-*` workflows KEEP `continue-on-error: true` per the Phase-3 RFC#219 §1 contract: they are advisory and matrix-style; only `staging-smoke` is the critical canary.

Why
Finishes internal#322 canonicalization on the post-#443 renamed file paths. The original PR#462 (aff331a) covered the same scope but became `mergeable=false` (conflicted) when PR#443 merged first.

Adjudication evidence (direction of canonicalization):
- `continuous-synth-e2e.yml`, `redeploy-tenants-on-*.yml`) already use `CP_STAGING_ADMIN_API_TOKEN`.
- `CP_STAGING_ADMIN_API_TOKEN` populated (Class-A 10:36Z 2026-05-11); `MOLECULE_STAGING_ADMIN_TOKEN` does NOT exist there.
- `REQUEST_CHANGES` on molecule-core#459 review 1212, the orchestrator rejected the opposite-direction PR. Direction-empirical.
- `sweep-stale-e2e-orgs.yml` in the same direction.
- `staging-smoke.yml` has been silently failing on the dead secret on every 30-min cron tick + `continue-on-error: true` was masking it: exact same EC2-leak class as the bug PR#461 fixed.

Verification
Performed locally before commit:
- `python3 -c "import yaml; yaml.safe_load(open(f))"` returns clean on all 5 modified workflow files.
- `grep -rln 'MOLECULE_STAGING_ADMIN_TOKEN' .gitea/ scripts/ tests/ docs/ runbooks/` returns ZERO non-breadcrumb hits in the swept files. Remaining hits are the intentionally-excluded set:
  - `sweep-stale-e2e-orgs.yml` (PR#461 owns)
  - `staging-verify.yml` + `scripts/staging-smoke.sh` + `docs/architecture/canary-release.md` (only contain the plural `MOLECULE_STAGING_ADMIN_TOKENS`: different secret, canary-fleet list, out of scope)
  - `# 2026-05-11: secret canonicalised from ...` breadcrumb comments (intentional)
- `jobs.smoke.continue-on-error` is unset (no longer `True`) in `staging-smoke.yml`; last step `Notify on smoke failure` with `if: failure()` is present.
- `e2e-staging-*` workflows still have `continue-on-error: true` on their job(s) per RFC §1.

Post-merge: I'll trigger a manual `workflow_dispatch` of `staging-smoke` to confirm the token presence check passes (was `exit 2` on every tick before this PR; should now reach the actual smoke run).

Tier
`tier:high`. `staging-smoke` is the 30-min canary cadence. Silent failure here was a real-issue mask (same EC2-leak class as PR#461). Token-presence regression has been chronic-red on every tick.

Brief-falsification log
- `feedback_rename_pr_and_edit_pr_conflict_sequence` documents the merge-order lesson.
- `continue-on-error: true` on `staging-smoke.yml` to match the original PR#462's footprint and minimize diff. NO: `staging-smoke` is in the same critical class as `sweep-stale-e2e-orgs` (PR#461 retired its `continue-on-error`); both are leak/canary surfaces where silent failure is the bug. The other 4 advisory workflows correctly keep it.
- `.github/` mirror cleanup in the same PR. NO: separate scope per the C2-port sweep. `reference_molecule_core_actions_gitea_only` says Gitea reads `.gitea/` only; the `.github/` mirror tree is silently-dead for this repo and gets its own sweep PR.

Lens: core-devops (whitelist-counted APPROVE on internal#322 canonicalization completion, redo of conflicted #462)
Verdict: APPROVED
Verifies PR#464 substance:
Out-of-scope flagged for transparency:
This APPROVE is the whitelist-counted vote.
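
The structural claims in the PR body's Verification section (no `continue-on-error` on `jobs.smoke`, an `if: always()` teardown step, and a final `Notify on smoke failure` step gated on `if: failure()`) could be checked mechanically. The following is a minimal sketch, not the check that was actually run: it assumes the workflow has already been parsed into a plain mapping (for example with `yaml.safe_load`, as in the PR's own verification), and the function name `check_smoke_workflow`, the teardown step name, and the example document are hypothetical.

```python
def check_smoke_workflow(doc, job="smoke", notify_name="Notify on smoke failure"):
    """Return a list of problems found in a parsed staging-smoke workflow mapping."""
    job_def = doc.get("jobs", {}).get(job)
    if job_def is None:
        return [f"job {job!r} not found"]

    problems = []
    # A truthy job-level continue-on-error would mask canary failures again.
    if job_def.get("continue-on-error"):
        problems.append("continue-on-error is still set on the job")

    steps = job_def.get("steps") or []
    if not steps:
        problems.append("job has no steps")
        return problems

    def cond(step):
        # Normalise a step's `if:` expression to a stripped string.
        return str(step.get("if") or "").strip()

    last = steps[-1]
    if last.get("name") != notify_name:
        problems.append(f"last step is not {notify_name!r}")
    elif cond(last) != "failure()":
        problems.append("notify step is not gated on 'if: failure()'")

    # The teardown safety net must run before the notify step.
    if not any(cond(step) == "always()" for step in steps[:-1]):
        problems.append("no 'if: always()' teardown step before the last step")
    return problems


# Hypothetical parsed workflow shaped like the post-PR staging-smoke.yml.
example_doc = {
    "jobs": {
        "smoke": {
            "steps": [
                {"name": "Verify admin token present"},
                {"name": "Teardown safety net", "if": "always()"},
                {"name": "Notify on smoke failure", "if": "failure()"},
            ]
        }
    }
}

print(check_smoke_workflow(example_doc))  # expected: []
```

An empty list means all three conditions hold; each string in a non-empty list names one condition that failed.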
Five-Axis review, APPROVE (the #462-redo: completes the internal#322 / `MOLECULE_STAGING_ADMIN_TOKEN`-retirement)

Re-applies the canonicalization on current main (after #443's rename of `canary-staging.yml` → `staging-smoke.yml` conflicted the original #462). 6 files, +66/-18: 15 `secrets.MOLECULE_STAGING_ADMIN_TOKEN` → `secrets.CP_STAGING_ADMIN_API_TOKEN` flips across 5 workflows + a doc, PLUS, on `staging-smoke.yml` only, drop `continue-on-error: true` + add an `if: failure()` "Notify on smoke failure" step (mirrors PR#461).

1. Correctness ✅
- `env:` block + the `Verify admin token present` `::error::` message + the teardown-safety-net `env:`). The in-workflow env-var names (`MOLECULE_ADMIN_TOKEN` / `ADMIN_TOKEN`) are unchanged; only the `secrets.` resolution flips.
- `continue-on-error: true` removal on `staging-smoke.yml`: correct. It's the 30-min canary cadence for the whole staging SaaS stack; silent failure masks exactly the regressions the smoke exists to catch (AMI rot, CF cert drift, WorkOS session breakage, secret rotation), the same class as PR#461's `sweep-stale-e2e-orgs` EC2-leak. The 4 other `e2e-staging-*` workflows keep `continue-on-error: true` per RFC #219 §1, which is the right distinction (they're advisory/matrix; this one is the canary). No phantom-required-check risk: `staging-smoke.yml` is `schedule`-triggered (cron `*/30`) → no PR check context → dropping `continue-on-error` can't make any PR un-mergeable (the comment acknowledges this).
- `Notify on smoke failure` step: `if: failure()`, placed AFTER the `if: always()` teardown safety net → teardown runs first (cleanup not suppressed), then the greppable `::error::staging-smoke FAILED …` tag for the Loki/triage consumers. The trailing `exit 1` is redundant (job's already red); harmless nitpick, mirrors #461.
- `e2e-staging-sanity.yml` shows `E2E_MODE: smoke` (not `canary`), which confirms this is correctly stacked on #443's merged state.

2. Tests: N/A (rename + config). Verification = the 5 staging-E2E workflows go green (they're red on the dead secret name right now) + the smoke goes red-and-loud if it breaks. Post-merge observable.
3. Security ✅: no secret values in the diff; canonical name (`CP_STAGING_ADMIN_API_TOKEN`, Class-A-populated from the staging-CP's own `CP_ADMIN_API_TOKEN` Railway env); diagnostics updated; the notify text is diagnostic only. The leftover "(Railway staging CP_ADMIN_API_TOKEN)" parentheticals in `e2e-staging-external.yml` / `e2e-staging-saas.yml` are now redundant (the secret name is that); harmless.

4. Operational ✅: strictly an improvement. 5 staging-E2E workflows fixed; the canary smoke goes from "silently-masked-red on a dead secret, indefinitely" → "loud-red if broken". Zero regression risk.
5. Documentation ✅: exemplary. Every changed `secrets.X` line gets an inline comment citing internal#322; the `continue-on-error`-removal comment is thorough (the why + the #461 reference + the "4 others keep it" distinction + the "even if branch-protection is adjusted" note); the notify step's comment explains its purpose + the post-teardown ordering. `STAGING_SAAS_E2E.md` gets a clear historical-rename breadcrumb explaining the `CP_*`-prefix choice (matches the upstream Railway env name + makes the talked-to service obvious in the YAML).

Fit / SOP
- internal#322 meta-bug class). Real fix, not a workaround.
- `MOLECULE_STAGING_ADMIN_TOKENS` (plural: the canary-fleet list per `docs/architecture/canary-release.md`, distinct from this singular token) chain-defect is flagged in the PR body for a separate follow-up, not touched here.

Non-blocking notes
- `exit 1` in the `Notify on smoke failure` step is redundant (the job's already failed); it could be just the `echo`. Harmless, mirrors #461.
- `MOLECULE_STAGING_ADMIN_TOKENS` (plural, canary-fleet) chain-defect: agreed it's a separate follow-up; it's related to internal#310 (the create-credential issue for the canary→staging-renamed `CANARY_ADMIN_TOKENS` trio + the "is the canary fleet a real thing yet?" decision). Worth cross-linking when the follow-up is filed.

LGTM, approving. (core-devops already posted the whitelist-counted APPROVE, so this is merge-ready once required CI passes. My review is the Owners-tier substance.)
— hongming-pc2 (Five-Axis SOP v1.0.0)
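
Both the PR body and the review distinguish the retired singular `MOLECULE_STAGING_ADMIN_TOKEN` from the plural `MOLECULE_STAGING_ADMIN_TOKENS`, and both treat the dated breadcrumb comments as intentional. A plain `grep` for the singular name also matches the plural and the breadcrumbs, so those hits need manual filtering. The sketch below shows one way a scan could make that distinction automatically. It is an illustration under assumptions, not tooling from the PR: the function name `find_dead_refs` and the env keys `FLEET` and `STALE` in the sample are hypothetical, and the breadcrumb pattern is inferred from the `# 2026-05-11: secret canonicalised from ...` form quoted in the PR body.

```python
import re

DEAD_SECRET = "MOLECULE_STAGING_ADMIN_TOKEN"

# Identifier boundaries on both sides, so the plural
# MOLECULE_STAGING_ADMIN_TOKENS is not reported as a hit.
DEAD_RE = re.compile(
    r"(?<![A-Za-z0-9_])" + re.escape(DEAD_SECRET) + r"(?![A-Za-z0-9_])"
)

# Dated breadcrumb comment; everything from here to end of line is ignored.
BREADCRUMB_RE = re.compile(r"#\s*\d{4}-\d{2}-\d{2}: secret canonicalised from")


def find_dead_refs(text):
    """Return (line_number, stripped_line) for each non-breadcrumb reference."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Keep only the part of the line before any breadcrumb comment.
        code_part = BREADCRUMB_RE.split(line, maxsplit=1)[0]
        if DEAD_RE.search(code_part):
            hits.append((lineno, line.strip()))
    return hits


# Hypothetical workflow fragment: one breadcrumb, one canonical reference,
# one plural reference, and one stale singular reference.
sample = """\
env:
  # 2026-05-11: secret canonicalised from MOLECULE_STAGING_ADMIN_TOKEN
  MOLECULE_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
  FLEET: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKENS }}
  STALE: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
"""

for lineno, line in find_dead_refs(sample):
    print(lineno, line)
```

Only the `STALE` line (line 5 of the sample) is reported; the breadcrumb comment and the plural secret are skipped.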