Canary failing: staging SaaS smoke #424

Closed
opened 2026-05-11 07:38:07 +00:00 by gitea-actions · 95 comments

Canary run failed at 2026-05-11T07:38:06Z.

Run: https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/9377

This issue auto-closes on the next green canary run. Consecutive failures add a comment here rather than a new issue.
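
The open/comment/close rule above amounts to a small decision table. A minimal sketch, assuming a hypothetical helper `canary_action` that receives the run result and whether a tracking issue is already open (the function name, argument values, and action names are illustrative and do not come from the workflow itself):

```shell
#!/bin/sh
# Hypothetical sketch: map (run result, tracking issue open?) to the action
# the canary bot would take. All names and values here are assumptions.
canary_action() {
  result="$1"      # "success" or "failure"
  issue_open="$2"  # "yes" or "no"
  case "$result:$issue_open" in
    failure:no)  echo "open_issue" ;;   # first failure: file a new issue
    failure:yes) echo "add_comment" ;;  # consecutive failure: comment only
    success:yes) echo "close_issue" ;;  # next green run: auto-close
    success:no)  echo "noop" ;;         # green and nothing to close
    *)           echo "unknown input: $result $issue_open" >&2; return 64 ;;
  esac
}
```

For example, `canary_action failure yes` prints `add_comment`, which matches the long run of "still failing" comments on this issue.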

Canary still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/9571
triage-operator added the tier:low label 2026-05-11 08:35:00 +00:00
Canary still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/9699
Canary still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/9878
Canary still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/10056
Canary still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/10288
Canary still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/10474
Canary still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/10704
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/10856
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/11115
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/11341
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/11498
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/11629
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/11796
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/11930
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/12060
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/12246
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/12479
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/12579

core-devops-agent triage (2026-05-11T17:15Z)

Likely root cause: missing/rotated LLM API key secret

The canary (staging-smoke.yml) fails at the "Verify LLM key present" step with exit 2 when neither MOLECULE_STAGING_MINIMAX_API_KEY nor MOLECULE_STAGING_ANTHROPIC_API_KEY is populated in the Gitea Actions secret store.

Relevant from the workflow source (e2e-staging-saas.yml lines 96/101):

E2E_MINIMAX_API_KEY: ${{ secrets.MOLECULE_STAGING_MINIMAX_API_KEY }}
E2E_ANTHROPIC_API_KEY: ${{ secrets.MOLECULE_STAGING_ANTHROPIC_API_KEY }}

The staging-admin token was canonicalised to CP_STAGING_ADMIN_API_TOKEN per internal#322, but the LLM API key secrets were NOT migrated to the CP_STAGING_ prefix, so the workflow still reads them under the MOLECULE_STAGING_ names. There are two ways to unblock:

Option A (preferred): Register MOLECULE_STAGING_MINIMAX_API_KEY in the Gitea Actions secret store (set the actual MiniMax API key value). This is the primary LLM path per the workflow comment.

Option B: Register MOLECULE_STAGING_ANTHROPIC_API_KEY with a direct Anthropic key as fallback.

Either secret alone will unblock the smoke and turn the canary green.
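
The hard-fail described above can be pictured as a guard of roughly this shape. This is a sketch, not the workflow's actual step: the function name `verify_llm_key` and the messages are assumptions; only the two `E2E_*` variable names and exit code 2 come from the triage.

```shell
#!/bin/sh
# Hypothetical sketch of a "Verify LLM key present" guard.
# Fails with status 2 when neither key is populated, so a missing secret
# is reported loudly instead of being masked by a silent fallback.
verify_llm_key() {
  if [ -n "${E2E_MINIMAX_API_KEY:-}" ]; then
    echo "LLM key present: MiniMax (primary path)"
  elif [ -n "${E2E_ANTHROPIC_API_KEY:-}" ]; then
    echo "LLM key present: Anthropic (fallback)"
  else
    echo "no LLM key set; refusing to continue" >&2
    return 2
  fi
}
```

An unset secret expands to an empty string in the workflow environment, which is why the sketch tests for non-empty values rather than for the variables merely being defined.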

What this is NOT

  • Not a config drift in the test script — the hard-fail is intentional (lesson from #2578 where a silent fallback masked a dead OpenAI key for 36h).
  • Not a code regression — the workflow has not changed since the RFC #219 migration.
  • Not a Gitea Actions parser issue (those are fixed by PR #516).

Action needed

A human with Gitea Actions secret store access (repo admin) must register MOLECULE_STAGING_MINIMAX_API_KEY or MOLECULE_STAGING_ANTHROPIC_API_KEY. This is a human gate, not a code change.

Filed by core-devops-agent.

Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/12705
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/12866
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/13045
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/13232
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/13721
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/13989
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/14181
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/14377
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/14526
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/14750
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/15057
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/15234
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/15561
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/15864
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/16142
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/16322
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/16533
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/16765
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/16943
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/17162
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/17453
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/17767
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/17851
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/18089
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/18304
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/18587

core-qa: root-cause located and fix PR filed.

Root cause: the MoleculeTenantEICRole-staging IAM role lacks the ec2-instance-connect:OpenTunnel permission. The AWS-managed EC2InstanceConnect policy attached to this role grants only DescribeInstances + SendSSHPublicKey, not OpenTunnel. The tenant workspace-server's aws ec2-instance-connect open-tunnel subprocess (workspace-server/internal/handlers/terminal.go:openTunnelCmd) exits with AccessDenied, so terminal_diagnose step 5 (wait-for-port) times out, which shows up on every smoke run as the step 7b/11 failure.

Verified via aws iam simulate-principal-policy → OpenTunnel: implicitDeny for this role.

Fix: molecule-ai/internal#353 — adds an inline Allow ec2-instance-connect:OpenTunnel statement to the existing MoleculeTenantSecretsAccess policy in infra/cloudformation/staging-account-bootstrap.yaml.

Post-merge: requires aws cloudformation deploy on molecule-staging-bootstrap stack (operator with AWS admin in 004947743811). After stack update, next */30 smoke run should go green and auto-close this issue.

Follow-up: tests/e2e/test_staging_full_saas.sh:514 drops the diagnose handler's Detail field — surfacing only error masked the AccessDenied cause as a generic timeout. To be filed as a separate observability nit.


core-qa Phase 4 status: the fix is merge-eligible but blocked at the two-eyes gate.

CI on internal#353:

  • scripts-lint: green
  • sop-tier-check: red as expected (tier=tier:high no_approvers eligible_teams=ceo)

Per SOP, tier:high requires CEO approval. core-qa persona is not eligible. Awaiting Hongming (CEO) review on molecule-ai/internal#353.

Next steps after CEO approval + merge:

  1. Operator with AWS admin in 004947743811 runs aws cloudformation deploy --stack-name molecule-staging-bootstrap --template-file infra/cloudformation/staging-account-bootstrap.yaml --capabilities CAPABILITY_NAMED_IAM (core-qa lacks IAM write).
  2. Re-run aws iam simulate-principal-policy --policy-source-arn arn:aws:iam::004947743811:role/MoleculeTenantEICRole-staging --action-names ec2-instance-connect:OpenTunnel → expect allowed.
  3. Operator-dispatch the smoke workflow (or wait for next */30 cron) → expect step 7b green, full smoke green, auto-close on this issue.

Verification of root cause (read-only) is complete. The IAM simulator confirmed OpenTunnel: implicitDeny for MoleculeTenantEICRole-staging; no other infra change is needed.
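
Step 2 above can be wrapped so the simulator result is checked mechanically rather than by eye. A sketch, assuming the AWS CLI is installed and authenticated against the staging account; the helper names `eic_tunnel_decision` and `decision_ok` are hypothetical. `EvalDecision` is the per-action field in the simulator's response, with values `allowed`, `explicitDeny`, or `implicitDeny`.

```shell
#!/bin/sh
ROLE_ARN="arn:aws:iam::004947743811:role/MoleculeTenantEICRole-staging"
ACTION="ec2-instance-connect:OpenTunnel"

# Ask the IAM policy simulator for the decision on the single action.
eic_tunnel_decision() {
  aws iam simulate-principal-policy \
    --policy-source-arn "$ROLE_ARN" \
    --action-names "$ACTION" \
    --query 'EvaluationResults[0].EvalDecision' \
    --output text
}

# Interpret the decision string: 0 = allowed, 1 = denied, 2 = unexpected.
decision_ok() {
  case "$1" in
    allowed) return 0 ;;
    implicitDeny|explicitDeny) echo "still denied: $1" >&2; return 1 ;;
    *) echo "unexpected simulator output: $1" >&2; return 2 ;;
  esac
}
```

After the stack update, `decision_ok "$(eic_tunnel_decision)"` should return 0; before it, the same call would be expected to report `implicitDeny`.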

Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/18842
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/18944
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/19363
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/19529
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/19744
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/19882
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/20248
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/20492
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/20687
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/20846
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/21024
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/21224
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/21307
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/21415
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/21536
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/21663
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/21809
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/22090
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/22182
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/22355
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/22585
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/22761
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/22909
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/23112
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/23290
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/23328
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/23512
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/23548
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/23676
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/24468
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/25057
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/25279
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/25346
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/25498
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/25648
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/25835
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/26182
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/26360
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/26470
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/26642
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/26819
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/26967
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/27110
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/27349
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/27939
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/28200
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/28445

Smoke recovered at 2026-05-13T07:07:07Z. Closing.

gitea-actions bot closed this issue 2026-05-13 07:07:09 +00:00

Reference: molecule-ai/molecule-core#424