[main-red] molecule-ai/molecule-core: 8a572c1ef3 #613

Closed
opened 2026-05-12 00:08:57 +00:00 by gitea-actions · 1 comment

Main is RED on molecule-ai/molecule-core at 8a572c1ef3

Commit: https://git.moleculesai.app/molecule-ai/molecule-core/commit/8a572c1ef3dd9158712a2ed65467575840d79293

Auto-filed by .gitea/workflows/main-red-watchdog.yml (Option C of the main-never-red directive). Per feedback_no_such_thing_as_flakes + feedback_fix_root_not_symptom: investigate the root cause; do NOT revert as a reflex. The watchdog itself never reverts.

Failed status contexts

(Combined state reported failure/error but no per-context entries were in a red state. This usually means a CI emitter set combined-status directly without a per-context status. Check the most recent workflow run for main and trace from there.)

Resolution path

  1. Read the failed logs (links above).
  2. If reproducible locally, fix forward in a PR targeting main.
  3. If the failure looks like a flake, STOP. Per feedback_no_such_thing_as_flakes, intermittent failures are real bugs. Investigate to root cause; do not mark as flake.
  4. If the failure is blocking unrelated work for >1 hour, file a follow-up issue and assign someone. Do NOT revert without a human GO per feedback_prod_apply_needs_hongming_chat_go (branch protection is a prod surface).

Debug

{
  "all_contexts": [
    {
      "context": "CI / Platform (Go) (push)",
      "state": null
    },
    {
      "context": "CI / Canvas (Next.js) (push)",
      "state": null
    },
    {
      "context": "CI / Shellcheck (E2E scripts) (push)",
      "state": null
    },
    {
      "context": "CI / Canvas Deploy Reminder (push)",
      "state": null
    },
    {
      "context": "CI / Python Lint & Test (push)",
      "state": null
    },
    {
      "context": "CI / all-required (push)",
      "state": null
    },
    {
      "context": "E2E API Smoke Test / E2E API Smoke Test (push)",
      "state": null
    },
    {
      "context": "E2E Staging Canvas (Playwright) / Canvas tabs E2E (push)",
      "state": null
    },
    {
      "context": "Handlers Postgres Integration / Handlers Postgres Integration (push)",
      "state": null
    },
    {
      "context": "publish-workspace-server-image / build-and-push (push)",
      "state": null
    },
    {
      "context": "Runtime PR-Built Compatibility / PR-built wheel + import smoke (push)",
      "state": null
    },
    {
      "context": "Block internal-flavored paths / Block forbidden paths (push)",
      "state": null
    },
    {
      "context": "Lint curl status-code capture / Scan workflows for curl status-capture pollution (push)",
      "state": null
    },
    {
      "context": "CI / Detect changes (push)",
      "state": null
    },
    {
      "context": "E2E API Smoke Test / detect-changes (push)",
      "state": null
    },
    {
      "context": "Handlers Postgres Integration / detect-changes (push)",
      "state": null
    },
    {
      "context": "status-reaper / reap (push)",
      "state": null
    },
    {
      "context": "main-red-watchdog / watchdog (push)",
      "state": null
    },
    {
      "context": "publish-canvas-image / Build & push canvas image (push)",
      "state": null
    },
    {
      "context": "Secret scan / Scan diff for credential-shaped strings (push)",
      "state": null
    },
    {
      "context": "E2E Staging Canvas (Playwright) / detect-changes (push)",
      "state": null
    },
    {
      "context": "Runtime PR-Built Compatibility / detect-changes (push)",
      "state": null
    }
  ],
  "branch": "main",
  "combined_state": "failure",
  "failed_contexts": [],
  "sha": "8a572c1ef3dd9158712a2ed65467575840d79293"
}
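The mismatch in this payload (a red combined_state with an empty failed_contexts list) can be reproduced with a minimal sketch. It assumes the watchdog derives failed_contexts by filtering all_contexts on state; the function names below are hypothetical and are not taken from main-red-watchdog.yml.

```python
import json

# States treated as red, matching the `failure`/`error` wording in the issue body.
RED_STATES = {"failure", "error"}


def red_contexts(payload):
    """Return the names of contexts whose per-context state is red."""
    return [
        entry["context"]
        for entry in payload.get("all_contexts", [])
        if entry.get("state") in RED_STATES
    ]


def is_unattributed_red(payload):
    """True when the combined state is red but no single context is."""
    return payload.get("combined_state") in RED_STATES and not red_contexts(payload)


# Abbreviated copy of the Debug payload above (2 of the 22 contexts).
debug = json.loads("""
{
  "all_contexts": [
    {"context": "CI / all-required (push)", "state": null},
    {"context": "publish-canvas-image / Build & push canvas image (push)", "state": null}
  ],
  "branch": "main",
  "combined_state": "failure",
  "failed_contexts": []
}
""")

print(red_contexts(debug))         # []
print(is_unattributed_red(debug))  # True
```

A null state never matches a red state, so every context is filtered out and the issue body falls back to the "no per-context entries were in a red state" message.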

This issue is idempotent: the watchdog runs hourly at :05 and edits this body in place. When main returns to green, the watchdog will close this issue automatically with a "main returned to green" comment.

gitea-actions bot added the tier:high label 2026-05-12 00:09:05 +00:00
Owner

Known issue: the publish-image runner-socket coin-flip (same root cause as mc#576), surfaced by #606's revert. Not a code regression. Recommend closing as a duplicate of mc#576.

Per-context red on 8a572c1ef3dd (= the #606 merge): publish-canvas-image / Build & push canvas image (push).

Root cause = the same one mc#576 tracks for publish-workspace-server-image, just the canvas variant: the build-and-push job runs docker buildx build, but runs-on: ubuntu-latest doesn't pin a docker-capable runner, and only ~half the act_runner pool mounts /var/run/docker.sock. Land on the wrong half → fail. Coin-flip.

History: this was the state before #599. #599 tried runs-on: [ubuntu-latest, docker], but the docker label was never registered on any runner → jobs queued forever with zero eligible runners → strictly worse. #606 reverted to runs-on: ubuntu-latest (correct emergency move: coin-flip > queue-forever).

So this red is the pre-existing coin-flip state, freshly visible after #606, not a regression from #606. publish-workspace-server-image isn't in this commit's red list because it landed on a docker-capable runner (the lucky 50%) this time.
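To put a rough number on the coin-flip: a back-of-the-envelope sketch, assuming each publish job is scheduled independently and lands on a docker-capable runner with probability 0.5 (the "~half" above). Independent, uniform scheduling is an assumption, not something measured on this pool.

```python
def p_any_publish_red(p_docker_runner, n_jobs):
    """Probability that at least one of n_jobs lands on a runner without the socket.

    Assumes jobs are scheduled independently and that each one lands on a
    docker-capable runner with probability p_docker_runner.
    """
    if not 0.0 <= p_docker_runner <= 1.0:
        raise ValueError("p_docker_runner must be between 0 and 1")
    return 1.0 - p_docker_runner ** n_jobs


print(p_any_publish_red(0.5, 1))  # 0.5
print(p_any_publish_red(0.5, 2))  # 0.75
```

Under those assumptions, a push that triggers both publish workflows would leave at least one of them red about three times out of four.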

Fix (same for both publish workflows, per the #606 re-apply checklist + mc#576 comment 13622):

  1. infra-sre (host SSH): register a docker label on every act_runner that mounts /var/run/docker.sock (group=docker, perms 660+). Enumerate via docker ps --filter name=molecule-runner --format '{{.Names}}', check each docker exec <runner> ls -la /var/run/docker.sock, register the label. Need ≥2 for redundancy.
  2. Then re-apply the runs-on: [ubuntu-latest, docker] (or [self-hosted, docker]) constraint on both publish-canvas-image.yml + publish-workspace-server-image.yml.
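The enumeration in step 1 could be scripted roughly as follows. This is a sketch only: it wraps the two docker commands quoted above, takes the command runner as a parameter so it can be exercised without a real host, and stops before label registration, which is left to infra-sre. The helper names and the canned runner names are hypothetical.

```python
import subprocess


def _run(cmd):
    """Run a command on the host and return (exit_code, stdout)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def list_runners(run=_run):
    """Names of containers matching the molecule-runner filter."""
    _, out = run(["docker", "ps", "--filter", "name=molecule-runner",
                  "--format", "{{.Names}}"])
    return [line.strip() for line in out.splitlines() if line.strip()]


def has_docker_socket(runner, run=_run):
    """True if `ls -la` inside the runner shows a socket at /var/run/docker.sock."""
    code, out = run(["docker", "exec", runner, "ls", "-la", "/var/run/docker.sock"])
    # In `ls -l` output, a leading "s" in the mode string marks a socket.
    return code == 0 and out.startswith("s")


def docker_capable_runners(run=_run):
    """Runners that should receive the docker label."""
    return [r for r in list_runners(run) if has_docker_socket(r, run)]


def fake_run(cmd):
    """Canned responses standing in for a real host (hypothetical runner names)."""
    if cmd[:2] == ["docker", "ps"]:
        return 0, "molecule-runner-1\nmolecule-runner-2\n"
    if cmd[2] == "molecule-runner-1":
        return 0, "srw-rw---- 1 root docker 0 May 12 00:00 /var/run/docker.sock\n"
    return 2, ""


capable = docker_capable_runners(fake_run)
print(capable)            # ['molecule-runner-1']
print(len(capable) >= 2)  # False: below the redundancy bar from step 1
```

On a real host, calling docker_capable_runners() with no argument would use subprocess instead of the canned responses.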

Until step 1 is done, both publish workflows stay coin-flip → will intermittently turn main's combined status red → will intermittently trip this watchdog. The status-reaper (#589) correctly does not compensate for these (publish-* workflows have push: triggers → real-defect signal, not the Gitea schedule-suffix quirk → preserved).

Recommendation: close this as a duplicate of mc#576, which I've reopened and which covers the runner-socket issue for the publish workflows; treat it as covering publish-canvas-image too, since the fix is identical. The watchdog (main-red-watchdog.yml per RFC #420) might want a "suppress when the only reds are known-tracked publish-image coin-flip contexts" filter so it stops re-filing these; this is the same tuning ask as for the #504-class op-noise. Not reverting anything (feedback_fix_root_not_symptom).
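One possible shape for that suppression filter, as a sketch: file an issue unless every red context matches a known-tracked pattern. The pattern list and function names are hypothetical, and the real watchdog may structure this differently.

```python
import fnmatch

# Hypothetical allowlist: context patterns already tracked by mc#576.
KNOWN_TRACKED = (
    "publish-canvas-image / *",
    "publish-workspace-server-image / *",
)


def is_known_tracked(context, patterns=KNOWN_TRACKED):
    """True if the context name matches any known-tracked pattern."""
    return any(fnmatch.fnmatchcase(context, p) for p in patterns)


def should_file_issue(red_contexts, patterns=KNOWN_TRACKED):
    """File unless every red context is a known-tracked one.

    An empty list still files: a red combined state with no attributable
    context is not one of the known cases.
    """
    if not red_contexts:
        return True
    return not all(is_known_tracked(c, patterns) for c in red_contexts)


canvas = "publish-canvas-image / Build & push canvas image (push)"
print(should_file_issue([canvas]))                               # False
print(should_file_issue([canvas, "CI / Platform (Go) (push)"]))  # True
print(should_file_issue([]))                                     # True
```

Keeping the empty-list case as "file" is deliberate in this sketch, so a payload like the one in this issue's Debug section would still be reported.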

— hongming-pc2


Reference: molecule-ai/molecule-core#613