[main-red] molecule-ai/molecule-core: 4db64bcbc3 #628

Closed
opened 2026-05-12 01:11:51 +00:00 by gitea-actions · 2 comments

Main is RED on molecule-ai/molecule-core at 4db64bcbc3

Commit: https://git.moleculesai.app/molecule-ai/molecule-core/commit/4db64bcbc3a9cf5040452c4a961865ebac57f68f

Auto-filed by .gitea/workflows/main-red-watchdog.yml (Option C of the main-never-red directive). Per feedback_no_such_thing_as_flakes + feedback_fix_root_not_symptom: investigate the root cause; do NOT revert as a reflex. The watchdog itself never reverts.

Failed status contexts

(Combined state reported failure/error but no per-context entries were in a red state. This usually means a CI emitter set combined-status directly without a per-context status. Check the most recent workflow run for main and trace from there.)

Resolution path

  1. Read the failed logs (links above).
  2. If reproducible locally, fix forward in a PR targeting main.
  3. If the failure is a real flake — STOP. Per feedback_no_such_thing_as_flakes, intermittent failures are real bugs. Investigate to root cause; do not mark as flake.
  4. If the failure is blocking unrelated work for >1 hour, file a follow-up issue and assign someone. Do NOT revert without a human GO per feedback_prod_apply_needs_hongming_chat_go (branch protection is a prod surface).

Debug

{
  "all_contexts": [
    {
      "context": "Block internal-flavored paths / Block forbidden paths (push)",
      "state": null
    },
    {
      "context": "CI / Detect changes (push)",
      "state": null
    },
    {
      "context": "Lint curl status-code capture / Scan workflows for curl status-capture pollution (push)",
      "state": null
    },
    {
      "context": "E2E Staging Canvas (Playwright) / detect-changes (push)",
      "state": null
    },
    {
      "context": "Secret scan / Scan diff for credential-shaped strings (push)",
      "state": null
    },
    {
      "context": "Handlers Postgres Integration / detect-changes (push)",
      "state": null
    },
    {
      "context": "E2E API Smoke Test / detect-changes (push)",
      "state": null
    },
    {
      "context": "Runtime PR-Built Compatibility / detect-changes (push)",
      "state": null
    },
    {
      "context": "CI / Shellcheck (E2E scripts) (push)",
      "state": null
    },
    {
      "context": "CI / Platform (Go) (push)",
      "state": null
    },
    {
      "context": "CI / Canvas (Next.js) (push)",
      "state": null
    },
    {
      "context": "Handlers Postgres Integration / Handlers Postgres Integration (push)",
      "state": null
    },
    {
      "context": "CI / Python Lint & Test (push)",
      "state": null
    },
    {
      "context": "E2E API Smoke Test / E2E API Smoke Test (push)",
      "state": null
    },
    {
      "context": "Runtime PR-Built Compatibility / PR-built wheel + import smoke (push)",
      "state": null
    },
    {
      "context": "E2E Staging Canvas (Playwright) / Canvas tabs E2E (push)",
      "state": null
    },
    {
      "context": "Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push)",
      "state": null
    },
    {
      "context": "status-reaper / reap (push)",
      "state": null
    },
    {
      "context": "Continuous synthetic E2E (staging) / Synthetic E2E against staging (push)",
      "state": null
    },
    {
      "context": "CI / Canvas Deploy Reminder (push)",
      "state": null
    },
    {
      "context": "Staging SaaS smoke (every 30 min) / Staging SaaS smoke (push)",
      "state": null
    },
    {
      "context": "CI / all-required (push)",
      "state": null
    },
    {
      "context": "main-red-watchdog / watchdog (push)",
      "state": null
    }
  ],
  "branch": "main",
  "combined_state": "failure",
  "failed_contexts": [],
  "sha": "4db64bcbc3a9cf5040452c4a961865ebac57f68f"
}
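The triage comment below keys on one signal in this payload: `combined_state` is `failure`, yet `failed_contexts` is empty and every entry in `all_contexts` has `state: null`. Here is a minimal Python sketch of that check. It assumes only the payload shape shown in the Debug block. `classify_main_status` is a hypothetical helper, not the watchdog's actual code.

```python
import json

RED_STATES = {"failure", "error"}


def classify_main_status(debug: dict) -> str:
    """Classify a main-red-watchdog debug payload (hypothetical helper)."""
    if debug.get("combined_state") not in RED_STATES:
        return "green"
    contexts = debug.get("all_contexts", [])
    # Per-context entries that are themselves red.
    reds = [c["context"] for c in contexts if c.get("state") in RED_STATES]
    if reds:
        return "red: " + ", ".join(reds)
    # Combined status is red, yet no individual context explains it.
    if contexts and all(c.get("state") is None for c in contexts):
        return "combined-only failure: every per-context state is null"
    return "combined-only failure: no per-context entry is red"


# The Debug payload above, trimmed to two of its contexts.
raw = """{"all_contexts": [
  {"context": "CI / all-required (push)", "state": null},
  {"context": "main-red-watchdog / watchdog (push)", "state": null}],
 "branch": "main", "combined_state": "failure", "failed_contexts": []}"""
print(classify_main_status(json.loads(raw)))
# prints: combined-only failure: every per-context state is null
```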

This issue is idempotent: the watchdog runs hourly at :05 and edits this body in place. When main returns to green, the watchdog will close this issue automatically with a "main returned to green" comment.
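The create/edit/close lifecycle in this footer reduces to a small decision table. This is a minimal sketch. It assumes the watchdog already knows whether main is red and can find its own open issue. How it matches that issue (title prefix, SHA, or label) is not stated here, and `Action` and `watchdog_action` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    kind: str                       # "create", "edit", "close" or "noop"
    issue: Optional[int] = None     # issue number the action applies to
    comment: Optional[str] = None   # comment to post when closing


def watchdog_action(main_is_red: bool, open_issue: Optional[int]) -> Action:
    """Decide what one hourly watchdog run does (hypothetical helper).

    After a "create", the next run finds the open issue and returns "edit",
    so repeated runs keep at most one open issue, edited in place.
    """
    if main_is_red:
        return Action("edit", open_issue) if open_issue is not None else Action("create")
    if open_issue is not None:
        return Action("close", open_issue, comment="main returned to green")
    return Action("noop")


print(watchdog_action(True, 628))   # Action(kind='edit', issue=628, comment=None)
print(watchdog_action(False, 628))  # Action(kind='close', issue=628, comment='main returned to green')
```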

gitea-actions bot added the tier:high label 2026-05-12 01:12:00 +00:00
Member

[triage-agent] Hourly triage ~02:35Z: all CI context entries at 4db64bcbc3 have state=None (status-emitter bug, not real CI failure). CI runner IS operational: PRs #618,#612,#609,#606 merged in last 2h. This is the same false-positive pattern as issues #561,#546,#484,#429. No action required.

Owner

Stale and known op-noise; closing as dup of #561. Also: the status-reaper lagged here (queued under load), which is tunable.

The reds on 4db64bcbc3a9 when the watchdog filed this:

  • Staging SaaS smoke (every 30 min) / Staging SaaS smoke (push)
  • Sweep stale Cloudflare DNS records / Sweep CF orphans (push)
  • Continuous synthetic E2E (staging) / Synthetic E2E against staging (push)
  • ci-required-drift / drift (push)
  • gate-check-v3 / gate-check (push)

All of them are schedule-workflow reds carrying the (push) suffix (the Gitea 1.22.6 hardcoded-suffix class). That is exactly what the status-reaper (#618 rev1) is meant to compensate, and exactly the op-noise tracked on #561. None are code regressions.
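The rule of thumb here (a red on a `(push)`-suffixed context from a schedule-only workflow is a quirk, not a regression) can be written as a filter. This is a sketch under stated assumptions. Context names follow `<workflow name> / <job name> (push)`, which matches every context in this issue. The trigger map would come from parsing each workflow's `on:` block. "Compensate" here only means selecting the context, because this thread doesn't describe how the status-reaper actually overrides the status. `contexts_to_compensate` is a hypothetical name.

```python
RED_STATES = {"failure", "error"}


def contexts_to_compensate(
    statuses: dict[str, str],
    workflow_triggers: dict[str, set[str]],
) -> list[str]:
    """Pick reds a reaper would compensate (hypothetical helper).

    statuses maps a context name to its state; workflow_triggers maps a
    workflow name to the event names under its `on:` key.
    """
    picked = []
    for context, state in statuses.items():
        if state not in RED_STATES or not context.endswith(" (push)"):
            continue
        workflow = context.split(" / ", 1)[0]
        triggers = workflow_triggers.get(workflow)
        # Unknown workflows and workflows with a real push trigger are left
        # alone: their (push) reds may be genuine.
        if triggers is None or "push" in triggers:
            continue
        picked.append(context)
    return sorted(picked)
```

The trigger sets you would pass in are illustrative. This thread does not say which of the listed workflows are schedule-only, and it explicitly leaves that open for gate-check-v3.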

Why it wasn't compensated before the watchdog fired: the reaper cancel-cascade is fixed (#618 dropped the broken concurrency: block), but reaper ticks still queue behind the saturated runner pool. Operator load was ~95 around 01:05-01:11Z, so the reaper's */5 tick sat in the runner queue (the orchestrator's DB check confirmed tick 16321 was Waiting) and didn't compensate within the watchdog's window. Rev1 fixed the cancel, not the queue lag. Load has since dropped to ~36-57, so the reaper should be catching up. Also, 4db64bcbc3a9 is no longer HEAD (current HEAD is 05c794ef330e), so this issue is stale regardless.
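A toy timing model shows why fixing the cancel alone doesn't close the gap. It makes several simplifying assumptions: reaper ticks are scheduled on multiples of 5 minutes, each tick waits a fixed time for a runner, and the reaper's own run time is zero. The function names and the numbers in the test are hypothetical, not measurements from this incident.

```python
import math


def compensation_time(red_at_min: float, tick_every_min: int = 5,
                      queue_wait_min: float = 0.0) -> float:
    """Earliest minute a */N reaper tick could compensate a red (toy model)."""
    # Next scheduled tick at or after the moment the red lands.
    next_tick = math.ceil(red_at_min / tick_every_min) * tick_every_min
    # The tick only runs once a runner frees up.
    return next_tick + queue_wait_min


def reaper_beats_watchdog(red_at_min: float, watchdog_at_min: float,
                          tick_every_min: int = 5,
                          queue_wait_min: float = 0.0) -> bool:
    """True if the red is compensated before the watchdog looks."""
    return compensation_time(red_at_min, tick_every_min, queue_wait_min) < watchdog_at_min
```

Suppose a red lands at minute 58 and the watchdog looks at minute 65. With idle runners, the tick at 60 wins. With a 10-minute queue wait, compensation lands at 70 and the watchdog files first. When the queue wait dominates, a shorter tick interval barely helps. Either the wait has to shrink (a dedicated runner) or the watchdog has to wait longer.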

The ci-required-drift / drift red is separately fixed forward by #630 (approved: detect_drift now skips with a diagnostic on the 403/404 it gets from the unscoped DRIFT_BOT_TOKEN, instead of turning the run red). Continuous synthetic E2E / Staging SaaS smoke / Sweep CF orphans stay schedule-quirk reds until the reaper compensates them (or until #504's "scope operational workflows off push status-reporting" lands). For gate-check-v3 / gate-check (push): if that workflow is schedule-only, the reaper should compensate it too. If it has a push: trigger, the reaper leaves that red in place. That is worth checking, but it's not a code regression either way.
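Here is the #630 behaviour described above, as a sketch. The source says only that detect_drift skips with a diagnostic on 403/404 from the unscoped DRIFT_BOT_TOKEN. The function name, the diagnostic text, and the handling of other status codes are assumptions, not the code in #630.

```python
def drift_check_outcome(http_status: int) -> tuple[str, str]:
    """Map a drift-check API response to an outcome (hypothetical helper).

    "skip"  -> log the diagnostic and finish green (no drift verdict)
    "check" -> go on to the real drift comparison (not shown here)
    "fail"  -> a genuine error that should turn the run red
    """
    if http_status in (403, 404):
        return ("skip", f"HTTP {http_status}: DRIFT_BOT_TOKEN likely lacks the "
                        "scope for this endpoint; drift not checked")
    if 200 <= http_status < 300:
        return ("check", "")
    return ("fail", f"unexpected HTTP {http_status}")
```

A permissions gap is reported as a skip and the run stays green. Any other non-2xx response still fails loudly, so real API breakage is not hidden.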

Tunable for the recurring [main-red] noise (this is the 6th: #546/#561/#565/#583/#613/#628, most closed as op-noise dups): before filing, the watchdog (main-red-watchdog.yml, RFC #420 Option C) should do one of the following:

  (a) wait longer (give the reaper its */5 window plus slack);
  (b) skip if the only reds are known schedule-workflow (push) contexts;
  (c) check for a queued or in-flight status-reaper run.

(b) is the cleanest. And/or: give the status-reaper a dedicated lightweight runner so it isn't stuck behind the CI-merge-churn queue. Either is a #420/#504-class follow-up; flagging here for the watchdog owner.
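Options (a) to (c) compose into one gate. This is a minimal sketch. It assumes the watchdog can list per-context reds, knows when main first went red, and can see whether a status-reaper run is queued. `WatchdogInputs`, `should_file`, the known-context set, and the 10-minute grace default are all hypothetical. This is not the current main-red-watchdog.yml logic.

```python
from dataclasses import dataclass


@dataclass
class WatchdogInputs:
    red_contexts: list[str]          # per-context entries in failure/error
    reaper_run_pending: bool         # a status-reaper run is queued or in flight
    minutes_since_first_red: float   # how long main has been red


def should_file(inp: WatchdogInputs, known_schedule_contexts: set[str],
                grace_min: float = 10.0) -> tuple[bool, str]:
    """Gate for filing a [main-red] issue (hypothetical sketch of options a-c)."""
    # (a) give the reaper its */5 window plus slack before filing at all
    if inp.minutes_since_first_red < grace_min:
        return False, "(a) inside grace window"
    unexpected = [c for c in inp.red_contexts if c not in known_schedule_contexts]
    # Anything outside the known schedule-quirk set could be a real regression.
    if unexpected:
        return True, "unexpected reds: " + ", ".join(unexpected)
    # (b) every red is a known schedule-workflow (push) context
    if inp.red_contexts:
        return False, "(b) only known schedule-workflow (push) reds"
    # (c) combined status is red with no per-context reds: wait for a pending reaper run
    if inp.reaper_run_pending:
        return False, "(c) status-reaper run still queued or in flight"
    return True, "combined status red with no per-context reds"
```

In this sketch the order of the checks is deliberate. Check (a) runs first, so nothing is filed during the grace window. After that, unexpected reds file immediately, so (b) and (c) can only suppress known noise, never a context outside the known set. The grace default is one */5 tick plus five minutes of slack and is purely illustrative.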

Closing as dup of #561 (the live main-combined-status thread) + a known reaper-lag instance.

— hongming-pc2


Reference: molecule-ai/molecule-core#628