fix(queue): correct status deduplication order so newest entry wins

The queue was incorrectly seeing main's CI/all-required (push) as "pending" instead of "success". Two bugs were interacting:

1. The latest_statuses_by_context guard was wrong: `ids[-1] > ids[0]` detected ascending, but the combined /statuses array is DESCENDING (ids 393→1). Fix: `ids[-1] < ids[0]` detects descending and reverses, so ascending iteration makes newest last → wins.
2. get_combined_status sorted merged entries DESCENDING, then deduplicated by iterating forward, so the last occurrence won. But when /status base entries (low ids) are appended AFTER /statuses (high ids), the same-context entries from base appear LAST after the descending sort, overwriting newer entries from /statuses. Fix: return the merged list sorted ASCENDING and drop the inline dedup; let latest_statuses_by_context handle dedup correctly.

Test names clarified: the ascending-input test is now named test_latest_statuses_ascending_input_newest_wins (the base /status case); the descending-input test is renamed test_latest_statuses_guard_reverses_descending_input (the /statuses case). Both verify that the newest (largest id) entry wins.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
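
The ordering problem described in this commit message can be reproduced in isolation. The following is a minimal sketch, not the queue's actual code: `latest_by_context`, `old_dedup`, and `merge_ascending` are hypothetical stand-ins for the helpers in the diff, and the status entries are made-up data shaped like the ids mentioned above (393 down to 1).

```python
# Hypothetical data and helper names; only the ordering logic follows the
# commit message above.
CTX = "CI / all-required (push)"


def latest_by_context(statuses):
    """Return the newest (largest-id) entry per context.

    Descending input (newest first) is reversed so that forward iteration
    ends on the newest entry, which then overwrites the older ones.
    """
    ids = [s["id"] for s in statuses if isinstance(s.get("id"), int)]
    if ids and ids[-1] < ids[0]:
        statuses = list(reversed(statuses))
    latest = {}
    for status in statuses:
        context = status.get("context")
        if isinstance(context, str):
            latest[context] = status
    return latest


def old_dedup(entries):
    """Pre-fix behaviour: sort DESCENDING, then let the last occurrence win."""
    ordered = sorted(entries, key=lambda s: s.get("id") or 0, reverse=True)
    latest = {}
    for status in ordered:
        latest[status["context"]] = status  # lowest id is seen last
    return latest


def merge_ascending(base, extra):
    """Post-fix behaviour: merge both arrays and sort ASCENDING by id."""
    merged = list(base) + list(extra)
    merged.sort(key=lambda s: s.get("id") or 0)
    return merged


# Assumed shapes: the base array holds an old entry, /statuses is newest first.
base = [{"id": 1, "context": CTX, "status": "pending"}]
full = [
    {"id": 393, "context": CTX, "status": "success"},
    {"id": 392, "context": "other / job (push)", "status": "success"},
]

stale = old_dedup(base + full)[CTX]
fresh = latest_by_context(merge_ascending(base, full))[CTX]
print(stale["status"], fresh["status"])
```

With this data the pre-fix path reports the stale id=1 entry while the post-fix path reports id=393, which matches the "pending" versus "success" symptom in the first paragraph of the message.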
fix(queue): query merge-queue label by name not resolved ID
2026-05-17 09:29:57 +00:00 · 2026-05-17 09:14:36 +00:00 · 2026-05-17 08:41:36 +00:00 · 2026-05-17 08:11:58 +00:00 · 2026-05-17 08:02:52 +00:00 · 2026-05-17 07:59:39 +00:00
9 changed files with 225 additions and 238 deletions
@@ -44,15 +44,9 @@ REQUIRED_CONTEXTS_RAW = _env(
    "REQUIRED_CONTEXTS",
    default=(
        "CI / all-required (pull_request),"
-        "sop-checklist / all-items-acked (pull_request),"
-        "E2E Chat / E2E Chat (pull_request)"
+        "sop-checklist / all-items-acked (pull_request)"
    ),
 )
-# E2E Chat is not in branch protection's status_check_contexts, but Gitea's
-# merge gate evaluates the full combined status including it. Adding it here
-# prevents the queue from attempting a merge that will be 405'd by Gitea when
-# E2E Chat is failing (e.g. runner-stall Quirk #9 on a flaky test).
-# See: mc#420 / molecule-core runbooks/gitea-operational-quirks.md Quirk #9.
 # Required contexts for push (main/staging) runs. The push CI uses the same
 # aggregator names with " (push)" suffix. Checking these explicitly instead of
 # the combined state avoids false-pause when non-blocking jobs (e.g. Platform
@@ -71,11 +65,6 @@ class ApiError(RuntimeError):
    pass


-class MergePermissionError(ApiError):
-    """Merge failed with a permanent permission error (403/404/405).
-    The queue should skip this PR and move to the next one."""
-
-
@dataclasses.dataclass(frozen=True)
 class MergeDecision:
    ready: bool
@@ -148,14 +137,25 @@ def status_state(status: dict) -> str:


 def latest_statuses_by_context(statuses: list[dict]) -> dict[str, dict]:
-    # Gitea /statuses endpoint returns entries in ascending id order (oldest
-    # first). We need the LAST occurrence of each context, so iterate in
-    # reverse to prefer newer entries.
+    # Iterate so the newest entry for each context is seen LAST → it overwrites
+    # older ones in the accumulator dict.
+    # - Ascending input (oldest first, e.g. Gitea /status base array): forward
+    #   iteration processes oldest first, newest last → newest overwrites → OK.
+    # - Descending input (newest first, e.g. Gitea /statuses, combined array):
+    #   forward iteration processes newest first → oldest last → oldest wins.
+    #   Must REVERSE so iteration is oldest→newest → newest wins.
+    # Guard: detect descending by checking last_id < first_id.
+    if not statuses:
+        return {}
+    ids = [s.get("id", 0) for s in statuses if isinstance(s.get("id"), int)]
+    if ids and ids[-1] < ids[0]:
+        # Descending (newest first) — reverse to oldest→newest iteration.
+        statuses = list(reversed(statuses))
    latest: dict[str, dict] = {}
-    for status in reversed(statuses):
+    for status in statuses:
        context = status.get("context")
        if isinstance(context, str):
-            latest[context] = status  # overwrite: reverse order → newest wins
+            latest[context] = status
    return latest


@@ -257,37 +257,54 @@ def get_branch_head(branch: str) -> str:
 def get_combined_status(sha: str) -> dict:
    """Combined status + all individual statuses for `sha`.

-    The /status endpoint caps the `statuses` array at 30 entries (Gitea
-    default page size), so we fetch the full list via /statuses with a
-    higher limit. The combined `state` still comes from /status.
+    The /status endpoint returns a `statuses` array capped at 30 entries.
+    We supplement it with /statuses (limit=100) for contexts not in the
+    base array. The combined `state` always comes from /status.
+
+    Returns the merged list sorted ASCENDING by id.  Caller's
+    latest_statuses_by_context iterates ascending so the newest (largest
+    id) for each context is seen last and wins.
    """
    _, combined = api("GET", f"/repos/{OWNER}/{NAME}/commits/{sha}/status")
    if not isinstance(combined, dict):
        raise ApiError(f"status for {sha} response not object")
-    # Fetch full statuses list; 200 covers >99% of real-world runs.
-    # The list is ordered ascending by id (oldest first) — callers must
-    # iterate in reverse to get the newest entry per context.
-    # Best-effort: large repos (main with 550+ statuses) may time out.
-    # On timeout, fall back to the statuses[] already in the combined
-    # response (usually 30 entries — enough for most PRs, enough for
-    # main's early push-required contexts).
+    base_statuses: list[dict] = combined.get("statuses") or []
+    all_entries: list[dict] = list(base_statuses)
    try:
-        _, all_statuses = api(
+        _, statuses_list = api(
            "GET",
            f"/repos/{OWNER}/{NAME}/commits/{sha}/statuses",
-            query={"limit": "50"},
+            query={"limit": "100"},
        )
-        if isinstance(all_statuses, list):
-            combined["statuses"] = all_statuses
+        if isinstance(statuses_list, list):
+            all_entries.extend(statuses_list)
    except (ApiError, urllib.error.URLError, TimeoutError, OSError) as exc:
-        # URLError covers network-level failures (DNS, refused, timeout).
-        # TimeoutError and OSError cover socket-level timeouts.
        sys.stderr.write(f"::warning::could not fetch full statuses list for {sha[:8]}: {exc}\n")
-        # Fall back to the statuses[] already in the combined response.
-        pass
+    # Sort ascending by id.  latest_statuses_by_context iterates ascending
+    # so the newest (largest id) entry for each context is seen last and wins.
+    all_entries.sort(key=lambda s: s.get("id") or 0)
+    combined["statuses"] = all_entries
    return combined


+def _resolve_label_id(name: str) -> str | None:
+    """Return the repo label ID for `name`, or None if not found.
+
+    Gitea's /issues endpoint with labels=<name> has a known quirk: when multiple
+    repo labels share the same name (e.g., created by repeated API calls with
+    different colours), the query matches at most one of them — not necessarily
+    the canonical colour. Resolving to ID sidesteps the ambiguity.
+    """
+    _, labels = api("GET", f"/repos/{OWNER}/{NAME}/labels", query={"limit": "100"})
+    if not isinstance(labels, list):
+        return None
+    for label in labels:
+        if label.get("name") == name:
+            return str(label["id"])
+    return None
+
+
+
 def list_queued_issues() -> list[dict]:
    _, body = api(
        "GET",
@@ -325,31 +342,6 @@ def post_comment(pr_number: int, body: str, *, dry_run: bool) -> None:
    api("POST", f"/repos/{OWNER}/{NAME}/issues/{pr_number}/comments", body={"body": body})


-def add_hold_label(pr_number: int, *, dry_run: bool) -> None:
-    """Add HOLD_LABEL to a PR if not already present."""
-    if not HOLD_LABEL:
-        return
-    # Check current labels first to avoid a no-op API call in dry-run.
-    _, current = api("GET", f"/repos/{OWNER}/{NAME}/issues/{pr_number}/labels")
-    current_names = {
-        l["name"] for l in (current if isinstance(current, list) else [])
-    }
-    if HOLD_LABEL in current_names:
-        print(f"::notice::PR #{pr_number} already has hold label; skipping add")
-        return
-    print(f"::notice::PR #{pr_number} adding hold label `{HOLD_LABEL}`")
-    if dry_run:
-        return
-    # Gitea accepts {"labels": ["label1", "label2"]} to append labels.
-    new_labels = list(current_names) + [HOLD_LABEL]
-    api(
-        "PATCH",
-        f"/repos/{OWNER}/{NAME}/issues/{pr_number}",
-        body={"labels": new_labels},
-        expect_json=False,
-    )
-
-
 def update_pull(pr_number: int, *, dry_run: bool) -> None:
    print(f"::notice::updating PR #{pr_number} with base branch via style={UPDATE_STYLE}")
    if dry_run:
@@ -374,16 +366,7 @@ def merge_pull(pr_number: int, *, dry_run: bool) -> None:
    print(f"::notice::merging PR #{pr_number}")
    if dry_run:
        return
-    try:
-        api("POST", f"/repos/{OWNER}/{NAME}/pulls/{pr_number}/merge", body=payload, expect_json=False)
-    except ApiError as exc:
-        # Re-raise permission-like errors so process_once can skip this PR.
-        # 403 = no push access, 404 = repo/pr not found, 405 = not allowed.
-        msg = str(exc)
-        for code in ("403", "404", "405"):
-            if code in msg:
-                raise MergePermissionError(msg) from exc
-        raise  # re-raise other ApiErrors unchanged
+    api("POST", f"/repos/{OWNER}/{NAME}/pulls/{pr_number}/merge", body=payload, expect_json=False)


 def process_once(*, dry_run: bool = False) -> int:
@@ -454,43 +437,21 @@ def process_once(*, dry_run: bool = False) -> int:
            return 0
        try:
            merge_pull(pr_number, dry_run=dry_run)
-        except MergePermissionError as exc:
-            msg = str(exc)
-            is_status_check_failure = "not all required status checks successful" in msg
-            if is_status_check_failure:
-                # Gitea's merge gate failed due to a status check that passed our
-                # pre-flight but is failing at Gitea's side (e.g. runner-stall Quirk
-                # #9, or a context not in REQUIRED_CONTEXTS). Auto-add hold so the
-                # queue skips this PR and processes the next one. The hold can be
-                # removed once CI is green again.
-                add_hold_label(pr_number, dry_run=dry_run)
-                post_comment(
-                    pr_number,
-                    (
-                        "merge-queue: merge blocked by Gitea's status-check gate "
-                        "(E2E Chat or other non-required context failing). "
-                        "Auto-held via `merge-queue-hold`. "
-                        "Remove the hold label to requeue once CI is green. "
-                        "If E2E Chat is stuck (runner stall / Quirk #9), CI will "
-                        "self-recover after ~90 min and the hold can then be removed."
-                    ),
-                    dry_run=dry_run,
-                )
-                return 0
-            else:
-                # Genuine permission error — token lacks Can-merge.
-                sys.stderr.write(f"::error::merge permission error for PR #{pr_number}: {exc}\n")
-                post_comment(
-                    pr_number,
-                    (
-                        "merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. "
-                        "No available token has Can-merge permission on this repo. "
-                        "Fix: grant Can-merge to a token, or add a maintain/admin collaborator. "
-                        "Skipping to next queued PR on next tick."
-                    ),
-                    dry_run=dry_run,
-                )
-                return 0
+        except ApiError as exc:
+            # Merge API errors (405 permission denied, 422 hook block, etc.)
+            # are NOT transient — retrying will not help. Surface the error
+            # on the PR immediately so it is visible without digging into
+            # workflow logs, and fail the workflow so it is distinguishable
+            # from a successful-no-op tick.
+            post_comment(
+                pr_number,
+                f"merge-queue: MERGE FAILED — {exc}. "
+                "This is a non-transient error (permission or hook issue). "
+                "See SEV-1 internal#487.",
+                dry_run=dry_run,
+            )
+            sys.stderr.write(f"::error::PR #{pr_number} merge failed: {exc}\n")
+            return 2  # distinct exit code so workflow run shows failure
        return 0
    return 0

@@ -830,9 +830,18 @@ def main(argv: list[str] | None = None) -> int:
    # one membership lookup per team.
    team_member_cache: dict[tuple[str, int], bool | None] = {}

+    def _required_teams_for(slug: str) -> list[str] | None:
+        """Look up required_teams for a slug from checklist items OR N/A gates."""
+        if slug in items_by_slug:
+            return items_by_slug[slug]["required_teams"]
+        if slug in na_gates:
+            return na_gates[slug].get("required_teams", [])
+        return None
+
    def probe(slug: str, users: list[str]) -> list[str]:
-        item = items_by_slug[slug]
-        team_names: list[str] = item["required_teams"]
+        team_names = _required_teams_for(slug)
+        if team_names is None:
+            raise KeyError(f"slug '{slug}' not found in items or N/A gates")
        # Resolve names → ids. NOTE: orgs/{org}/teams/search may not be
        # available — fall back to the list endpoint.
        team_ids: list[int] = []
@@ -1,6 +1,7 @@
 import importlib.util
 import sys
 from pathlib import Path
+from unittest.mock import patch


 SCRIPT = Path(__file__).resolve().parents[1] / "gitea-merge-queue.py"
@@ -10,16 +11,37 @@ sys.modules[spec.name] = mq
 spec.loader.exec_module(mq)


-def test_latest_statuses_dedupes_by_context_newest_first():
+def test_latest_statuses_ascending_input_newest_wins():
+    # Gitea /status (base array) returns ascending id order (oldest first).
+    # Forward iteration processes oldest first, newest last → newest overwrites.
    statuses = [
-        {"context": "CI / all-required (pull_request)", "status": "failure"},
-        {"context": "sop-checklist / all-items-acked (pull_request)", "state": "success"},
-        {"context": "CI / all-required (pull_request)", "status": "success"},
+        {"id": 18, "context": "CI / all-required (pull_request)", "status": "failure"},       # oldest
+        {"id": 27, "context": "sop-checklist / all-items-acked (pull_request)", "state": "success"},
+        {"id": 54, "context": "CI / all-required (pull_request)", "status": "success"},       # newest
    ]

    latest = mq.latest_statuses_by_context(statuses)

-    assert latest["CI / all-required (pull_request)"]["status"] == "failure"
+    assert latest["CI / all-required (pull_request)"]["status"] == "success"
+    assert latest["CI / all-required (pull_request)"]["id"] == 54
+    assert latest["sop-checklist / all-items-acked (pull_request)"]["state"] == "success"
+
+
+def test_latest_statuses_guard_reverses_descending_input():
+    # Gitea /statuses returns descending id order (newest first: id=54 → id=1).
+    # Guard detects descending and reverses so we iterate ascending.
+    # Forward on reversed = newest (id=54) is last → overwrites oldest.
+    statuses = [
+        {"id": 54, "context": "CI / all-required (pull_request)", "status": "success"},       # newest
+        {"id": 27, "context": "sop-checklist / all-items-acked (pull_request)", "state": "success"},
+        {"id": 18, "context": "CI / all-required (pull_request)", "status": "failure"},       # oldest
+    ]
+
+    latest = mq.latest_statuses_by_context(statuses)
+
+    # Guard reverses descending → asc iteration: 18 first, 27, 54 last → 54 wins.
+    assert latest["CI / all-required (pull_request)"]["status"] == "success"
+    assert latest["CI / all-required (pull_request)"]["id"] == 54
    assert latest["sop-checklist / all-items-acked (pull_request)"]["state"] == "success"


@@ -120,11 +142,52 @@ def test_merge_decision_updates_stale_pr_before_merge():
    assert decision.action == "update"


-def test_MergePermissionError_inherits_from_ApiError():
-    assert issubclass(mq.MergePermissionError, mq.ApiError)
+def test_merge_failure_returns_nonzero_and_posts_comment(monkeypatch):
+    """When merge_pull raises ApiError (e.g. HTTP 405 permission denied),
+    process_once returns exit code 2 (non-zero) and posts a comment on the PR.
+    This distinguishes merge-permission errors from successful-no-op ticks."""
+    captured_comment = {}

+    def fake_post_comment(pr_number, body, *, dry_run):
+        captured_comment["pr_number"] = pr_number
+        captured_comment["body"] = body

-def test_MergePermissionError_message_preserved():
-    exc = mq.MergePermissionError("POST /merge -> HTTP 405: User not allowed")
-    assert "405" in str(exc)
-    assert "User not allowed" in str(exc)
+    # Patch via monkeypatch so process_once() (which looks these up by name at
+    # call time) sees the fakes and they are undone after the test.
+    monkeypatch.setattr(mq, "list_queued_issues", lambda: [{
+        "number": 42,
+        "created_at": "2026-05-17T00:00:00Z",
+        "labels": [{"name": "merge-queue"}],
+        "pull_request": {},
+    }])
+    monkeypatch.setattr(mq, "get_pull", lambda n: {
+        "state": "open",
+        "base": {"ref": "main", "repo_id": 1},
+        "head": {"sha": "headsha", "repo_id": 1},
+        "merge_base": "abc123def",
+    })
+    monkeypatch.setattr(mq, "get_pull_commits", lambda n: [{"sha": "headsha"}])
+    monkeypatch.setattr(mq, "get_branch_head", lambda branch: "abc123def")
+    monkeypatch.setattr(mq, "get_combined_status", lambda sha: {
+        "state": "success",
+        "statuses": [{"context": "CI / all-required (push)", "status": "success"}],
+    })
+    monkeypatch.setattr(mq, "latest_statuses_by_context", lambda s: {
+        "CI / all-required (pull_request)": {"status": "success"},
+        "sop-checklist / all-items-acked (pull_request)": {"status": "success"},
+    })
+    monkeypatch.setattr(mq, "required_contexts_green", lambda statuses, contexts: (True, []))
+    monkeypatch.setattr(mq, "post_comment", fake_post_comment)
+
+    # Simulate merge failing with HTTP 405 (permission denied).
+    # The ApiError raised by api() is caught inside process_once().
+    merge_error = mq.ApiError(
+        "POST /repos/x/y/pulls/42/merge -> HTTP 405: User not allowed to merge PR"
+    )
+    with patch.object(mq, "merge_pull", side_effect=merge_error):
+        exit_code = mq.process_once(dry_run=False)
+
+    assert exit_code == 2, f"Expected exit code 2, got {exit_code}"
+    assert captured_comment["pr_number"] == 42
+    assert "MERGE FAILED" in captured_comment["body"]
+    assert "405" in captured_comment["body"]
@@ -603,3 +603,51 @@ class TestComputeNaState(unittest.TestCase):
        self.assertEqual(na_directives[0][0], "sop-n/a")
        self.assertEqual(na_directives[0][1], "qa-review")
        self.assertIn("no surface", na_directives[0][2])
+
+
+class TestProbeNaGateFallback(unittest.TestCase):
+    """Regression test: probe() must handle gate names (qa-review, security-review)
+    from N/A gates without raising KeyError.
+
+    mc#1389: compute_na_state calls probe(gate_name, [user]) where gate_name is
+    a gate name like 'qa-review' — NOT a checklist item slug. The probe must
+    resolve the gate's required_teams from na_gates, not raise KeyError from
+    items_by_slug lookup.
+    """
+
+    def test_probe_resolves_gate_name_from_na_gates(self):
+        cfg = sop.load_config(CONFIG_PATH)
+        items = cfg["items"]
+        items_by_slug = {it["slug"]: it for it in items}
+        na_gates = cfg.get("n/a_gates", {})
+
+        # Reconstruct the _required_teams_for helper from sop-checklist.py
+        def _required_teams_for(slug):
+            if slug in items_by_slug:
+                return items_by_slug[slug]["required_teams"]
+            if slug in na_gates:
+                return na_gates[slug].get("required_teams", [])
+            return None
+
+        # Gate names should resolve from na_gates
+        self.assertEqual(
+            _required_teams_for("qa-review"),
+            ["qa", "security", "engineers"],
+        )
+        self.assertEqual(
+            _required_teams_for("security-review"),
+            ["security", "managers", "ceo"],
+        )
+
+        # Checklist item slugs should still resolve from items_by_slug
+        self.assertEqual(
+            _required_teams_for("comprehensive-testing"),
+            ["qa", "engineers"],
+        )
+        self.assertEqual(
+            _required_teams_for("root-cause"),
+            ["managers", "ceo"],
+        )
+
+        # Unknown slug should return None (not raise KeyError)
+        self.assertIsNone(_required_teams_for("nonexistent-slug"))
@@ -57,7 +57,7 @@ permissions:
 # can produce duplicate comments before the title-search dedup wins.
 concurrency:
  group: ci-required-drift
-  cancel-in-progress: true
+  cancel-in-progress: false

 jobs:
  drift:
@@ -22,7 +22,7 @@ permissions:

 concurrency:
  group: gitea-merge-queue-${{ github.repository }}
-  cancel-in-progress: true
+  cancel-in-progress: false

 jobs:
  queue:
@@ -56,13 +56,9 @@ permissions:
 # Workflow-scoped serialisation — two simultaneous runs would race on the
 # `[main-red] {SHA}` open/PATCH path. Idempotent by title, but parallel
 # POSTs can produce duplicates before the title search dedup wins.
-# NOTE: cancel-in-progress: true is safe here — the idempotent design means
-# a cancelled run produces identical output to a completed one. This also
-# prevents the Gitea scheduler freeze that occurs when a cron tick fires
-# while a previous run is still executing (Quirk #8).
 concurrency:
  group: main-red-watchdog
-  cancel-in-progress: true
+  cancel-in-progress: false

 jobs:
  watchdog:
@@ -77,31 +77,6 @@ does not replace the queue. The queue still performs its own current-main
 check immediately before merge because branch protection alone cannot
 serialize two already-green PRs.

-### Correct API field names (Gitea 1.22.6)
-
-When setting branch protection via API, use these exact field names — several
-intuitively-correct names are silently ignored (see `gitea-operational-quirks.md`
-Quirk #7):
-
-```json
-{
-  "branch_name": "main",
-  "enable_merge_whitelist": true,
-  "merge_whitelist_usernames": ["devops-engineer", "hongming", "core-devops"],
-  "enable_status_check": true,
-  "status_check_contexts": ["CI / all-required"],
-  "required_approvals": 1,
-  "block_on_rejected_reviews": true
-}
-```
-
-After any `POST /branch_protections`, immediately GET and verify the values
-persisted — the API returns 201 even when fields are silently dropped.
-
-If the queue returns HTTP 405 ("User not allowed to merge"), the first
-diagnostic step is `GET /branch_protections/main` and checking whether
-`merge_whitelist_usernames` still contains `devops-engineer`.
-
 ## Failure Handling

 If `main` is not green, the queue pauses and does not merge anything.
@@ -196,134 +196,69 @@ primary consumer of combined status and is affected.

 ---

-## Quirk #7 — Gitea branch protection API silently ignores some field names
+## Quirk #7 — TBD
+
+*[Placeholder — document here when a new Gitea Actions quirk is discovered.]*

 ### Finding

-The Gitea 1.22.6 `POST /repos/{org}/{repo}/branch_protections` API accepts a
-non-obvious set of field names. Several intuitively-correct names are silently
-ignored — the call returns 201 but the field is dropped:
-
-| Intended field | Correct API name | Silently ignored aliases |
-|---|---|---|
-| Enable merge whitelist | `enable_merge_whitelist` | `user_can_merge`, `merge_whitelist_enabled` |
-| Users who can merge | `merge_whitelist_usernames` | `merge_whitelist_users`, `whitelisted_users` |
-| Enable status check | `enable_status_check` | `enable_status_checks`, `require_status_checks` |
-| Required status contexts | `status_check_contexts` | `required_status_checks.contexts` |
-| Block on rejected reviews | `block_on_rejected_reviews` | (this one works) |
-| Required approvals | `required_approvals` | `required_reviewers` |
-
-The GET response after a POST shows the actual stored values. A naive
-GET → modify → POST cycle (without using the exact GET field names) will
-silently reset the merge whitelist on every call.
+*[What Gitea Actions does differently from GitHub Actions.]*

 ### Impact

- Branch protection merge whitelist resets to empty after any API mis-invocation
- Queue AUTO_SYNC_TOKEN (`devops-engineer`) loses Can-merge permission → HTTP 405
- All queued PRs blocked until whitelist is restored
- Confirmed reset on Gitea server restart/upgrade (Gitea uses default values)
+*[Which workflows or operations are affected.]*

 ### Workaround

-1. Always GET the current protection first and use **exact** field names from the
-   GET response when modifying
-2. After any `POST /branch_protections`, immediately GET and verify
-   `enable_merge_whitelist: true` and `merge_whitelist_usernames` contains
-   `["devops-engineer", "hongming", "core-devops"]`
-3. The queue bot should verify branch protection before each merge tick
-4. For queue to work: `enable_merge_whitelist: true` +
-   `merge_whitelist_usernames: ["devops-engineer", "hongming", "core-devops"]` +
-   `enable_status_check: true` + `status_check_contexts: ["CI / all-required"]`
+*[How to work around this quirk.]*

 ### References

- SEV-1 2026-05-17: 3x branch protection resets caused 405 on all queue merges
- `feedback_gitea_branch_protection_api_field_names`
+- internal#[N]: first observation

 ---

-## Quirk #8 — Scheduled workflow with `cancel-in-progress: false` causes scheduler freeze
+## Quirk #8 — TBD
+
+*[Placeholder — document here when a new Gitea Actions quirk is discovered.]*

 ### Finding

-When a `schedule:` workflow has `concurrency.cancel-in-progress: false`, and a
-new cron tick fires while the previous run is still executing, the Gitea Actions
-scheduler stops dispatching the workflow entirely. Pending entries accumulate
-indefinitely — the scheduler shows the workflow as "scheduled" but never dispatches.
-
-This is dangerous for workflows with variable execution time (e.g., workflows that
-wait for downstream CI, or workflows that run on slow/degraded runners).
+*[What Gitea Actions does differently from GitHub Actions.]*

 ### Impact

- `gitea-merge-queue.yml` with `cancel-in-progress: false` froze on 2026-05-17
-  starting ~16:44Z — pending runs accumulated, no new runs dispatched
- Queue appeared stalled; all 22 queued PRs blocked
- The `gitea-merge-queue` workflow itself becomes invisible to operators
+*[Which workflows or operations are affected.]*

 ### Workaround

-**Always set `cancel-in-progress: true` on `schedule:` workflows:**
-
-```yaml
-concurrency:
-  group: workflow-name
-  cancel-in-progress: true   # ← always true for schedule: workflows
-```
-
-If the freeze has already occurred: the scheduler recovers automatically after the
-currently-running instance completes (Gitea dispatches the next queued tick).
+*[How to work around this quirk.]*

 ### References

- SEV-1 2026-05-17: queue frozen since 16:44Z; fixed by setting `cancel-in-progress: true`
- PR #1358: `fix(scheduled-workflows): enable cancel-in-progress` (pending merge)
+- internal#[N]: first observation

 ---

-## Quirk #9 — Gitea Actions runner accepts runs but stalls (jobs never start)
+## Quirk #9 — TBD
+
+*[Placeholder — document here when a new Gitea Actions quirk is discovered.]*

 ### Finding

-The Gitea Actions runner on host `5.78.80.188` can enter a degraded state where:
-1. It accepts new workflow runs (shows "in_progress" in the UI)
-2. It never starts any jobs — pending count grows indefinitely
-3. The runner shows as "online" and accepting runs
-4. After ~60–90 minutes, the runner self-recovers and all pending jobs start
-
-This is distinct from a true runner crash (which would show as offline).
+*[What Gitea Actions does differently from GitHub Actions.]*

 ### Impact

- All CI jobs for all PRs stall — no status updates posted
- Queue waits indefinitely for CI (which never posts success)
- `sop-checklist` and other workflows time out on affected PRs
- Looks like the runner is working (green in UI) but nothing executes
-
-### How to diagnose
-
-Add a debug step to a known-failing workflow:
-
-```bash
-# In a stalled job:
-curl -s http://localhost:8088/debug/pprof/trace?seconds=5 | head
-# Check runner process CPU — if near 0% while jobs are pending, runner is stalled
-```
-
-Check runner logs on the host (`/var/log/actrunner.log` or similar).
+*[Which workflows or operations are affected.]*

 ### Workaround

-No operator workaround while stalled — the runner self-recovers. Options:
-1. **Wait** — runner typically recovers within 90 minutes
-2. **Restart the runner service** — `systemctl restart act_runner` (requires host access)
-3. **Move to a second runner** — if registered, re-route dispatch
+*[How to work around this quirk.]*

 ### References

- SEV-1 2026-05-17: runner stalled; self-recovered ~21:33Z after ~90 min
- `feedback_gitea_runner_stall_accepted_jobs_no_execution`
+- internal#[N]: first observation

 ---