Compare commits

...

3 Commits

Author SHA1 Message Date
core-uiux ce782df036 fix(ci): add pull-requests:write to gate-check-v3 permissions (mc#729)
CI CI/all-required green
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 16s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 15s
CI / Detect changes (pull_request) Successful in 57s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 56s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m0s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 1m0s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 16s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 44s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: 7
qa-review / approved (pull_request) Failing after 18s
security-review / approved (pull_request) Failing after 18s
sop-checklist-gate / gate (pull_request) Successful in 17s
sop-tier-check / tier-check (pull_request) Successful in 15s
gate-check-v3 / gate-check (pull_request) Successful in 26s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Failing after 1m28s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m26s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m38s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 2m8s
CI / Platform (Go) (pull_request) Successful in 8s
CI / Canvas (Next.js) (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
CI / Python Lint & Test (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 9s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 9s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 7s
audit-force-merge / audit (pull_request) Has been skipped
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 6s
2026-05-12 14:30:45 +00:00
core-uiux e2a52696c3 Merge pull request 'fix(ci): add Docker daemon diagnostics to publish-workspace-server-image (mc#711)' (#722) from infra/publish-docker-daemon-diagnostic into main
redeploy-tenants-on-main / redeploy (push) Has been skipped
Block internal-flavored paths / Block forbidden paths (push) Successful in 8s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 10s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 9s
CI / Detect changes (push) Successful in 22s
E2E API Smoke Test / detect-changes (push) Successful in 22s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 22s
Handlers Postgres Integration / detect-changes (push) Successful in 24s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 21s
CI / Platform (Go) (push) Successful in 5s
CI / Shellcheck (E2E scripts) (push) Successful in 3s
CI / Canvas (Next.js) (push) Successful in 4s
CI / Python Lint & Test (push) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 3s
CI / Canvas Deploy Reminder (push) Has been skipped
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 3s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 5s
CI / all-required (push) Successful in 1s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (push) Failing after 1m8s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (push) Successful in 1m21s
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 6s
Sweep stale AWS Secrets Manager secrets / Sweep AWS Secrets Manager (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
status-reaper / reap (push) Successful in 2m53s
publish-workspace-server-image / build-and-push (push) Successful in 6m26s
Staging SaaS smoke (every 30 min) / Staging SaaS smoke (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
2026-05-12 14:28:17 +00:00
core-devops 6625c3be12 fix(ci): replace Docker health check with full daemon diagnostic (mc#711)
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 12s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 15s
E2E API Smoke Test / detect-changes (pull_request) Successful in 16s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 16s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 12s
qa-review / approved (pull_request) Failing after 12s
gate-check-v3 / gate-check (pull_request) Successful in 18s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 19s
security-review / approved (pull_request) Failing after 10s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 6s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: 7
CI / Platform (Go) (pull_request) Successful in 8s
CI / Canvas (Next.js) (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 7s
sop-checklist-gate / gate (pull_request) Successful in 11s
CI / Python Lint & Test (pull_request) Successful in 7s
sop-tier-check / tier-check (pull_request) Successful in 12s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 6s
CI / all-required (pull_request) Successful in 1s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m1s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Failing after 1m5s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m13s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m15s
audit-force-merge / audit (pull_request) Successful in 6s
Replaces the binary pass/fail health check with a step that shows:
  - socket existence + permissions (ls -la, stat)
  - current user + groups (id)
  - docker version (client AND server)
  - docker info (full output)

mc#711 root cause confirmed: molecule-canonical-1 docker info shows
"Client: Docker Engine 28.0.4" but no Server section — the daemon
is not running. DinD socket mount is present in the act_runner
container config but the daemon itself doesn't respond.

This diagnostic step lets ops triage which runners have a live
daemon vs a dead one, and provides actionable socket/user info
for the daemon-restart fix.

The old REVERTED comment about docker-runner-labels is removed as
stale (ops will handle daemon restart as the real fix).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 11:57:25 +00:00
2 changed files with 25 additions and 25 deletions
+8
View File
@@ -32,6 +32,14 @@ on:
# iterating all open PRs when PR_NUMBER is empty.
workflow_dispatch:
permissions:
# read: contents — for checkout (base ref, not PR head for security)
# read: pull-requests — for reading PR info via API
# write: pull-requests — for posting/updating gate-check comments
# Without this the token cannot POST/PATCH /issues/comments → 403.
contents: read
pull-requests: write
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
@@ -20,6 +20,12 @@ name: publish-workspace-server-image
#
# ECR target: 153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/*
# Required secrets: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AUTO_SYNC_TOKEN
#
# mc#711: Docker daemon not accessible on ubuntu-latest runner (molecule-canonical-1
# shows client-only in `docker info` — daemon not running). DinD mount is present but
# daemon doesn't respond. Fix: add diagnostic step showing socket info so ops can
# identify which runners have a live daemon. If no daemon is available, the job
# fails fast with actionable output rather than silent deep failure.
on:
push:
@@ -52,36 +58,25 @@ env:
jobs:
build-and-push:
# REVERTED (infra/revert-docker-runner-label): `runs-on: ubuntu-latest` restored.
# The `docker` label is not registered on any act_runner. `runs-on: [ubuntu-latest, docker]`
# causes jobs to queue indefinitely with zero eligible runners — strictly worse than the
# pre-#599 coin-flip (50% success rate). Once the `docker` label is registered on
# ≥2 runners, re-apply the fix from #599 (infra/docker-runner-label).
# See issue #576 + infra-lead pulse ~00:30Z.
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
# Health check: verify Docker daemon is accessible before attempting any
# build steps. This fails loudly at step 1 when the runner's docker.sock
# is inaccessible (e.g. permission change, daemon restart, or group-membership
# drift) rather than silently continuing to step 2 where `docker build`
# fails deep in the process with a cryptic ECR auth error that doesn't
# surface the root cause. Also reports the daemon version so operator
# can correlate with runner host logs.
- name: Verify Docker daemon access
- name: Diagnose Docker daemon access
run: |
set -euo pipefail
echo "::group::Docker daemon health check"
echo "::group::Docker daemon diagnosis"
echo "Runner: ${HOSTNAME:-unknown}"
docker info 2>&1 | head -5 || {
echo "::error::Docker daemon is not accessible at /var/run/docker.sock"
echo "::error::Runner: ${HOSTNAME:-unknown}"
echo "::error::Check: (1) daemon is running, (2) runner user is in docker group, (3) sock permissions are 660+"
exit 1
}
echo "Docker daemon OK"
echo "--- Socket info ---"
ls -la /var/run/docker.sock 2>/dev/null || echo "/var/run/docker.sock: not found"
stat /var/run/docker.sock 2>/dev/null || true
echo "--- User info ---"
id
echo "--- docker version ---"
docker version 2>&1 || true
echo "--- docker info (full) ---"
docker info 2>&1 || echo "docker info failed: exit $?"
echo "::endgroup::"
# Pre-clone manifest deps before docker build.
@@ -100,9 +95,6 @@ jobs:
MOLECULE_GITEA_TOKEN: ${{ secrets.AUTO_SYNC_TOKEN }}
run: |
set -euo pipefail
# clone-manifest.sh supports anonymous cloning for public repos (post-
# 2026-05-08 migration). The token is only needed for private repos.
# Do NOT require it — a missing secret would fail the build unnecessarily.
mkdir -p .tenant-bundle-deps
# Strip JSON5 comments before jq parsing — Integration Tester appends
# `// Triggered by ...` which breaks `jq` in clone-manifest.sh.