Compare commits

4 Commits

Author SHA1 Message Date
devops-engineer a6d67b4c68 fix(ci): pre-clone manifest deps in workflow, drop in-image clone (closes #173)
publish-workspace-server-image.yml could not run on Gitea Actions because
Dockerfile.tenant's stage 3 ran `git clone` against private Gitea repos
from inside the Docker build context, where no auth path exists. Every
workspace-server rebuild required a manual operator-host push.

Move cloning to the trusted CI context (where AUTO_SYNC_TOKEN — the
devops-engineer persona PAT — is naturally available). Dockerfile.tenant
now COPYs from .tenant-bundle-deps/, populated by the workflow's new
"Pre-clone manifest deps" step. The Gitea token never enters the image.

- scripts/clone-manifest.sh: optional MOLECULE_GITEA_TOKEN env embeds
  basic-auth in the clone URL; redacted in log output. Anonymous fallback
  preserved for future public-repo path.
- .github/workflows/publish-workspace-server-image.yml: new pre-clone
  step before docker build; injects AUTO_SYNC_TOKEN. Fail-fast if the
  secret is empty.
- workspace-server/Dockerfile.tenant: drop stage 3 (templates), COPY
  from .tenant-bundle-deps/ instead. Header documents the prereq.
- .gitignore: ignore /.tenant-bundle-deps/ so a local build can't
  accidentally commit cloned repos.
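
The redaction idea in the clone-manifest.sh bullet can be sketched in Go for readers more at home there. This is an illustration only, not the script's code: `cloneURLs` and the example repo path are hypothetical, and Go's `URL.Redacted` masks the password as `xxxxx` where the script prints `***`.

```go
package main

import (
	"fmt"
	"net/url"
)

// cloneURLs returns the URL to hand to git and a log-safe variant of it.
// The "oauth2" username mirrors the commit text; the function name and
// signature are hypothetical.
func cloneURLs(host, repo, token string) (cloneURL, displayURL string) {
	u := url.URL{Scheme: "https", Host: host, Path: "/" + repo + ".git"}
	if token == "" {
		// Anonymous fallback: there is nothing to redact.
		return u.String(), u.String()
	}
	u.User = url.UserPassword("oauth2", token)
	// Redacted() replaces any password in the URL with "xxxxx".
	return u.String(), u.Redacted()
}

func main() {
	clone, display := cloneURLs("git.moleculesai.app", "molecule-ai/example-repo", "s3cret")
	_ = clone // pass this one to git; never print it
	fmt.Println(display)
}
```

Only the display form should ever reach a log line; the authenticated form exists just long enough to be passed to the clone command.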

Verified locally: clone-manifest.sh with the devops-engineer persona
token cloned all 37 repos (9 ws + 7 org + 21 plugins, 4.9MB after
.git strip).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 12:59:46 -07:00
claude-ceo-assistant d2da0c8d34 Merge pull request 'fix(workspace-server): a2a-proxy preflight container check (closes #36)' (#37) from fix/issue36-a2a-proxy-preflight into main 2026-05-07 18:25:07 +00:00
claude-ceo-assistant be5fbb5ad3 fix(workspace-server): a2a-proxy preflight container check (closes #36)
Same SSOT-divergence shape as #10 / fixed in #12, but on the a2a-proxy
code path. The plugin handler was routed through `provisioner.RunningContainerName`;
a2a-proxy was forwarding optimistically and only catching missing containers
REACTIVELY via `maybeMarkContainerDead` after the network call timed out.

Result on tenants whose agent containers had been recycled (e.g. post-EC2
replace from molecule-controlplane#20): canvas waits 2-30s for the network
forward to fail before getting a 503, and the workspace-server logs only
"ProxyA2A forward error" without the "container is dead" signal.

This PR adds a proactive `Provisioner.IsRunning` check in `proxyA2ARequest`
between `resolveAgentURL` and `dispatchA2A`, gated on the conditions where
we know we're talking to a sibling Docker container we own (`h.provisioner
!= nil` AND `platformInDocker` AND the URL was rewritten to Docker-DNS form).
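
A minimal sketch of that gate as a pure function, assuming the three conditions are all that is checked. `shouldPreflight` is a hypothetical name, and `containerName` stands in for the result of `provisioner.ContainerName(workspaceID)`.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldPreflight reports whether the proactive check applies: a provisioner
// is wired in, the platform itself runs in Docker, and the agent URL has the
// Docker-DNS form http://<containerName>:<port>.
func shouldPreflight(hasProvisioner, platformInDocker bool, agentURL, containerName string) bool {
	return hasProvisioner && platformInDocker &&
		strings.HasPrefix(agentURL, "http://"+containerName+":")
}

func main() {
	fmt.Println(shouldPreflight(true, true, "http://ws-abc123:8000", "ws-abc123"))
	fmt.Println(shouldPreflight(true, true, "https://agent.example.com", "ws-abc123"))
}
```

Including the trailing `:` in the prefix keeps a container named `ws-abc1234` from matching a lookup for `ws-abc123`.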

Three outcomes via the SSOT helper:
  (true,  nil) → forward as today
  (false, nil) → fast-503 with `error="workspace container not running —
                 restart triggered"`, `restarting=true`, `preflight=true`,
                 plus the same offline-flip + WORKSPACE_OFFLINE broadcast +
                 async restart that `maybeMarkContainerDead` produces
  (true,  err) → fall through to optimistic forward (matches IsRunning's
                 "fail-soft as alive" contract — flaky daemon must not
                 trigger a restart cascade)
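
The three outcomes reduce to a small decision table. The sketch below assumes nothing beyond the `(bool, error)` contract listed above; `decide`, `action`, and `errTransient` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

type action string

const (
	forward action = "forward"
	fast503 action = "fast-503"
)

// errTransient stands in for a flaky Docker daemon error.
var errTransient = errors.New("docker daemon EOF")

// decide maps an IsRunning-style result onto the preflight outcome.
// Any error is treated as "alive" so a flaky daemon never triggers a restart.
func decide(running bool, err error) action {
	if err != nil {
		return forward
	}
	if running {
		return forward
	}
	return fast503
}

func main() {
	fmt.Println(decide(true, nil))
	fmt.Println(decide(false, nil))
	fmt.Println(decide(true, errTransient))
}
```

Checking the error first means the boolean is never consulted on a daemon failure, which is what keeps the fail-soft contract intact.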

The `preflight=true` flag in the response distinguishes the proactive
short-circuit from the reactive `maybeMarkContainerDead` path so canvas
or downstream callers can render distinct messages later.
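
As a sketch of how a downstream caller might use the flag: the field names come from the response described above, while `classify`, the struct, and the returned labels are hypothetical. It also assumes the reactive path sets `restarting=true`, which this commit message implies but does not state.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// proxy503 models only the fields named in the commit message.
type proxy503 struct {
	Error      string `json:"error"`
	Restarting bool   `json:"restarting"`
	Preflight  bool   `json:"preflight"`
}

// classify labels a 503 body as coming from the proactive preflight,
// the reactive dead-container path, or neither.
func classify(body []byte) (string, error) {
	var r proxy503
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	switch {
	case r.Restarting && r.Preflight:
		return "preflight", nil
	case r.Restarting:
		return "reactive", nil
	default:
		return "other", nil
	}
}

func main() {
	kind, err := classify([]byte(`{"restarting":true,"preflight":true}`))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(kind)
}
```

A missing `preflight` key decodes to `false`, so bodies from the older reactive path fall into the second case without any change on the server side.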

* `internal/handlers/a2a_proxy.go` — preflight call site between
  resolveAgentURL and dispatchA2A; gated on `h.provisioner != nil &&
  platformInDocker && url == http://<ContainerName(id)>:port`.
* `internal/handlers/a2a_proxy_helpers.go` — `preflightContainerHealth`
  helper. Routes through `h.provisioner.IsRunning` (which itself wraps
  `RunningContainerName`). Identical offline-flip side-effects as
  `maybeMarkContainerDead` for the dead-container case.
* `internal/handlers/a2a_proxy_preflight_test.go` — 4 tests: running →
  nil; not-running → structured 503 + sqlmock expectations on the
  offline-flip + structure_events insert; transient error → nil
  (fail-soft); AST gate pinning the SSOT routing (mirror of #12's gate).
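
The AST-gate technique in the last bullet can be shown in isolation. This sketch parses an inline source string rather than a file on disk; `calledSelectors` and the `sample` source are hypothetical and much smaller than the test in this PR.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// sample is a stand-in source file; it only needs to parse, not type-check.
const sample = `package p

func preflight(h handler) bool {
	ok, _ := h.prov.IsRunning("ws-1")
	return ok
}
`

// calledSelectors returns the names of all selector calls (x.Name(...))
// found in the body of the named top-level function.
func calledSelectors(src, funcName string) (map[string]bool, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	calls := map[string]bool{}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Name.Name != funcName || fn.Body == nil {
			continue
		}
		ast.Inspect(fn.Body, func(n ast.Node) bool {
			if call, ok := n.(*ast.CallExpr); ok {
				if sel, ok := call.Fun.(*ast.SelectorExpr); ok {
					calls[sel.Sel.Name] = true
				}
			}
			return true
		})
	}
	return calls, nil
}

func main() {
	calls, err := calledSelectors(sample, "preflight")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("IsRunning:", calls["IsRunning"], "ContainerInspect:", calls["ContainerInspect"])
}
```

A gate built on this would fail when `IsRunning` is absent or `ContainerInspect` is present. It should also treat a missing function as a failure, since this sketch returns an empty set in that case.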

Mutation-tested: removing the `if running { return nil }` guard makes
the production code fail to compile (unused var). A subtler mutation
(replacing the !running branch with `return nil`) would make
TestPreflight_ContainerNotRunning_StructuredFastFail fail at runtime
with sqlmock's "expected DB call did not occur."

Refs: molecule-core#36. Companion to #12 (issue #10).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 11:15:08 -07:00
claude-ceo-assistant b9ca4ad84a Merge pull request 'fix(ci): mark CodeQL continue-on-error (advisory only) — closes #156' (#35) from fix/codeql-continue-on-error-156 into main 2026-05-07 17:26:59 +00:00
7 changed files with 403 additions and 20 deletions
@@ -102,6 +102,55 @@ jobs:
run: |
echo "sha=${GITHUB_SHA::7}" >> "$GITHUB_OUTPUT"
# Pre-clone manifest deps before docker build (Task #173 fix).
#
# Why pre-clone: post-2026-05-06, every workspace-template-* repo on
# Gitea (codex, crewai, deepagents, gemini-cli, langgraph) plus all
# 7 org-template-* repos are private. The pre-fix Dockerfile.tenant
# ran `git clone` inside an in-image stage, which had no auth path
# — every CI build failed with "fatal: could not read Username for
# https://git.moleculesai.app". For weeks, every workspace-server
# rebuild required a manual operator-host push. Now we clone in the
# trusted CI context (where AUTO_SYNC_TOKEN is naturally available)
# and Dockerfile.tenant just COPYs from .tenant-bundle-deps/.
#
# Token shape: AUTO_SYNC_TOKEN is the devops-engineer persona PAT
# (see /etc/molecule-bootstrap/agent-secrets.env). Per saved memory
# `feedback_per_agent_gitea_identity_default`, every CI surface uses
# a per-persona token, never the founder PAT. clone-manifest.sh
# embeds it as basic-auth (oauth2:<token>) for the duration of the
# clones, then strips .git directories — the token never enters
# the resulting image.
#
# Idempotent: if a re-run finds populated dirs, clone-manifest.sh
# skips them; safe to retrigger via path-filter or workflow_dispatch.
- name: Pre-clone manifest deps
env:
MOLECULE_GITEA_TOKEN: ${{ secrets.AUTO_SYNC_TOKEN }}
run: |
set -euo pipefail
if [ -z "${MOLECULE_GITEA_TOKEN}" ]; then
echo "::error::AUTO_SYNC_TOKEN secret is empty — register the devops-engineer persona PAT in repo Actions secrets"
exit 1
fi
mkdir -p .tenant-bundle-deps
bash scripts/clone-manifest.sh \
manifest.json \
.tenant-bundle-deps/workspace-configs-templates \
.tenant-bundle-deps/org-templates \
.tenant-bundle-deps/plugins
# Report per-category counts so a partial clone is visible in the
# job log. Nothing in this block fails the build on its own.
ws_count=$(find .tenant-bundle-deps/workspace-configs-templates -mindepth 1 -maxdepth 1 -type d | wc -l)
org_count=$(find .tenant-bundle-deps/org-templates -mindepth 1 -maxdepth 1 -type d | wc -l)
plugins_count=$(find .tenant-bundle-deps/plugins -mindepth 1 -maxdepth 1 -type d | wc -l)
echo "Cloned: ws=$ws_count org=$org_count plugins=$plugins_count"
# Expected counts come from manifest.json (9 ws / 7 org / 21
# plugins as of 2026-05-07). The find calls above only report what
# is on disk and do not compare against the manifest, so
# clone-manifest.sh's own EXPECTED vs CLONED check (line ~95) is
# the authoritative fail-fast.
# Canary-gated release flow:
# - This step always publishes :staging-<sha> + :staging-latest.
# - On staging push, staging-CP picks up :staging-latest immediately
+7 .gitignore
@@ -131,6 +131,13 @@ backups/
# Cloned by publish-workspace-server-image.yml so the Dockerfile's
# replace-directive path resolves. Lives in its own repo.
/molecule-ai-plugin-github-app-auth/
# Tenant-image build context — populated by the workflow's
# "Pre-clone manifest deps" step. Mirrors the public manifest, holds the
# same content as the three /<>/ dirs above but namespaced under one
# parent so the Docker build context is a single COPY-friendly tree.
# Each entry is a transient working-dir, never source-of-truth, never
# committed.
/.tenant-bundle-deps/
# Internal-flavored content lives in Molecule-AI/internal — NEVER in this
# public monorepo. Migrated 2026-04-23 (CEO directive). The CI workflow
+39 -4 scripts/clone-manifest.sh
@@ -6,6 +6,29 @@
# ./scripts/clone-manifest.sh <manifest.json> <ws-templates-dir> <org-templates-dir> <plugins-dir>
#
# Requires: git, jq (lighter than python3 — ~2MB vs ~50MB in Alpine)
#
# Auth (optional):
# When MOLECULE_GITEA_TOKEN is set, embed it as the basic-auth password so
# private Gitea repos clone successfully. When unset, clone anonymously
# (works only for repos that are public on git.moleculesai.app).
#
# This is the path the publish-workspace-server-image.yml workflow uses:
# it injects AUTO_SYNC_TOKEN (devops-engineer persona PAT, repo:read on
# the molecule-ai org) so the in-CI pre-clone step succeeds for ALL
# manifest entries — including the 5 private workspace-template-* repos
# (codex, crewai, deepagents, gemini-cli, langgraph) and all 7
# org-template-* repos.
#
# The token never enters the Docker image: this script runs in the
# trusted CI context BEFORE `docker buildx build`, populates
# .tenant-bundle-deps/, then `Dockerfile.tenant` COPYs from there with
# the .git directories already stripped (see line ~67 below).
#
# For backward compatibility — and so a fresh clone works without
# secrets when (eventually) the workspace-template-* repos flip public —
# the unset path remains a plain anonymous HTTPS clone. That path will
# FAIL with "could not read Username" on private repos today; CI MUST
# set MOLECULE_GITEA_TOKEN.
set -euo pipefail
@@ -52,11 +75,23 @@ clone_category() {
# every manifest entry.
repo_gitea="$(echo "$repo" | awk -F/ '{ printf "%s", tolower($1); for (i=2; i<=NF; i++) printf "/%s", $i; print "" }')"
-echo " cloning $repo_gitea -> $target_dir/$name (ref=$ref)"
-if [ "$ref" = "main" ]; then
-git clone --depth=1 -q "https://git.moleculesai.app/${repo_gitea}.git" "$target_dir/$name"
+# Build the clone URL. When MOLECULE_GITEA_TOKEN is set (CI path)
+# embed it as basic-auth so private repos succeed. The username
+# part ("oauth2") is conventional and ignored by Gitea — only the
+# token-as-password is verified.
+if [ -n "${MOLECULE_GITEA_TOKEN:-}" ]; then
+clone_url="https://oauth2:${MOLECULE_GITEA_TOKEN}@git.moleculesai.app/${repo_gitea}.git"
+display_url="https://oauth2:***@git.moleculesai.app/${repo_gitea}.git"
 else
-git clone --depth=1 -q --branch "$ref" "https://git.moleculesai.app/${repo_gitea}.git" "$target_dir/$name"
+clone_url="https://git.moleculesai.app/${repo_gitea}.git"
+display_url="$clone_url"
 fi
+echo " cloning $display_url -> $target_dir/$name (ref=$ref)"
+if [ "$ref" = "main" ]; then
+git clone --depth=1 -q "$clone_url" "$target_dir/$name"
+else
+git clone --depth=1 -q --branch "$ref" "$clone_url" "$target_dir/$name"
+fi
CLONED=$((CLONED + 1))
i=$((i + 1))
+32 -16 workspace-server/Dockerfile.tenant
@@ -3,14 +3,34 @@
# Serves both the API (Go on :8080) and the UI (Node.js on :3000) in a
# single container. Go reverse-proxies unknown routes to canvas.
#
-# Templates are cloned from standalone GitHub repos at build time so the
-# monorepo doesn't need to carry them. The repos are public; no auth.
+# Templates + plugins are NOT cloned at build time. They are pre-cloned
+# in the trusted CI context (or operator host) by
+# `scripts/clone-manifest.sh` into `.tenant-bundle-deps/` and COPYed in.
+# The reason: post-2026-05-06, every workspace-template-* repo on Gitea
+# (codex, crewai, deepagents, gemini-cli, langgraph) plus all 7
+# org-template-* repos are private, so the Docker build can't `git clone`
+# from inside the build context — there's no auth path that doesn't leak
+# the Gitea token into an image layer. Pre-cloning keeps the token in
+# the CI environment only; the resulting image carries the cloned trees
+# with `.git` already stripped (see clone-manifest.sh).
 #
-# Build context: repo root.
+# Build context: repo root, with `.tenant-bundle-deps/` populated by:
#
# MOLECULE_GITEA_TOKEN=<persona-PAT> scripts/clone-manifest.sh \
# manifest.json \
# .tenant-bundle-deps/workspace-configs-templates \
# .tenant-bundle-deps/org-templates \
# .tenant-bundle-deps/plugins
#
# In CI this happens in publish-workspace-server-image.yml's "Pre-clone
# manifest deps" step (uses AUTO_SYNC_TOKEN = devops-engineer persona).
# For a manual operator-host build, source the same token from
# /etc/molecule-bootstrap/agent-secrets.env first.
#
# docker buildx build --platform linux/amd64 \
# -f workspace-server/Dockerfile.tenant \
-# -t registry.fly.io/molecule-tenant:latest \
+# -t <ECR>/molecule-ai/platform-tenant:latest \
# --build-arg GIT_SHA=<sha> --build-arg NEXT_PUBLIC_PLATFORM_URL= \
# --push .
# ── Stage 1: Go platform binary ──────────────────────────────────────
@@ -55,14 +75,7 @@ ENV NEXT_PUBLIC_PLATFORM_URL=$NEXT_PUBLIC_PLATFORM_URL
ENV NEXT_PUBLIC_WS_URL=$NEXT_PUBLIC_WS_URL
RUN npm run build
-# ── Stage 3: Clone templates + plugins from manifest.json ─────────────
-FROM alpine:3.20 AS templates
-RUN apk add --no-cache git jq
-COPY manifest.json /manifest.json
-COPY scripts/clone-manifest.sh /scripts/clone-manifest.sh
-RUN chmod +x /scripts/clone-manifest.sh && /scripts/clone-manifest.sh /manifest.json /workspace-configs-templates /org-templates /plugins
-# ── Stage 4: Runtime ──────────────────────────────────────────────────
+# ── Stage 3: Runtime ──────────────────────────────────────────────────
FROM node:20-alpine
RUN apk add --no-cache ca-certificates git tzdata openssh-client aws-cli
@@ -87,10 +100,13 @@ COPY --from=go-builder /platform /platform
COPY --from=go-builder /memory-plugin /memory-plugin
COPY workspace-server/migrations /migrations
-# Templates + plugins (cloned from GitHub in stage 3)
-COPY --from=templates /workspace-configs-templates /workspace-configs-templates
-COPY --from=templates /org-templates /org-templates
-COPY --from=templates /plugins /plugins
+# Templates + plugins (pre-cloned by scripts/clone-manifest.sh in the
+# trusted CI / operator-host context, .git already stripped — see
+# .tenant-bundle-deps/ in the build context). The Gitea token used to
+# clone them never enters this image.
+COPY .tenant-bundle-deps/workspace-configs-templates /workspace-configs-templates
+COPY .tenant-bundle-deps/org-templates /org-templates
+COPY .tenant-bundle-deps/plugins /plugins
# Canvas standalone
WORKDIR /canvas
@@ -435,6 +435,34 @@ func (h *WorkspaceHandler) proxyA2ARequest(ctx context.Context, workspaceID stri
return 0, nil, proxyErr
}
// Pre-flight container-health check (#36). The dispatchA2A path below
// does Docker-DNS forwarding to `ws-<wsShort>:8000` and only catches a
// missing/dead container REACTIVELY via maybeMarkContainerDead in
// handleA2ADispatchError. That works but costs the caller a full
// network-timeout (2-30s) before the structured 503 surfaces.
//
// When we KNOW the workspace is container-backed (h.provisioner != nil,
// platformInDocker, and agentURL is already in Docker-DNS form), do a
// single proactive Provisioner.IsRunning check. If the container is
// genuinely missing, short-circuit with the same structured 503 + async
// restart that maybeMarkContainerDead would produce, but immediately,
// without the network round-trip.
//
// Three outcomes of h.provisioner.IsRunning(ctx, id), which itself wraps
// provisioner.RunningContainerName:
//   (true,  nil) → forward as today.
//   (false, nil) → container is genuinely not running. Fast-503.
//   (true,  err) → transient daemon error. Fall through to optimistic
//                  forward, per IsRunning's "fail-soft as alive" contract.
//
// Same SSOT as findRunningContainer (#10/#12). See AST gate
// TestProxyA2A_RoutesThroughProvisionerSSOT.
if h.provisioner != nil && platformInDocker && strings.HasPrefix(agentURL, "http://"+provisioner.ContainerName(workspaceID)+":") {
if proxyErr := h.preflightContainerHealth(ctx, workspaceID); proxyErr != nil {
return 0, nil, proxyErr
}
}
startTime := time.Now()
resp, cancelFwd, err := h.dispatchA2A(ctx, workspaceID, agentURL, body, callerID)
if cancelFwd != nil {
@@ -198,6 +198,60 @@ func (h *WorkspaceHandler) maybeMarkContainerDead(ctx context.Context, workspace
return true
}
// preflightContainerHealth runs a proactive Provisioner.IsRunning check
// (#36) before dispatching the a2a forward. Routed through provisioner's
// SSOT IsRunning, which itself wraps RunningContainerName — same source
// as findRunningContainer in the plugins handler (#10/#12).
//
// Returns nil when the forward should proceed:
// - container is running, OR
// - daemon errored transiently (matches IsRunning's (true, err)
// "fail-soft as alive" contract — let the optimistic forward run
// and reactive maybeMarkContainerDead catch a real failure).
//
// Returns a structured 503 + triggers the same async restart that
// maybeMarkContainerDead would produce, when:
// - container is genuinely not running (NotFound / Exited / Created…).
//
// The point of running this BEFORE the forward is to save the caller
// 2-30s of network-timeout cost when the container is missing — a common
// shape post-EC2-replace (see molecule-controlplane#20 incident
// 2026-05-07) where the reconciler hasn't respawned the agent yet.
func (h *WorkspaceHandler) preflightContainerHealth(ctx context.Context, workspaceID string) *proxyA2AError {
running, err := h.provisioner.IsRunning(ctx, workspaceID)
if err != nil {
// Transient daemon error. Provisioner.IsRunning returns (true, err)
// in this case — fall through to the optimistic forward, reactive
// maybeMarkContainerDead handles a real failure later.
log.Printf("ProxyA2A preflight: IsRunning transient error for %s: %v (proceeding with forward)", workspaceID, err)
return nil
}
if running {
// Container is running — forward as today.
return nil
}
// Container is genuinely not running. Mark offline + trigger restart
// (same effect as maybeMarkContainerDead's branch), and return the
// structured 503 immediately so the caller skips the forward.
log.Printf("ProxyA2A preflight: container for %s is not running — marking offline and triggering restart (#36)", workspaceID)
if _, dbErr := db.DB.ExecContext(ctx,
`UPDATE workspaces SET status = $1, updated_at = now() WHERE id = $2 AND status NOT IN ('removed', 'provisioning')`,
models.StatusOffline, workspaceID); dbErr != nil {
log.Printf("ProxyA2A preflight: failed to mark workspace %s offline: %v", workspaceID, dbErr)
}
db.ClearWorkspaceKeys(ctx, workspaceID)
h.broadcaster.RecordAndBroadcast(ctx, string(events.EventWorkspaceOffline), workspaceID, map[string]interface{}{})
go h.RestartByID(workspaceID)
return &proxyA2AError{
Status: http.StatusServiceUnavailable,
Response: gin.H{
"error": "workspace container not running — restart triggered",
"restarting": true,
"preflight": true, // distinguishes from reactive containerDead path
},
}
}
// logA2AFailure records a failed A2A attempt to activity_logs in a detached
// goroutine (the request context may already be done by the time it runs).
func (h *WorkspaceHandler) logA2AFailure(ctx context.Context, workspaceID, callerID string, body []byte, a2aMethod string, err error, durationMs int) {
@@ -0,0 +1,194 @@
package handlers
import (
"context"
"errors"
"go/ast"
"go/parser"
"go/token"
"testing"
"github.com/DATA-DOG/go-sqlmock"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/models"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/provisioner"
)
// preflightLocalProv is a controllable LocalProvisionerAPI stub for the
// preflight tests (#36). Other API methods panic to guard against tests
// that should be using a different stub.
type preflightLocalProv struct {
running bool
err error
calls int
calledWith []string
}
func (p *preflightLocalProv) IsRunning(_ context.Context, workspaceID string) (bool, error) {
p.calls++
p.calledWith = append(p.calledWith, workspaceID)
return p.running, p.err
}
func (p *preflightLocalProv) Start(_ context.Context, _ provisioner.WorkspaceConfig) (string, error) {
panic("preflightLocalProv: Start not implemented")
}
func (p *preflightLocalProv) Stop(_ context.Context, _ string) error {
panic("preflightLocalProv: Stop not implemented")
}
func (p *preflightLocalProv) ExecRead(_ context.Context, _, _ string) ([]byte, error) {
panic("preflightLocalProv: ExecRead not implemented")
}
func (p *preflightLocalProv) RemoveVolume(_ context.Context, _ string) error {
panic("preflightLocalProv: RemoveVolume not implemented")
}
func (p *preflightLocalProv) VolumeHasFile(_ context.Context, _, _ string) (bool, error) {
panic("preflightLocalProv: VolumeHasFile not implemented")
}
func (p *preflightLocalProv) WriteAuthTokenToVolume(_ context.Context, _, _ string) error {
panic("preflightLocalProv: WriteAuthTokenToVolume not implemented")
}
// TestPreflight_ContainerRunning_ReturnsNil — IsRunning(true,nil): forward
// proceeds. preflight returns nil → caller continues to dispatchA2A.
func TestPreflight_ContainerRunning_ReturnsNil(t *testing.T) {
_ = setupTestDB(t)
stub := &preflightLocalProv{running: true, err: nil}
h := NewWorkspaceHandler(newTestBroadcaster(), nil, "http://localhost:8080", t.TempDir())
h.provisioner = stub
if err := h.preflightContainerHealth(context.Background(), "ws-running-123"); err != nil {
t.Fatalf("preflight should return nil when container running, got %+v", err)
}
if stub.calls != 1 {
t.Errorf("IsRunning should be called exactly once, got %d", stub.calls)
}
if len(stub.calledWith) != 1 || stub.calledWith[0] != "ws-running-123" {
t.Errorf("IsRunning should be called with workspace id, got %v", stub.calledWith)
}
}
// TestPreflight_ContainerNotRunning_StructuredFastFail — IsRunning(false,nil):
// preflight returns structured 503 with restarting=true + preflight=true, AND
// triggers the offline-flip + WORKSPACE_OFFLINE broadcast + async restart.
// This is the load-bearing case — saves the caller 2-30s of network timeout.
func TestPreflight_ContainerNotRunning_StructuredFastFail(t *testing.T) {
mock := setupTestDB(t)
_ = setupTestRedis(t)
stub := &preflightLocalProv{running: false, err: nil}
h := NewWorkspaceHandler(newTestBroadcaster(), nil, "http://localhost:8080", t.TempDir())
h.provisioner = stub
// Expect the offline-flip UPDATE.
mock.ExpectExec(`UPDATE workspaces SET status =`).
WithArgs(models.StatusOffline, "ws-dead-456").
WillReturnResult(sqlmock.NewResult(0, 1))
// Broadcaster's INSERT INTO structure_events fires too — best-effort
// log entry for the WORKSPACE_OFFLINE event. Match permissively.
mock.ExpectExec(`INSERT INTO structure_events`).
WillReturnResult(sqlmock.NewResult(0, 1))
proxyErr := h.preflightContainerHealth(context.Background(), "ws-dead-456")
if proxyErr == nil {
t.Fatal("preflight should return *proxyA2AError when container not running")
}
if proxyErr.Status != 503 {
t.Errorf("expected 503, got %d", proxyErr.Status)
}
if got := proxyErr.Response["restarting"]; got != true {
t.Errorf("response should mark restarting=true, got %v", got)
}
if got := proxyErr.Response["preflight"]; got != true {
t.Errorf("response should mark preflight=true so callers can distinguish from reactive containerDead, got %v", got)
}
if got := proxyErr.Response["error"]; got != "workspace container not running — restart triggered" {
t.Errorf("error message mismatch, got %q", got)
}
// Note: broadcaster firing is exercised by the production path's
// h.broadcaster.RecordAndBroadcast call but not asserted here — the
// real *events.Broadcaster doesn't expose received events for inspection.
// The DB UPDATE expectation is sufficient to pin the offline-flip path.
}
// TestPreflight_TransientError_FailsSoftAsAlive — IsRunning(true,err): the
// (true, err) "fail-soft" contract — preflight returns nil so the optimistic
// forward runs; reactive maybeMarkContainerDead handles a real failure later.
// This pin is critical: a flaky daemon must NOT trigger a restart cascade.
func TestPreflight_TransientError_FailsSoftAsAlive(t *testing.T) {
_ = setupTestDB(t)
stub := &preflightLocalProv{running: true, err: errors.New("docker daemon EOF")}
h := NewWorkspaceHandler(newTestBroadcaster(), nil, "http://localhost:8080", t.TempDir())
h.provisioner = stub
if err := h.preflightContainerHealth(context.Background(), "ws-flaky-789"); err != nil {
t.Fatalf("preflight should return nil on transient error (fail-soft), got %+v", err)
}
// No DB UPDATE expected — sqlmock would complain about unexpected calls
// at test cleanup if the offline-flip path fired.
}
// TestProxyA2A_Preflight_RoutesThroughProvisionerSSOT — AST gate (#36 mirror
// of #12's gate). Pins the invariant that preflightContainerHealth uses the
// SSOT Provisioner.IsRunning helper, NOT a parallel docker.ContainerInspect
// of its own.
//
// Mutation invariant: if a future PR replaces h.provisioner.IsRunning with
// a direct cli.ContainerInspect call, this test fails. That's the signal to
// either (a) extend Provisioner.IsRunning's contract OR (b) document why
// this call site needs to differ. Either way, the drift gets a reviewer's
// attention instead of shipping silently.
func TestProxyA2A_Preflight_RoutesThroughProvisionerSSOT(t *testing.T) {
fset := token.NewFileSet()
file, err := parser.ParseFile(fset, "a2a_proxy_helpers.go", nil, parser.ParseComments)
if err != nil {
t.Fatalf("parse a2a_proxy_helpers.go: %v", err)
}
var fn *ast.FuncDecl
ast.Inspect(file, func(n ast.Node) bool {
f, ok := n.(*ast.FuncDecl)
if !ok || f.Name.Name != "preflightContainerHealth" {
return true
}
fn = f
return false
})
if fn == nil {
t.Fatal("preflightContainerHealth not found — was it renamed? update this gate or the SSOT routing assumption")
}
var (
callsIsRunning bool
callsContainerInspectRaw bool
callsRunningContainerNameDirect bool
)
ast.Inspect(fn.Body, func(n ast.Node) bool {
call, ok := n.(*ast.CallExpr)
if !ok {
return true
}
sel, ok := call.Fun.(*ast.SelectorExpr)
if !ok {
return true
}
switch sel.Sel.Name {
case "IsRunning":
callsIsRunning = true
case "ContainerInspect":
callsContainerInspectRaw = true
case "RunningContainerName":
// Direct RunningContainerName is also acceptable SSOT — but
// preferring IsRunning keeps the (bool, error) contract that
// already exists in the helper API surface.
callsRunningContainerNameDirect = true
}
return true
})
if !callsIsRunning && !callsRunningContainerNameDirect {
t.Errorf("preflightContainerHealth must call provisioner.IsRunning OR provisioner.RunningContainerName for the SSOT health check — see molecule-core#36. Found neither.")
}
if callsContainerInspectRaw {
t.Errorf("preflightContainerHealth carries a direct ContainerInspect call. This is the parallel-impl drift molecule-core#36 fixed. " +
"Either route through provisioner.IsRunning OR — if a new use case truly needs a different inspect — extend the helper's contract first and update this gate to allow the specific delta.")
}
}