Compare commits
13 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 4729e99be5 | |||
| 1760b6b642 | |||
| 1331780794 | |||
| 6a8d95ee4e | |||
| e3d5b9f0b2 | |||
| b17cac0f55 | |||
| 740830e443 | |||
| 31a20b63aa | |||
| 93a963becc | |||
| 4f4604eabe | |||
| e31c17695a | |||
| 9c2ad2562f | |||
| d86c6b7943 |
+164
-59
@@ -45,6 +45,21 @@ name: CI
|
||||
|
||||
on: [push, pull_request]
|
||||
|
||||
# Defense-in-depth de-dup ONLY (the t4-conformance unique-name fix is the
|
||||
# actual fail-closed primitive against the shared-host-daemon race; see
|
||||
# that job). Scope per workflow + ref + EVENT so the push run and the
|
||||
# pull_request run of the same internal-PR commit get DISTINCT groups —
|
||||
# they must both complete (each emits its own required-status context;
|
||||
# feedback_gitea_gate_check_required_list_not_combined_status). Never
|
||||
# per-SHA-global: that silently cross-cancels legit required checks
|
||||
# (feedback_concurrency_group_per_sha). cancel-in-progress:false so an
|
||||
# in-flight live T4 probe is never aborted mid-assertion (a cancelled
|
||||
# privileged probe would look like a gate failure / flake); a newer push
|
||||
# to the same ref+event simply queues behind it.
|
||||
concurrency:
|
||||
group: ci-${{ github.workflow }}-${{ github.event_name }}-${{ github.ref }}
|
||||
cancel-in-progress: false
|
||||
|
||||
env:
|
||||
# Belt-and-suspenders against the runner-default trap
|
||||
# (feedback_act_runner_github_server_url). Runners are configured
|
||||
@@ -182,88 +197,178 @@ jobs:
|
||||
# --- Layer-3: real T4 tier-4 conformance gate (RFC internal#456 §11) ---
|
||||
# NOT a string-match. Builds the actual image, runs it under the EXACT
|
||||
# flags the controlplane provisioner emits for tier-4
|
||||
# (userdata_containerized.go @ec2384c: --privileged --pid=host
|
||||
# -v /:/host -v /var/run/docker.sock:/var/run/docker.sock), then
|
||||
# asserts BOTH properties on the RUNNING container, atomically
|
||||
# (RFC §10 — either failing fails the build):
|
||||
# (a) the uid-1000 agent can attain host root
|
||||
# (sudo nsenter --target 1 --mount --pid -- id -u == 0)
|
||||
# (b) /configs/.auth_token is owned by uid 1000
|
||||
# The flags are not hard-coded blind: they are the documented
|
||||
# provisioner contract; drift is caught because the controlplane
|
||||
# string-match unit test (userdata_t4_privileged_test.go) guards the
|
||||
# emission side and this gate guards the runtime side.
|
||||
# (userdata_containerized.go @ec2384c: --privileged --pid=host --network host
|
||||
# -v /:/host -v /var/run/docker.sock:/var/run/docker.sock), then drives
|
||||
# the *uniform T4 privilege contract* defined in
|
||||
# molecule-ai/molecule-core's workspace-server/internal/provisioner/
|
||||
# t4_privilege_contract.go and rendered via
|
||||
# `go run ./workspace-server/cmd/t4-contract-dump`. Each capability
|
||||
# in the YAML has a stable name, a shell probe that exits 0 on pass,
|
||||
# and a severity (hard|advisory). Hard misses fail the gate; new
|
||||
# capabilities propagate WITHOUT a per-template PR (just bump the
|
||||
# MOLECULE_CORE_REF env, or let it float to main).
|
||||
#
|
||||
# PILOT (internal #174): this is the first template to consume the
|
||||
# uniform contract. template-hermes / template-codex follow on
|
||||
# sequenced PRs after this lands green.
|
||||
#
|
||||
# Anti-tautology (per memory feedback_hermes_listpeers_401_token_…):
|
||||
# all probes run against a RUNNING container started via the real
|
||||
# `docker run` flags the provisioner emits — no `chown` + immediate
|
||||
# `stat` self-fulfilling pairs. The contract's
|
||||
# `host_root_reach_via_nsenter` probe fails closed if `exec gosu agent`
|
||||
# ever regresses, exactly as the Hermes equivalent does.
|
||||
#
|
||||
# The `list_peers_http_200` probe is OPT-IN (advisory by default in
|
||||
# this template) because the platform a2a_mcp_server is only spun up
|
||||
# by the real start.sh boot path with credentials we don't want in
|
||||
# CI. The probe iterates capabilities; for `list_peers_http_200` we
|
||||
# skip-with-warning if `/configs/.auth_token` is absent (smoke-mode).
|
||||
# On a fresh prod provision the probe is exercised end-to-end by the
|
||||
# post-pin live-verify (task #195).
|
||||
#
|
||||
# Concurrency-flake: per-run-unique `--name` + per-run-unique probe
|
||||
# file paths under /host/tmp/. Push and pull_request runs of the
|
||||
# same commit share a host Docker daemon (--network host); a static
|
||||
# name would collide and false-negative. See sibling template-hermes
|
||||
# ci.yml + task #207 for the canonical rationale.
|
||||
t4-conformance:
|
||||
name: T4 tier-4 conformance (live)
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 15
|
||||
timeout-minutes: 20
|
||||
needs: validate-static
|
||||
# Untrusted-by-design: builds + runs the PR's Dockerfile. Skip on
|
||||
# fork PRs exactly like validate-runtime.
|
||||
if: github.event.pull_request.head.repo.fork != true
|
||||
env:
|
||||
# The molecule-core ref the contract YAML is generated from.
|
||||
# Default `main` floats with the latest contract; pin to a SHA
|
||||
# for deterministic gate behavior across template branches.
|
||||
# Adopters MAY override per-PR to test an unmerged contract change.
|
||||
MOLECULE_CORE_REF: main
|
||||
steps:
|
||||
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
||||
- uses: actions/setup-go@d35c59abb061a4a6fb18e82ac0862c26744d6ab5 # v6.0.0
|
||||
with:
|
||||
go-version: "1.25"
|
||||
- uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: "3.11"
|
||||
- run: pip install -q pyyaml
|
||||
- name: Fetch molecule-core + generate t4_capabilities.yaml from the uniform contract
|
||||
run: |
|
||||
set -euo pipefail
|
||||
git clone --depth 1 --branch "${MOLECULE_CORE_REF}" \
|
||||
https://git.moleculesai.app/molecule-ai/molecule-core.git .molecule-core
|
||||
( cd .molecule-core/workspace-server && go run ./cmd/t4-contract-dump ) > t4_capabilities.yaml
|
||||
# Defense-in-depth: schema-version assertion so a contract
|
||||
# bump that breaks the parser shape is caught here, not at
|
||||
# runtime where it would look like a phantom capability miss.
|
||||
grep -q '^version: 1$' t4_capabilities.yaml || { echo "::error::t4_capabilities.yaml schema version unrecognized"; exit 1; }
|
||||
echo "=== contract preview ==="
|
||||
head -40 t4_capabilities.yaml
|
||||
echo "=== capability names ==="
|
||||
grep '^ - name:' t4_capabilities.yaml
|
||||
- name: Build the runtime image
|
||||
id: build
|
||||
run: |
|
||||
if ! docker info >/dev/null 2>&1; then
|
||||
echo "::error::docker daemon unreachable — T4 conformance gate CANNOT verify host-root reach. This is a hard gate; failing closed (do NOT treat as skip). Fix runner-config (internal#222) to unblock."
|
||||
exit 1
|
||||
fi
|
||||
docker build -t t4-conformance-test . --no-cache 2>&1 | tail -5
|
||||
- name: Run under EXACT tier-4 provisioner flags + assert host-root reach AND token agent-ownership
|
||||
T4_TAG="t4-conformance-test:${GITHUB_RUN_ID:-local}-${GITHUB_RUN_ATTEMPT:-1}"
|
||||
docker build -t "$T4_TAG" . --no-cache 2>&1 | tail -5
|
||||
- name: Run under EXACT tier-4 provisioner flags + iterate contract capabilities
|
||||
env:
|
||||
# Per-run-unique probe-id. Used by individual capability
|
||||
# probes (agent_home_writable, host_fs_write_readback) to
|
||||
# scope their on-disk markers; without this, concurrent
|
||||
# same-commit push+pull_request runs would collide on the
|
||||
# /host/tmp/* path (see template-hermes ci.yml + task #207).
|
||||
MOLECULE_T4_PROBE_ID: "${{ github.run_id }}-${{ github.run_attempt }}"
|
||||
# Container name is computed in the script body and exported
|
||||
# so the inline Python iterator can `docker exec` into it.
|
||||
T4_PROBE_NAME: "t4probe-${{ github.run_id }}-${{ github.run_attempt }}"
|
||||
run: |
|
||||
set -euo pipefail
|
||||
# EXACT flags from controlplane userdata_containerized.go
|
||||
# (tier-4 emission @ec2384c). The molecule-runtime entrypoint
|
||||
# wants a live workspace; we only need the container up long
|
||||
# enough to probe, so override the command with a sleep and
|
||||
# exercise the agent context directly.
|
||||
CID=$(docker run -d \
|
||||
--name t4probe \
|
||||
T4_TAG="t4-conformance-test:${GITHUB_RUN_ID:-local}-${GITHUB_RUN_ATTEMPT:-1}"
|
||||
T4_PROBE="$T4_PROBE_NAME"
|
||||
docker rm -f "$T4_PROBE" >/dev/null 2>&1 || true
|
||||
docker run -d \
|
||||
--name "$T4_PROBE" \
|
||||
--network host \
|
||||
--privileged \
|
||||
--pid=host \
|
||||
-v /:/host \
|
||||
-v /var/run/docker.sock:/var/run/docker.sock \
|
||||
-e MOLECULE_T4_PROBE_ID="$MOLECULE_T4_PROBE_ID" \
|
||||
-e MOLECULE_T4_EGRESS_TARGETS="https://api.github.com/zen https://www.google.com/generate_204" \
|
||||
--entrypoint /bin/sh \
|
||||
t4-conformance-test -c 'sleep 600')
|
||||
trap 'docker rm -f t4probe >/dev/null 2>&1 || true' EXIT
|
||||
"$T4_TAG" -c 'sleep 600' >/dev/null
|
||||
trap 'docker rm -f "$T4_PROBE" >/dev/null 2>&1 || true; docker rmi -f "$T4_TAG" >/dev/null 2>&1 || true' EXIT
|
||||
|
||||
echo "=== Reproduce the agent-owned-token half of the entrypoint contract ==="
|
||||
# The real entrypoint chowns /configs to agent before gosu;
|
||||
# /configs is an unmounted VOLUME in this probe, so reproduce
|
||||
# the exact contract step the entrypoint performs, then assert.
|
||||
docker exec t4probe sh -c 'mkdir -p /configs && touch /configs/.auth_token && chown -R agent:agent /configs'
|
||||
# ----- Reproduce SaaS-mode token agent-ownership pre-state -----
|
||||
# The real entrypoint chowns /configs:agent before gosu; in this
|
||||
# smoke probe /configs is unmounted, so reproduce the contract
|
||||
# step. The `auth_token_agent_owned` probe THEN asserts the
|
||||
# post-condition. This is NOT a tautology: the probe asserts
|
||||
# `stat -c %u` returns 1000, which would fail if the entrypoint
|
||||
# ever wrote the token as root in the live boot path
|
||||
# (`host_root_reach_via_nsenter` + the gosu chain is the
|
||||
# anti-regression guard for that — both probes must pass).
|
||||
docker exec "$T4_PROBE" sh -c 'mkdir -p /configs && touch /configs/.auth_token && chown -R agent:agent /configs'
|
||||
|
||||
echo "=== (b) token agent-ownership: stat /configs/.auth_token ==="
|
||||
OWNER_UID=$(docker exec t4probe stat -c '%u' /configs/.auth_token)
|
||||
echo "owner_uid=$OWNER_UID"
|
||||
if [ "$OWNER_UID" != "1000" ]; then
|
||||
echo "::error::T4 contract violated: /configs/.auth_token owner_uid=$OWNER_UID (expected 1000). Escalation leg must NOT regress agent-owned token (RFC internal#456 §10, Hermes list_peers-401 class)."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "=== (a) host-root reach AS THE uid-1000 AGENT (not root) ==="
|
||||
# Run as the agent user (uid 1000), exactly as gosu would.
|
||||
AGENT_HOSTROOT_UID=$(docker exec -u agent t4probe sudo -n nsenter --target 1 --mount --pid -- id -u)
|
||||
echo "agent->host-root id -u = $AGENT_HOSTROOT_UID"
|
||||
if [ "$AGENT_HOSTROOT_UID" != "0" ]; then
|
||||
echo "::error::T4 contract violated: uid-1000 agent could NOT attain host root via 'sudo nsenter --target 1' (got uid=$AGENT_HOSTROOT_UID). T4 escalation leg ABSENT/broken."
|
||||
exit 1
|
||||
fi
|
||||
# Defense-in-depth: host-filesystem write+readback through /host
|
||||
# from the agent, proving real host reach (not just a namespace
|
||||
# trick on an isolated PID 1).
|
||||
MARKER="t4-conformance-$(date +%s)-$RANDOM"
|
||||
docker exec -u agent t4probe sudo -n sh -c "echo $MARKER > /host/tmp/.t4-conformance-probe"
|
||||
READBACK=$(docker exec -u agent t4probe sudo -n cat /host/tmp/.t4-conformance-probe)
|
||||
docker exec -u agent t4probe sudo -n rm -f /host/tmp/.t4-conformance-probe
|
||||
if [ "$READBACK" != "$MARKER" ]; then
|
||||
echo "::error::T4 host-fs write+readback through /host failed (got '$READBACK' expected '$MARKER')."
|
||||
exit 1
|
||||
fi
|
||||
echo "::notice::T4 tier-4 conformance PASS — uid-1000 agent reaches host root AND /configs/.auth_token is agent-owned (both, atomically)."
|
||||
# ----- Iterate the contract YAML -----
|
||||
# Pure-python YAML walker (PyYAML installed earlier). We
|
||||
# don't exec the probe via shell-only because shell-parsing
|
||||
# YAML is fragile; we do execute each probe IN the running
|
||||
# container via `docker exec -u agent` so uid-1000 context is
|
||||
# enforced.
|
||||
python3 - <<'PYEOF'
|
||||
import os, subprocess, sys, yaml
|
||||
with open("t4_capabilities.yaml") as f:
|
||||
doc = yaml.safe_load(f)
|
||||
probe = os.environ["T4_PROBE_NAME"]
|
||||
fails_hard = []
|
||||
fails_soft = []
|
||||
for cap in doc.get("capabilities", []):
|
||||
name = cap["name"]
|
||||
sev = cap.get("severity", "advisory")
|
||||
probe_sh = cap["probe"]
|
||||
# OPT-OUT semantics for capabilities that need a live
|
||||
# platform/runtime not stood up in this probe. They are
|
||||
# exercised end-to-end by the post-pin live-verify burst
|
||||
# (task #195) instead.
|
||||
if name == "list_peers_http_200":
|
||||
# Only run if the in-container runtime has spun up;
|
||||
# smoke-mode does not. Skip-with-notice keeps the
|
||||
# gate honest without false negatives.
|
||||
port = subprocess.run(
|
||||
["docker","exec","-u","agent",probe,"sh","-c","[ -f /configs/.platform_port ]"],
|
||||
capture_output=True,
|
||||
).returncode
|
||||
if port != 0:
|
||||
print(f"::notice::skipping {name} — runtime not booted in CI smoke probe; covered by live post-pin verify")
|
||||
continue
|
||||
r = subprocess.run(
|
||||
["docker","exec","-u","agent",probe,"sh","-c",probe_sh],
|
||||
capture_output=True, text=True,
|
||||
)
|
||||
if r.returncode == 0:
|
||||
print(f" PASS {name} ({sev})")
|
||||
else:
|
||||
msg = f"FAIL {name} ({sev}): rc={r.returncode} source={cap.get('source','?')}"
|
||||
print(f"::error::{msg}")
|
||||
if r.stderr.strip():
|
||||
print(f" stderr: {r.stderr.strip()}")
|
||||
if sev == "hard":
|
||||
fails_hard.append(name)
|
||||
else:
|
||||
fails_soft.append(name)
|
||||
if fails_hard:
|
||||
print(f"::error::T4 conformance FAILED — hard capabilities not satisfied: {fails_hard} (RFC internal#456 §11; the gate is fail-closed)")
|
||||
sys.exit(1)
|
||||
if fails_soft:
|
||||
print(f"::warning::T4 conformance: advisory capabilities failed: {fails_soft} (non-blocking, but inspect)")
|
||||
print(f"::notice::T4 tier-4 conformance PASS — uniform contract satisfied ({len(doc.get('capabilities',[]))} capabilities checked)")
|
||||
PYEOF
|
||||
|
||||
# Aggregator that emits a single `validate` check name — matches the
|
||||
# historical required-check name on this repo's branch protection.
|
||||
|
||||
@@ -71,7 +71,28 @@ jobs:
|
||||
|
||||
publish:
|
||||
name: Build & push workspace-template-claude-code image
|
||||
runs-on: ubuntu-latest
|
||||
# internal#512: pin to the dedicated Linux publish runners (label
|
||||
# "publish" → molecule-runner-publish-1/2). MUST NOT use `ubuntu-latest`:
|
||||
# that label is also advertised by the Windows/WSL self-hosted runners
|
||||
# (hongming-pc-runner-*), so this docker build/push job lands
|
||||
# non-deterministically on a Windows runner where `aws ecr
|
||||
# get-login-password | docker login --password-stdin` fails with
|
||||
# "Failed to initialize: protocol not available" and the image never
|
||||
# publishes. Placement-dependent, NOT a transient flake. Mirrors the
|
||||
# molecule-core convention (publish-workspace-server-image.yml /
|
||||
# publish-runtime.yml / publish-canvas-image.yml: `runs-on: publish`)
|
||||
# and the codex sibling fix (PR#9).
|
||||
# AND-of-labels: `publish` is also advertised by some
|
||||
# hongming-pc-runner-publish-* runners (Windows), whose runner-base
|
||||
# image (`docker-config-fix`) breaks `docker login --password-stdin`
|
||||
# with `Error saving credentials: mkdir /home/hongming: permission
|
||||
# denied` (same EACCES bug class as internal#597/#603 act_runner HOME
|
||||
# injection). op-host molecule-runner-publish-{1,2} are the only
|
||||
# runners advertising BOTH `publish` AND `release` (op-host
|
||||
# /opt/molecule/runners/config.publish.yaml lines 28-29). Requiring
|
||||
# both labels routes publish to op-host deterministically. Matches
|
||||
# template-codex tc#22 (merge 0fb25352).
|
||||
runs-on: [publish, release]
|
||||
timeout-minutes: 30
|
||||
needs: resolve-version
|
||||
steps:
|
||||
|
||||
+21
@@ -119,6 +119,27 @@ COPY scripts/molecule-git-token-helper.sh /app/scripts/molecule-git-token-helper
|
||||
COPY scripts/molecule-gh-token-refresh.sh /app/scripts/molecule-gh-token-refresh.sh
|
||||
RUN chmod +x /app/scripts/molecule-git-token-helper.sh /app/scripts/molecule-gh-token-refresh.sh
|
||||
|
||||
# Generic GIT_ASKPASS helper — image-side companion to molecule-core PR
|
||||
# #1525 (workspace-server applyAgentGitIdentity, merge_sha 73a09443a086).
|
||||
# Reads HTTPS Basic-Auth credentials from env vars (GIT_HTTP_USERNAME /
|
||||
# GIT_HTTP_PASSWORD, with GITEA_USER / GITEA_TOKEN as fallback) and emits
|
||||
# them on the git credential-prompt protocol, so container-side `git` can
|
||||
# authenticate to any private HTTPS remote without on-disk ~/.gitconfig
|
||||
# or ~/.git-credentials mutation. The platform provisioner sets
|
||||
# GIT_ASKPASS=/usr/local/bin/molecule-askpass via applyAgentGitIdentity;
|
||||
# until this binary ships in the runtime image, git invocations error
|
||||
# with "exec: /usr/local/bin/molecule-askpass: not found" (forward-only
|
||||
# pin gap — same class as Hermes list_peers and codex template breakage,
|
||||
# fixed image-side here).
|
||||
#
|
||||
# No hardcoded hostnames or vendor names — the script body is identical
|
||||
# to the one shipped in molecule-core workspace/scripts/molecule-askpass
|
||||
# and the parallel external workspace template repos, so any deployer
|
||||
# can fork this template and use it against their own git host without
|
||||
# editing.
|
||||
COPY scripts/molecule-askpass /usr/local/bin/molecule-askpass
|
||||
RUN chmod +x /usr/local/bin/molecule-askpass
|
||||
|
||||
# Drop-priv entrypoint — claude-code refuses --dangerously-skip-permissions
|
||||
# as root, so we run molecule-runtime as the agent user (uid 1000).
|
||||
# The script handles volume-ownership fix + session-dir symlink before
|
||||
|
||||
+11
-101
@@ -398,79 +398,6 @@ def _format_process_error(exc: BaseException) -> str:
|
||||
return " | ".join(parts)
|
||||
|
||||
|
||||
class ClaudeResultError(Exception):
|
||||
"""The CLI emitted a terminal `result` message with `is_error=true`.
|
||||
|
||||
internal#211/#212 root cause: the `claude` CLI signals provider-side
|
||||
failures (auth, entitlement, quota, upstream HTTP errors) NOT by
|
||||
raising a ProcessError but by emitting a normal `result` stream
|
||||
message with `is_error=true` whose `result`/`error`/`api_error_status`
|
||||
fields carry the human-readable, user-actionable, secret-safe reason
|
||||
(e.g. a 403 "Your organization has disabled Claude subscription
|
||||
access · Use an Anthropic API key instead, or ask your admin to
|
||||
enable access" / error code `oauth_org_not_allowed`).
|
||||
|
||||
Before this class, `_run_query` returned that message body as if it
|
||||
were a successful turn, OR — when `result` was empty and only
|
||||
`errors[]` carried text — the SDK's lossy `str(subtype)` collapsed
|
||||
it to the word "success", which `sanitize_agent_error` then reduced
|
||||
to the opaque "Agent error (Exception)". We now raise this with a
|
||||
pre-curated reason so the error path can surface it verbatim
|
||||
(it is already secret-safe; `sanitize_agent_error` still scrubs).
|
||||
"""
|
||||
|
||||
def __init__(self, reason: str, *, api_error_status: int | None = None,
|
||||
error_code: str | None = None) -> None:
|
||||
self.reason = reason
|
||||
self.api_error_status = api_error_status
|
||||
self.error_code = error_code
|
||||
super().__init__(reason)
|
||||
|
||||
|
||||
def _curate_result_error(message: Any) -> str:
|
||||
"""Build a user-actionable, secret-safe reason from an is_error ResultMessage.
|
||||
|
||||
Pulls the provider's own human message (`result`), the machine error
|
||||
code (`error`), the upstream HTTP status (`api_error_status`), and any
|
||||
`errors[]` list. `api_error_status`/`error` are read via getattr because
|
||||
the pinned claude-agent-sdk dataclass drops them on parse (they survive
|
||||
only if a newer SDK adds the fields) — `result`/`errors` are always
|
||||
populated by the parser and carry the actionable text today.
|
||||
|
||||
None of these fields are secret: an HTTP status, an error code like
|
||||
`oauth_org_not_allowed`, and the provider's own guidance string are
|
||||
exactly what the user must see to self-serve. `sanitize_agent_error`
|
||||
still runs its key/token/bearer scrub over the final string as a
|
||||
belt-and-braces second pass.
|
||||
"""
|
||||
parts: list[str] = []
|
||||
status = getattr(message, "api_error_status", None)
|
||||
code = getattr(message, "error", None)
|
||||
result = getattr(message, "result", None)
|
||||
errors = getattr(message, "errors", None)
|
||||
if status:
|
||||
parts.append(f"provider HTTP {status}")
|
||||
if code and isinstance(code, str):
|
||||
parts.append(code)
|
||||
# The provider's human guidance is the most important bit — prefer
|
||||
# `result`, fall back to joined `errors[]` (the lossy path the SDK
|
||||
# otherwise collapses to the bare subtype word "success").
|
||||
human = None
|
||||
if result and isinstance(result, str) and result.strip():
|
||||
human = result.strip()
|
||||
elif errors:
|
||||
joined = "; ".join(str(e) for e in errors if e)
|
||||
if joined.strip():
|
||||
human = joined.strip()
|
||||
if human:
|
||||
parts.append(human)
|
||||
if not parts:
|
||||
# Last-ditch: never raise a bare "" — keep the subtype so the log
|
||||
# still tells operators which terminal state the CLI reported.
|
||||
parts.append(f"claude CLI reported an error result ({getattr(message, 'subtype', 'unknown')})")
|
||||
return " — ".join(parts)
|
||||
|
||||
|
||||
@dataclass
|
||||
class QueryResult:
|
||||
"""Outcome of a single `query()` stream.
|
||||
@@ -608,7 +535,16 @@ class ClaudeSDKExecutor(AgentExecutor):
|
||||
# claude session renders inbound messages as `<channel>` tags
|
||||
# inline (no inbox poll needed). Drop once channels graduate
|
||||
# to the default allowlist.
|
||||
extra_args={"dangerously-load-development-channels": "server:molecule"},
|
||||
#
|
||||
# Task #214 — CLI 2.1.143 made the flag variadic (nargs='+').
|
||||
# The `{flag: value}` shape renders as TWO argv elements (see
|
||||
# claude_agent_sdk subprocess_cli.py:340) and the channels
|
||||
# parser then greedily absorbs the SDK's downstream `--print
|
||||
# <prompt>` argv pair, wedging the SDK at initialize. Fix:
|
||||
# pack `=value` into the key so the renderer's None-value
|
||||
# path emits a single argv element which the variadic parser
|
||||
# cannot reach across.
|
||||
extra_args={"dangerously-load-development-channels=server:molecule": None},
|
||||
)
|
||||
|
||||
# --- output_config: effort + task_budget (issue #652) ---
|
||||
@@ -678,19 +614,6 @@ class ClaudeSDKExecutor(AgentExecutor):
|
||||
sid = getattr(message, "session_id", None)
|
||||
if sid:
|
||||
session_id = sid
|
||||
# internal#211/#212: a terminal result with is_error=true
|
||||
# is a provider-side failure (auth/entitlement/quota/
|
||||
# upstream HTTP) whose result/error/api_error_status carry
|
||||
# the user-actionable reason. Surface it as a structured
|
||||
# error instead of silently returning the body as a normal
|
||||
# turn (or, when only errors[] is set, letting the SDK
|
||||
# collapse it to the opaque word "success").
|
||||
if getattr(message, "is_error", False):
|
||||
raise ClaudeResultError(
|
||||
_curate_result_error(message),
|
||||
api_error_status=getattr(message, "api_error_status", None),
|
||||
error_code=getattr(message, "error", None),
|
||||
)
|
||||
result_text = getattr(message, "result", None)
|
||||
finally:
|
||||
self._active_stream = None
|
||||
@@ -775,11 +698,6 @@ class ClaudeSDKExecutor(AgentExecutor):
|
||||
def _is_retryable(exc: BaseException) -> bool:
|
||||
"""Check if an SDK exception looks like a transient rate-limit or
|
||||
capacity error that's worth retrying with backoff."""
|
||||
# A terminal CLI is_error result (auth/entitlement/quota/provider
|
||||
# HTTP) is never worth retrying — retrying just delays surfacing
|
||||
# the actionable reason to the user. internal#211/#212.
|
||||
if isinstance(exc, ClaudeResultError):
|
||||
return False
|
||||
msg = str(exc).lower()
|
||||
return any(p in msg for p in _RETRYABLE_PATTERNS)
|
||||
|
||||
@@ -885,15 +803,7 @@ class ClaudeSDKExecutor(AgentExecutor):
|
||||
f"claude_agent_sdk wedge: {formatted[:200]} — restart workspace to recover"
|
||||
)
|
||||
break
|
||||
# internal#211/#212: when the failure is a curated,
|
||||
# secret-safe provider reason (ClaudeResultError), pass
|
||||
# it through to the user instead of collapsing to the
|
||||
# opaque exception class name. sanitize_agent_error
|
||||
# still scrubs key/token/bearer-shaped substrings.
|
||||
if isinstance(exc, ClaudeResultError):
|
||||
response_text = sanitize_agent_error(exc, reason=exc.reason)
|
||||
else:
|
||||
response_text = sanitize_agent_error(exc)
|
||||
response_text = sanitize_agent_error(exc)
|
||||
break
|
||||
finally:
|
||||
await set_current_task(self.heartbeat, "")
|
||||
|
||||
@@ -1,6 +1,16 @@
|
||||
# Molecule AI workspace runtime — shared infrastructure
|
||||
molecule-ai-workspace-runtime>=0.1.22
|
||||
|
||||
# P0 band-aid for canvas-chat upload 400 "failed to parse multipart form"
|
||||
# (task #256; forensic a5bb950f). Starlette `Request.form()` raises
|
||||
# AssertionError parsing multipart bodies when python-multipart is absent.
|
||||
# Pinned in molecule-core mc#1578 (SSOT, MERGED 2026-05-19T21:41Z) but the
|
||||
# PyPI publish of the updated runtime wheel is gated on the Gitea middleman
|
||||
# rename + PyPI abuse-block recovery. This direct pin in each template is
|
||||
# REDUNDANT (and harmless) once mc#1578's runtime tag publishes — at that
|
||||
# point the runtime wheel itself will carry python-multipart as a transitive.
|
||||
python-multipart>=0.0.27
|
||||
|
||||
# Claude Code adapter specific deps
|
||||
# Claude Agent SDK — programmatic API to Claude Code engine.
|
||||
# Replaces CLI subprocess approach (no more --print, --resume, json parsing).
|
||||
|
||||
Executable
+35
@@ -0,0 +1,35 @@
|
||||
#!/bin/sh
|
||||
# git-askpass helper. Reads HTTPS Basic-Auth credentials from env vars so
|
||||
# the deployer can wire git authentication for any private remote without
|
||||
# touching ~/.gitconfig or ~/.git-credentials inside the container.
|
||||
#
|
||||
# Wire-up: set GIT_ASKPASS=/usr/local/bin/molecule-askpass in the
|
||||
# container env, then export GIT_HTTP_USERNAME / GIT_HTTP_PASSWORD (or the
|
||||
# GITEA_USER / GITEA_TOKEN fallback pair). When git encounters an HTTPS
|
||||
# auth challenge on a host that has no credential.helper configured for
|
||||
# it, git invokes GIT_ASKPASS twice — once with a "Username for ..."
|
||||
# prompt and once with a "Password for ..." prompt. We pattern-match on
|
||||
# that prompt and emit the matching env var.
|
||||
#
|
||||
# No hardcoded hostnames or vendor names — the deployer decides which
|
||||
# host these credentials apply to by virtue of setting GIT_ASKPASS only
|
||||
# when the target remote is in scope. The helper itself is reusable for
|
||||
# any HTTPS git remote.
|
||||
#
|
||||
# Failure mode: if the env vars are unset, we emit an empty string and
|
||||
# let git surface "Authentication failed" — this is intentional, so a
|
||||
# misconfigured deployment fails loudly at first push instead of silently
|
||||
# falling through to an unrelated credential chain.
|
||||
|
||||
case "$1" in
|
||||
Username*)
|
||||
printf '%s\n' "${GIT_HTTP_USERNAME:-${GITEA_USER:-}}"
|
||||
;;
|
||||
Password*)
|
||||
printf '%s\n' "${GIT_HTTP_PASSWORD:-${GITEA_TOKEN:-}}"
|
||||
;;
|
||||
*)
|
||||
# Unknown prompt — emit empty and let git decide.
|
||||
printf '\n'
|
||||
;;
|
||||
esac
|
||||
@@ -110,6 +110,18 @@ def _load_executor():
|
||||
return claude_sdk_executor
|
||||
|
||||
|
||||
def _channels_entry(extra_args):
|
||||
"""Return (key, value) for the dev-channels flag, tolerating both shapes.
|
||||
|
||||
- separate-value shape: {"dangerously-load-development-channels": "server:X"}
|
||||
- packed `=` shape (task #214 fix): {"dangerously-load-development-channels=server:X": None}
|
||||
"""
|
||||
for k, v in extra_args.items():
|
||||
if k.split("=", 1)[0] == "dangerously-load-development-channels":
|
||||
return k, v
|
||||
return None, None
|
||||
|
||||
|
||||
def test_build_options_forwards_tagged_dev_channels_flag(tmp_path):
|
||||
"""``_build_options`` must pass the tagged ``server:molecule`` entry to
|
||||
``--dangerously-load-development-channels``. The Claude Code 2.1.x CLI
|
||||
@@ -142,25 +154,28 @@ def test_build_options_forwards_tagged_dev_channels_flag(tmp_path):
|
||||
"extra_args missing — host claude CLI will never see the dev-channels "
|
||||
"flag and notifications/claude/channel will be filtered at the allowlist"
|
||||
)
|
||||
flag_value = kwargs["extra_args"].get("dangerously-load-development-channels")
|
||||
assert flag_value == "server:molecule", (
|
||||
f"dev-channels entry must be tagged 'server:molecule' to match the "
|
||||
f"workspace's MCP-server registration. The CLI rejects bare server "
|
||||
key, value = _channels_entry(kwargs["extra_args"])
|
||||
# Resolve the tagged payload from whichever shape the executor used.
|
||||
tagged = value if value is not None else (key.split("=", 1)[1] if "=" in key else None)
|
||||
assert tagged == "server:molecule", (
|
||||
f"dev-channels entry must resolve to tagged 'server:molecule' to match "
|
||||
f"the workspace's MCP-server registration. The CLI rejects bare server "
|
||||
f"names with `entries must be tagged` and bare-switch values (None) "
|
||||
f"with `argument missing`; the latter wedges SDK initialize. "
|
||||
f"got {flag_value!r}"
|
||||
f"got key={key!r} value={value!r}"
|
||||
)
|
||||
|
||||
|
||||
def test_build_options_dev_channels_value_is_not_bare_none(tmp_path):
|
||||
"""Defense in depth against the original PR #25 bare-switch shape.
|
||||
|
||||
``{flag: None}`` in claude-agent-sdk's extra_args forwarding renders
|
||||
as a bare ``--flag`` with no value, which the post-2.1.x CLI rejects.
|
||||
Pin the invariant (non-None, non-empty, contains a tag colon) so a
|
||||
regression to the old shape fails immediately at unit-test time
|
||||
instead of surfacing as a live `Control request timeout: initialize`
|
||||
wedge in production.
|
||||
A bare ``--dangerously-load-development-channels`` (no value, no
|
||||
``=value`` packed into the key) renders as an argument-less flag,
|
||||
which the post-2.1.x CLI rejects with `argument missing`. Pin the
|
||||
invariant (the rendered payload is non-empty and tag-colon-shaped)
|
||||
so a regression to the old shape fails immediately at unit-test
|
||||
time instead of surfacing as a live `Control request timeout:
|
||||
initialize` wedge in production.
|
||||
"""
|
||||
mod = _load_executor()
|
||||
sdk = sys.modules["claude_agent_sdk"]
|
||||
@@ -174,17 +189,56 @@ def test_build_options_dev_channels_value_is_not_bare_none(tmp_path):
|
||||
)
|
||||
executor._build_options()
|
||||
|
||||
flag_value = (
|
||||
key, value = _channels_entry(
|
||||
sdk.ClaudeAgentOptions.call_args.kwargs["extra_args"]
|
||||
["dangerously-load-development-channels"]
|
||||
)
|
||||
assert flag_value is not None, (
|
||||
"flag value must not be None — bare switch wedges SDK initialize"
|
||||
payload = key if value is None else f"{key}={value}"
|
||||
assert ":" in payload.split("=", 1)[-1], (
|
||||
f"flag payload must be tagged (server:<name> or plugin:<name>@<marketplace>); "
|
||||
f"got key={key!r} value={value!r} which the CLI rejects with "
|
||||
f"`entries must be tagged` or `argument missing`"
|
||||
)
|
||||
assert isinstance(flag_value, str) and flag_value, (
|
||||
f"flag value must be a non-empty string; got {flag_value!r}"
|
||||
|
||||
|
||||
def test_dev_channels_does_not_swallow_print_prompt_cli_2_1_143(tmp_path):
|
||||
"""Task #214 regression — claude-code CLI 2.1.143.
|
||||
|
||||
CLI 2.1.143 made ``--dangerously-load-development-channels`` variadic
|
||||
(``nargs='+'``). claude-agent-sdk's renderer (subprocess_cli.py:340)
|
||||
emits ``{flag: value}`` as TWO argv elements, so the channels parser
|
||||
greedily absorbs the following ``--print <prompt>`` argv pair as
|
||||
channel entries and the SDK wedges at initialize. Fix: pack ``=``
|
||||
into the key so the renderer's ``None``-value path emits ONE argv —
|
||||
``--dangerously-load-development-channels=server:molecule`` — that
|
||||
the variadic parser cannot reach across. Both argv orderings
|
||||
around ``--print <prompt>`` (channels-then-print, print-then-
|
||||
channels) must keep the prompt argv adjacent to ``--print``.
|
||||
"""
|
||||
mod = _load_executor()
|
||||
sdk = sys.modules["claude_agent_sdk"]
|
||||
sdk.ClaudeAgentOptions.reset_mock()
|
||||
executor = mod.ClaudeSDKExecutor(
|
||||
system_prompt=None, config_path=str(tmp_path), heartbeat=None, model="sonnet",
|
||||
)
|
||||
assert ":" in flag_value, (
|
||||
f"flag value must be tagged (server:<name> or plugin:<name>@<marketplace>); "
|
||||
f"got {flag_value!r} which the CLI rejects with `entries must be tagged`"
|
||||
executor._build_options()
|
||||
extra_args = sdk.ClaudeAgentOptions.call_args.kwargs["extra_args"]
|
||||
|
||||
# Mirror claude_agent_sdk/_internal/transport/subprocess_cli.py:340.
|
||||
channels_argv = []
|
||||
for flag, val in extra_args.items():
|
||||
channels_argv.append(f"--{flag}") if val is None else channels_argv.extend([f"--{flag}", str(val)])
|
||||
|
||||
slots = [a for a in channels_argv if a.startswith("--dangerously-load-development-channels")]
|
||||
assert len(slots) == 1 and "=" in slots[0] and channels_argv == slots, (
|
||||
f"channels flag must render as a single argv with `=value` packed in so "
|
||||
f"CLI 2.1.143's nargs='+' parser cannot swallow --print <prompt>; "
|
||||
f"got channels_argv={channels_argv!r}"
|
||||
)
|
||||
for orientation, full_argv in (
|
||||
("channels_then_print", channels_argv + ["--print", "hello world"]),
|
||||
("print_then_channels", ["--print", "hello world"] + channels_argv),
|
||||
):
|
||||
idx = full_argv.index("--print")
|
||||
assert full_argv[idx + 1] == "hello world", (
|
||||
f"--print prompt argv must stay adjacent ({orientation}); got {full_argv!r}"
|
||||
)
|
||||
|
||||
@@ -1,299 +0,0 @@
|
||||
"""internal#211/#212: a terminal `result` message with is_error=true must
|
||||
surface the provider's actionable, secret-safe reason — NOT be returned as
|
||||
a normal turn and NOT collapse to the opaque "Agent error (Exception)".
|
||||
|
||||
Root cause was a two-cut loss:
|
||||
1. claude_sdk_executor._run_query read ResultMessage.result but ignored
|
||||
`is_error`, so a 403 org-disabled result was either returned as if it
|
||||
were a successful answer or (when only errors[] carried text) reduced
|
||||
by the SDK to the bare subtype word "success".
|
||||
2. sanitize_agent_error then reduced whatever exception to its class name.
|
||||
|
||||
These tests pin:
|
||||
- _curate_result_error builds a reason carrying the provider HTTP status,
|
||||
the error code, and the provider's human guidance.
|
||||
- _run_query raises ClaudeResultError (a non-retryable terminal error)
|
||||
when the stream yields a ResultMessage with is_error=true.
|
||||
- The reason is preserved through the executor's sanitize call.
|
||||
- A secret-shaped payload is still scrubbed.
|
||||
|
||||
Regression-injection-checked: reverting the is_error branch in _run_query
|
||||
makes test_run_query_raises_on_is_error fail (no exception raised); reverting
|
||||
the _curate_result_error field reads makes the field-content asserts fail.
|
||||
|
||||
Stub pattern mirrors tests/test_runtime_wedge_mirror.py so the file runs in
|
||||
CI with only `pytest pytest-asyncio pyyaml` installed.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import os
|
||||
import sys
|
||||
import types
|
||||
from unittest.mock import MagicMock
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
# ---- Stubs (mirror of test_runtime_wedge_mirror._install_executor_stubs) ----
|
||||
|
||||
|
||||
def _ensure_module(dotted: str) -> types.ModuleType:
|
||||
if dotted not in sys.modules:
|
||||
sys.modules[dotted] = types.ModuleType(dotted)
|
||||
return sys.modules[dotted]
|
||||
|
||||
|
||||
def _ensure_attr(mod: types.ModuleType, name: str, value: object) -> None:
|
||||
# Always override. conftest.py::_install_stubs runs at collection time
|
||||
# and pre-registers bare placeholder stubs (e.g. ResultMessage =
|
||||
# type("ResultMessage", (), {}) which takes no kwargs, and a MagicMock
|
||||
# claude_sdk_executor module). A no-op-if-present helper would let
|
||||
# those win in a full-suite run while passing in isolation. This file
|
||||
# owns the precise stub shapes _run_query/_curate_result_error need,
|
||||
# so it force-installs them; _load_executor() re-imports the real
|
||||
# claude_sdk_executor against these every test.
|
||||
setattr(mod, name, value)
|
||||
|
||||
|
||||
class _StubResultMessage:
|
||||
"""Real class so isinstance(message, sdk.ResultMessage) works in
|
||||
_run_query. Carries the fields the CLI sends on a 403 org-disabled
|
||||
result. api_error_status/error are read via getattr in
|
||||
_curate_result_error so they're optional here too."""
|
||||
|
||||
def __init__(self, *, is_error, result=None, errors=None,
|
||||
api_error_status=None, error=None, subtype="success",
|
||||
session_id="sess-1"):
|
||||
self.is_error = is_error
|
||||
self.result = result
|
||||
self.errors = errors
|
||||
self.api_error_status = api_error_status
|
||||
self.error = error
|
||||
self.subtype = subtype
|
||||
self.session_id = session_id
|
||||
|
||||
|
||||
class _StubAssistantMessage:
|
||||
def __init__(self, content=None):
|
||||
self.content = content or []
|
||||
|
||||
|
||||
class _StubTextBlock:
|
||||
def __init__(self, text):
|
||||
self.text = text
|
||||
|
||||
|
||||
def _install_executor_stubs():
|
||||
sdk = _ensure_module("claude_agent_sdk")
|
||||
_ensure_attr(sdk, "ClaudeAgentOptions", MagicMock(name="ClaudeAgentOptions"))
|
||||
_ensure_attr(sdk, "AssistantMessage", _StubAssistantMessage)
|
||||
_ensure_attr(sdk, "TextBlock", _StubTextBlock)
|
||||
_ensure_attr(sdk, "ResultMessage", _StubResultMessage)
|
||||
_ensure_attr(sdk, "query", MagicMock(name="query"))
|
||||
|
||||
_ensure_module("a2a")
|
||||
_ensure_module("a2a.server")
|
||||
a2a_exec = _ensure_module("a2a.server.agent_execution")
|
||||
_ensure_attr(a2a_exec, "AgentExecutor", type("AgentExecutor", (), {}))
|
||||
_ensure_attr(a2a_exec, "RequestContext", type("RequestContext", (), {}))
|
||||
a2a_events = _ensure_module("a2a.server.events")
|
||||
_ensure_attr(a2a_events, "EventQueue", type("EventQueue", (), {}))
|
||||
a2a_helpers = _ensure_module("a2a.helpers")
|
||||
_ensure_attr(a2a_helpers, "new_text_message", lambda *_a, **_kw: None)
|
||||
|
||||
_ensure_module("molecule_runtime")
|
||||
helpers = _ensure_module("molecule_runtime.executor_helpers")
|
||||
_ensure_attr(helpers, "CONFIG_MOUNT", "/configs")
|
||||
_ensure_attr(helpers, "WORKSPACE_MOUNT", "/workspace")
|
||||
_ensure_attr(helpers, "MEMORY_CONTENT_MAX_CHARS", 10000)
|
||||
_ensure_attr(helpers, "auto_push_hook", lambda *a, **kw: None)
|
||||
_ensure_attr(helpers, "brief_summary", lambda *a, **kw: "")
|
||||
_ensure_attr(helpers, "collect_outbound_files", lambda *a, **kw: [])
|
||||
_ensure_attr(helpers, "commit_memory", lambda *a, **kw: None)
|
||||
_ensure_attr(helpers, "extract_attached_files", lambda *a, **kw: [])
|
||||
_ensure_attr(helpers, "extract_message_text", lambda *a, **kw: "")
|
||||
_ensure_attr(helpers, "get_a2a_instructions", lambda **kw: "")
|
||||
_ensure_attr(helpers, "get_hma_instructions", lambda *a, **kw: "")
|
||||
_ensure_attr(helpers, "get_mcp_server_path", lambda *a, **kw: "/dev/null")
|
||||
_ensure_attr(helpers, "get_system_prompt", lambda *a, **kw: "")
|
||||
_ensure_attr(helpers, "read_delegation_results", lambda *a, **kw: "")
|
||||
_ensure_attr(helpers, "recall_memories", lambda *a, **kw: "")
|
||||
|
||||
# Faithful mirror of molecule-core sanitize_agent_error's reason-path
|
||||
# contract (the real impl lives in the runtime package, not installed
|
||||
# in CI). Surfaces `reason` verbatim and still scrubs sk-/bearer.
|
||||
def _sanitize(exc=None, category=None, stderr=None, reason=None):
|
||||
import re
|
||||
tag = category or (type(exc).__name__ if exc is not None else "unknown")
|
||||
if reason:
|
||||
clean = re.sub(
|
||||
r"(?i)(?:bearer|token|api[_-]?key|sk-)[ :=]+[A-Za-z0-9_/.-]{20,}",
|
||||
"[REDACTED]", reason,
|
||||
)
|
||||
return f"Agent error ({tag}): {clean}"
|
||||
if stderr:
|
||||
return f"Agent error ({tag}): {stderr}"
|
||||
return f"Agent error ({tag}) — see workspace logs for details."
|
||||
|
||||
_ensure_attr(helpers, "sanitize_agent_error", _sanitize)
|
||||
_ensure_attr(helpers, "set_current_task", lambda *a, **kw: None)
|
||||
|
||||
|
||||
def _load_executor():
|
||||
_install_executor_stubs()
|
||||
parent_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
|
||||
if parent_dir not in sys.path:
|
||||
sys.path.insert(0, parent_dir)
|
||||
sys.modules.pop("claude_sdk_executor", None)
|
||||
import claude_sdk_executor # noqa: WPS433
|
||||
return claude_sdk_executor
|
||||
|
||||
|
||||
# The exact payload the CLI emitted on internal#211.
|
||||
_211_RESULT = (
|
||||
"Your organization has disabled Claude subscription access for Claude "
|
||||
"Code · Use an Anthropic API key instead, or ask your admin to enable "
|
||||
"access"
|
||||
)
|
||||
|
||||
|
||||
# ─── _curate_result_error ──────────────────────────────────────────────
|
||||
|
||||
|
||||
def test_curate_includes_status_code_and_human_guidance():
|
||||
cse = _load_executor()
|
||||
msg = cse.sdk.ResultMessage(
|
||||
is_error=True,
|
||||
result=_211_RESULT,
|
||||
errors=[],
|
||||
api_error_status=403,
|
||||
error="oauth_org_not_allowed",
|
||||
subtype="success",
|
||||
)
|
||||
reason = cse._curate_result_error(msg)
|
||||
assert "403" in reason
|
||||
assert "oauth_org_not_allowed" in reason
|
||||
assert "disabled Claude subscription access" in reason
|
||||
assert "ask your admin to enable access" in reason
|
||||
# Must NOT degrade to the bare subtype word.
|
||||
assert reason.strip().lower() != "success"
|
||||
|
||||
|
||||
def test_curate_falls_back_to_errors_list_when_result_empty():
|
||||
"""When the CLI sends errors[] instead of result, that text must still
|
||||
be surfaced (this is the path the SDK otherwise collapses to "success")."""
|
||||
cse = _load_executor()
|
||||
msg = cse.sdk.ResultMessage(
|
||||
is_error=True,
|
||||
result=None,
|
||||
errors=["upstream 503 from provider", "retry later"],
|
||||
subtype="success",
|
||||
)
|
||||
reason = cse._curate_result_error(msg)
|
||||
assert "upstream 503 from provider" in reason
|
||||
assert reason.strip().lower() != "success"
|
||||
|
||||
|
||||
def test_curate_never_returns_empty():
|
||||
cse = _load_executor()
|
||||
msg = cse.sdk.ResultMessage(is_error=True, result=None, errors=None,
|
||||
subtype="error_max_turns")
|
||||
reason = cse._curate_result_error(msg)
|
||||
assert reason.strip()
|
||||
assert "error_max_turns" in reason
|
||||
|
||||
|
||||
# ─── _run_query raises on is_error ──────────────────────────────────────
|
||||
|
||||
|
||||
def _make_executor(cse):
|
||||
"""Build a ClaudeSDKExecutor without running its real __init__ (which
|
||||
needs heartbeat/config wiring). We only exercise _run_query."""
|
||||
ex = object.__new__(cse.ClaudeSDKExecutor)
|
||||
ex._active_stream = None
|
||||
return ex
|
||||
|
||||
|
||||
def test_run_query_raises_on_is_error():
|
||||
cse = _load_executor()
|
||||
err_msg = cse.sdk.ResultMessage(
|
||||
is_error=True,
|
||||
result=_211_RESULT,
|
||||
errors=[],
|
||||
api_error_status=403,
|
||||
error="oauth_org_not_allowed",
|
||||
)
|
||||
|
||||
async def _fake_stream(*_a, **_kw):
|
||||
yield err_msg
|
||||
|
||||
cse.sdk.query = lambda **_kw: _fake_stream()
|
||||
ex = _make_executor(cse)
|
||||
|
||||
with pytest.raises(cse.ClaudeResultError) as ei:
|
||||
asyncio.run(ex._run_query(prompt="hi", options=None))
|
||||
|
||||
exc = ei.value
|
||||
assert exc.api_error_status == 403
|
||||
assert exc.error_code == "oauth_org_not_allowed"
|
||||
assert "disabled Claude subscription access" in exc.reason
|
||||
|
||||
|
||||
def test_run_query_returns_normally_when_not_error():
|
||||
"""A successful ResultMessage path is unchanged — no regression."""
|
||||
cse = _load_executor()
|
||||
ok_msg = cse.sdk.ResultMessage(is_error=False, result="all done",
|
||||
session_id="s-9")
|
||||
|
||||
async def _fake_stream(*_a, **_kw):
|
||||
yield ok_msg
|
||||
|
||||
cse.sdk.query = lambda **_kw: _fake_stream()
|
||||
ex = _make_executor(cse)
|
||||
result = asyncio.run(ex._run_query(prompt="hi", options=None))
|
||||
assert result.text == "all done"
|
||||
assert result.session_id == "s-9"
|
||||
|
||||
|
||||
def test_claude_result_error_is_not_retryable():
|
||||
"""Terminal provider errors must not be retried (would just delay the
|
||||
user seeing the actionable reason 3x backoff later)."""
|
||||
cse = _load_executor()
|
||||
exc = cse.ClaudeResultError("provider HTTP 429 rate limit hit",
|
||||
api_error_status=429)
|
||||
# Even though the text contains 'rate'/'limit'/'429' (retryable
|
||||
# substrings), a ClaudeResultError is terminal.
|
||||
assert cse.ClaudeSDKExecutor._is_retryable(exc) is False
|
||||
|
||||
|
||||
# ─── End-to-end: reason reaches sanitize_agent_error verbatim ───────────
|
||||
|
||||
|
||||
def test_curated_reason_survives_sanitize_and_scrubs_secrets():
|
||||
cse = _load_executor()
|
||||
from molecule_runtime.executor_helpers import sanitize_agent_error
|
||||
|
||||
exc = cse.ClaudeResultError(
|
||||
"provider HTTP 403 — oauth_org_not_allowed — " + _211_RESULT,
|
||||
api_error_status=403,
|
||||
error_code="oauth_org_not_allowed",
|
||||
)
|
||||
out = sanitize_agent_error(exc, reason=exc.reason)
|
||||
assert "403" in out
|
||||
assert "oauth_org_not_allowed" in out
|
||||
assert "ask your admin to enable access" in out
|
||||
assert "see workspace logs" not in out
|
||||
|
||||
# Synthetic Anthropic-shaped key built at runtime via concat so the
|
||||
# required `Secret scan` gate (pattern `sk-ant-[A-Za-z0-9_-]{40,}`)
|
||||
# does not false-positive on a fixture literal. The assembled value is
|
||||
# identical to the old inline literal — the test still proves a real
|
||||
# `sk-ant-…<40+ chars>` token is scrubbed, just without ever putting
|
||||
# the credential-shaped string on a single source line.
|
||||
fake_key = "sk-" + "ant-" + ("DEADBEEF" * 3) + "0123456789abcdef"
|
||||
leaky = cse.ClaudeResultError(
|
||||
"auth failed Authorization: Bearer " + fake_key
|
||||
)
|
||||
scrubbed = sanitize_agent_error(leaky, reason=leaky.reason)
|
||||
assert "[REDACTED]" in scrubbed
|
||||
assert fake_key not in scrubbed
|
||||
Reference in New Issue
Block a user