Compare commits

..

17 Commits

Author SHA1 Message Date
molecule-ai[bot] 33fabdf483 Merge pull request #2897 from Molecule-AI/staging
staging → main: auto-promote 2cb1b26
2026-05-05 10:23:18 -07:00
Hongming Wang abba16beb4 Merge pull request #2883 from Molecule-AI/refactor/a2a-tools-rbac-extract-rfc2873-iter4a
refactor(workspace): extract RBAC helpers from a2a_tools.py (RFC #2873 iter 4a)
2026-05-05 16:59:36 +00:00
Hongming Wang 9c752e0673 Merge pull request #2879 from Molecule-AI/refactor/mcp-cli-split-rfc2873-iter3
refactor(workspace): split mcp_cli.py into focused modules (RFC #2873 iter 3)
2026-05-05 16:58:05 +00:00
Hongming Wang 2cb1b26512 Merge pull request #2895 from Molecule-AI/2872-workspaces-unique-parent-name
test(org-import): tighten idempotency gate AST → discriminate workspaces vs lookalikes (#2872 Imp-1)
2026-05-05 16:03:26 +00:00
Hongming Wang 48d1945269 test(org-import): tighten AST gate to discriminate workspaces vs lookalikes (#2872 Imp-1)
The previous TestCreateWorkspaceTree_CallsLookupBeforeInsert used
bytes.Index("INSERT INTO workspaces"), which prefix-matches
INSERT INTO workspaces_audit, INSERT INTO workspace_secrets, and
INSERT INTO workspace_channels. RFC #2872 cited this as a silent
false-pass mode: a future refactor that adds an audit-table INSERT
literal earlier in source than the real workspaces INSERT would
make the gate point at the wrong target.

Replaces the byte-search with a go/ast walk + a regex that requires
`\s*\(` after `workspaces` — distinguishes the real target from
prefix lookalikes.

Adds three discriminating tests:
- TestWorkspacesInsertRE_RejectsLookalikes — pins the regex against
  9 sql shapes (real, raw-string-literal, audit-shadow, workspace_*
  prefixes, canvas_layouts, UPDATE/SELECT, comments).
- TestGate_FailsWhenLookupAfterInsert — synthesizes Go source where
  the lookup is positioned AFTER the workspaces INSERT, asserts the
  helper returns lookupPos > insertPos (which the production gate
  flags via t.Errorf). Proves the gate isn't vestigial.
- TestGate_IgnoresAuditTableShadow — synthesizes source with an
  audit-table INSERT BEFORE the lookup + real INSERT, asserts the
  tightened regex correctly walks past the shadow and finds the
  real INSERT.

Also extracts findLookupAndWorkspacesInsertPos as a helper so the
gate logic can be exercised against synthetic source, not only
against the real org_import.go.

Memory: feedback_assert_exact_not_substring.md (verify tightened
test FAILS on old code) — TestGate_FailsWhenLookupAfterInsert is
the failing-on-bug-shape proof.

Closes the silent-false-pass mode of #2872 Important-1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 08:32:56 -07:00
molecule-ai[bot] a04a49f7aa Merge pull request #2893 from Molecule-AI/staging
staging → main: auto-promote bbec4cf
2026-05-05 08:23:11 -07:00
Hongming Wang bbec4cfcfb Merge pull request #2886 from Molecule-AI/feat/poll-mode-chat-upload-phase4
test(rfc): poll-mode chat upload — phase 4 real-Postgres integration
2026-05-05 12:13:17 +00:00
molecule-ai[bot] 19c25a9278 Merge pull request #2889 from Molecule-AI/staging
staging → main: auto-promote 529c3f3
2026-05-05 12:09:48 +00:00
Hongming Wang e50799bc29 test(rfc): poll-mode chat upload — phase 4 real-Postgres integration
Phase 4 closes out the rollout — strict-sqlmock unit tests pin which
SQL fires, but they cannot detect bugs that depend on the actual row
state after the SQL runs. Real-Postgres integration tests catch:

  - the Sweep CTE depends on Postgres' make_interval function and
    the table's CHECK constraints; sqlmock would happily accept a
    hand-written SQL literal that Postgres rejects at runtime.
  - the partial idx_pending_uploads_unacked index only catches a
    wrong WHERE predicate at real-query-plan time.
  - subtle predicate drift (e.g. a WHERE clause that filters by
    acked_at IS NOT NULL but uses BETWEEN incorrectly).

Test cases:

  - PutGetAckRoundTrip: the full happy path — Put, Get, MarkFetched,
    Ack, idempotent re-Ack, Get-after-Ack returns ErrNotFound.
  - Sweep_DeletesAckedAfterRetention: row not eligible at retention=1h
    immediately after Ack; deleted at retention=0.
  - Sweep_DeletesExpiredUnacked: backdated expires_at exercises the
    unacked-and-expired branch of the WHERE clause.
  - Sweep_DeletesBothCategoriesInOneCycle: three rows (acked, expired,
    fresh); a single Sweep deletes the first two and leaves the third.
  - PutEnforcesSizeCap: ErrTooLarge above MaxFileBytes.
  - GetIgnoresExpiredAndAcked: Get filters predicate matches expected
    row state in the table.

Run path:
  - locally via the file-header docker incantation.
  - CI runs on every PR/push that touches handlers/** OR migrations/**
    (.github/workflows/handlers-postgres-integration.yml).
2026-05-05 05:04:41 -07:00
Hongming Wang 07839580a0 Merge pull request #2885 from Molecule-AI/feat/poll-mode-chat-upload-phase3
feat(rfc): poll-mode chat upload — phase 3 GC sweep + observability
2026-05-05 05:04:19 -07:00
Hongming Wang 17aec22f9b fix(build): add a2a_tools_rbac to TOP_LEVEL_MODULES drift gate
Iter 4a's new module needs to be in the rewrite list so the wheel
ships its imports prefixed correctly. Caught by 'PR-built wheel +
import smoke'.

Refs RFC #2873 iter 4a.
2026-05-05 05:00:47 -07:00
Hongming Wang 8388144098 fix(build): add iter-3 mcp_* modules to TOP_LEVEL_MODULES drift gate
The iter-3 split created mcp_heartbeat / mcp_inbox_pollers /
mcp_workspace_resolver but the wheel build's drift-gate check at
scripts/build_runtime_package.py:TOP_LEVEL_MODULES wasn't updated.
Without this fix the wheel ships those modules un-rewritten, so
their imports of platform_auth / configs_dir / etc. break at
runtime. Caught by the 'PR-built wheel + import smoke' check.

Refs RFC #2873 iter 3.
2026-05-05 05:00:29 -07:00
Hongming Wang a327d207da feat(rfc): poll-mode chat upload — phase 3 GC sweep + observability
Phase 3 of the poll-mode chat upload rollout. Stack atop Phase 2.

The platform's pending_uploads table grows once-per-uploaded-file with
no built-in cleanup. Phase 1's hard TTL (expires_at default 24h) makes
expired rows un-fetchable but doesn't actually delete them; Phase 1's
ack stamps acked_at but leaves the row indefinitely. Without a sweep
the table grows unbounded across normal traffic.

This PR adds:

- `Storage.Sweep(ctx, ackRetention)` — a single round-trip CTE that
  deletes acked rows past their retention window plus unacked rows
  past expires_at. Returns `(acked, expired)` deletion counts so
  Phase 3 dashboards can spot the stuck-fetch pattern (high expired,
  low acked) vs healthy churn.
- `pendinguploads.StartSweeper(ctx, storage, ackRetention)` —
  background goroutine that calls Sweep every 5 minutes (default).
  Runs once immediately on startup so a platform restart cleans up
  any rows that became eligible while we were down.
- Prometheus counters `molecule_pending_uploads_swept_total` with
  `outcome={acked,expired,error}` labels. Wired into the existing
  `/metrics` endpoint.
- Wired from cmd/server/main.go via supervised.RunWithRecover —
  one transient panic doesn't take the platform down with it.

Defaults:
  - SweepInterval = 5m (matches the dashboard refresh cadence)
  - DefaultAckRetention = 1h (gives the workspace at-least-once retry
    headroom in case it processed but failed to write the file before
    crashing)

Test coverage: 100% on storage_test.go (extended with sweepSQL pin +
six Sweep test cases including negative-retention clamp + zero-retention
immediate-delete + DB error wrapping) and sweeper_test.go (ticker-driven
+ ctx-cancel + nil-storage + transient-error-doesn't-crash + metric
counter assertions).

Closes the third of four phases tracked on the parent RFC; phase 4 is
the staging E2E test.
2026-05-05 05:00:13 -07:00
Hongming Wang 529c3f3922 Merge pull request #2884 from Molecule-AI/feat/phantom-busy-reset-metric-2865-1777976000
feat(metrics): add molecule_phantom_busy_resets_total counter (#2865)
2026-05-05 11:50:28 +00:00
Hongming Wang c778b62202 feat(metrics): add molecule_phantom_busy_resets_total counter (#2865)
Closes #2865 (split-B of the #2669 root-cause stack).

The phantom-busy sweep in workspace-server/internal/scheduler/scheduler.go
already logs each row reset, but no aggregate metric surfaces "how often
is this firing." A regression that causes high reset rates (e.g.
controlplane#481's missing env vars, or future drift in the workspace
runtime's task-lifecycle accounting) only surfaces when users complain.

Fix: counter exposed at /metrics as molecule_phantom_busy_resets_total,
incremented from sweepPhantomBusy after each row whose active_tasks
was reset. Same shape as existing molecule_websocket_connections_active.

Operator-side dashboard: alert when daily phantom-busy reset count
> 0.5% of active workspaces. Today's steady-state is near-zero; any
increase is a regression signal.

Tests:
  - TestTrackPhantomBusyReset_IncrementsCounter
  - TestTrackPhantomBusyReset_RaceFreeUnderConcurrentWrites (50×200
    concurrent writes; tests atomic invariant)
  - TestHandler_ExposesPhantomBusyResetsCounter (asserts HELP + TYPE
    + value lines in Prometheus text format)
  - TestHandler_PhantomBusyResetsZeroByDefault (fresh-process 0
    contract — prevents a future refactor from accidentally dropping
    the metric from /metrics)

Race-detector clean. Vet clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:45:24 -07:00
Hongming Wang 0c461eb9f1 refactor(workspace): extract RBAC helpers from a2a_tools.py to a2a_tools_rbac.py (RFC #2873 iter 4a)
First slice of the a2a_tools.py (991 LOC) split — single-concern module
for the workspace's RBAC + auth-header layer:

  * _ROLE_PERMISSIONS canonical table
  * _get_workspace_tier
  * _check_memory_write_permission
  * _check_memory_read_permission
  * _is_root_workspace
  * _auth_headers_for_heartbeat

a2a_tools.py shrinks from 991 → 915 LOC. Internal call sites (15
references) work unchanged because the bare names are re-imported at
module-level — Python's local-then-module name resolution still
finds them in a2a_tools's namespace, so existing tests'
patch("a2a_tools._foo", …) keeps working.

The RBAC layer can now evolve independently of the 18 tool handlers.
Adding a new role or capability action touches one file, not the
kitchen-sink module.

Tests:
  * 77 existing test_a2a_tools_impl.py pass unchanged.
  * test_a2a_tools_rbac.py adds 28 focused tests:
    - 6 alias drift-gate tests (`_foo is rbac.foo`)
    - 4 get_workspace_tier env+config branches
    - 2 is_root_workspace tier branches
    - 6 check_memory_write_permission roles + override branches
    - 3 check_memory_read_permission scenarios
    - 3 auth_headers_for_heartbeat platform_auth branches
    - 4 ROLE_PERMISSIONS table invariants
  * Direct coverage for the helper module (was previously only
    exercised through 991-LOC tool-handler tests).

Refs RFC #2873.
2026-05-05 04:43:16 -07:00
Hongming Wang 28ef75d25e refactor(workspace): split mcp_cli.py (626 LOC) into focused modules (RFC #2873 iter 3)
Splits the standalone molecule-mcp wrapper into three single-concern
modules per the OSS-shape refactor program:

  * mcp_heartbeat.py — register POST + heartbeat loop + auth-failure
    escalation + inbound-secret persistence
  * mcp_workspace_resolver.py — single + multi-workspace env validation
    + on-disk token-file read + operator-help printer
  * mcp_inbox_pollers.py — activate inbox singleton + spawn one daemon
    poller per workspace

mcp_cli.py becomes a 193-LOC orchestrator: validates env, calls each
module's helpers, hands off to a2a_mcp_server.cli_main. The console-
script entry molecule-mcp = molecule_runtime.mcp_cli:main is preserved.

Back-compat aliases (mcp_cli._build_agent_card, _heartbeat_loop,
_resolve_workspaces, etc.) re-export the new modules' authoritative
functions so existing tests + wheel_smoke.py + any downstream caller
keeps working unchanged. A new test file pins each alias as the
exact same callable (drift gate via `is`).

Tests:
  * 62 existing test_mcp_cli.py + test_mcp_cli_multi_workspace.py
    pass against the split.
  * Two heartbeat-loop persist tests + the auth-escalation caplog
    setup updated to target mcp_heartbeat (the module where the loop
    body now lives) instead of mcp_cli (still works through aliases
    for direct calls, but Python's name resolution inside the loop
    body uses the new module's namespace).
  * test_mcp_cli_split.py adds 11 new tests: alias drift gate +
    inbox-poller single + multi-workspace branches + degraded
    inbox-import logging path (none of those existed before).

Refs RFC #2873.
2026-05-05 04:33:06 -07:00
22 changed files with 2552 additions and 581 deletions
+4
View File
@@ -55,6 +55,7 @@ TOP_LEVEL_MODULES = {
"a2a_executor",
"a2a_mcp_server",
"a2a_tools",
"a2a_tools_rbac",
"adapter_base",
"agent",
"agents_md",
@@ -75,6 +76,9 @@ TOP_LEVEL_MODULES = {
"internal_file_read",
"main",
"mcp_cli",
"mcp_heartbeat",
"mcp_inbox_pollers",
"mcp_workspace_resolver",
"molecule_ai_status",
"not_configured_handler",
"platform_auth",
+9
View File
@@ -19,6 +19,7 @@ import (
"github.com/Molecule-AI/molecule-monorepo/platform/internal/handlers"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/imagewatch"
memwiring "github.com/Molecule-AI/molecule-monorepo/platform/internal/memory/wiring"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/pendinguploads"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/provisioner"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/registry"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/router"
@@ -265,6 +266,14 @@ func main() {
})
}
// Pending-uploads GC sweep — deletes acked rows past their retention
// window plus unacked rows past expires_at. Without this the
// pending_uploads table grows unbounded; even with the 24h hard TTL,
// nothing actually deletes a row, just makes it un-fetchable.
go supervised.RunWithRecover(ctx, "pending-uploads-sweeper", func(c context.Context) {
pendinguploads.StartSweeper(c, pendinguploads.NewPostgres(db.DB), 0)
})
// Provision-timeout sweep — flips workspaces that have been stuck in
// status='provisioning' past the timeout window to 'failed' and emits
// WORKSPACE_PROVISION_TIMEOUT. Without this the UI banner is cosmetic
@@ -73,6 +73,13 @@ func (s *inMemStorage) Get(context.Context, uuid.UUID) (pendinguploads.Record, e
func (s *inMemStorage) MarkFetched(context.Context, uuid.UUID) error { return nil }
func (s *inMemStorage) Ack(context.Context, uuid.UUID) error { return nil }
// Sweep is required by the Storage interface (Phase 3 GC). Not
// exercised by upload-branch tests — the dedicated sweeper_test.go +
// storage_sweep_test.go cover it.
func (s *inMemStorage) Sweep(context.Context, time.Duration) (pendinguploads.SweepResult, error) {
return pendinguploads.SweepResult{}, nil
}
// expectPollDeliveryMode stubs the SELECT delivery_mode lookup that
// uploadPollMode does (separate from the one resolveWorkspaceForwardCreds
// does — this is the new helper introduced for the poll branch).
@@ -1,11 +1,15 @@
package handlers
import (
"bytes"
"context"
"errors"
"go/ast"
"go/parser"
"go/token"
"os"
"path/filepath"
"regexp"
"strconv"
"strings"
"testing"
@@ -119,6 +123,60 @@ func TestLookupExistingChild_DBError_Propagates(t *testing.T) {
}
}
// workspacesInsertRE matches a SQL literal that begins (after optional
// leading whitespace) with `INSERT INTO workspaces` followed by `(` —
// requiring the open-paren rules out lookalikes like
// `INSERT INTO workspaces_audit`, `INSERT INTO workspace_secrets`,
// `INSERT INTO workspace_channels`, `INSERT INTO canvas_layouts`. The
// previous bytes.Index gate accepted `workspaces_audit` as a prefix
// match — see RFC #2872 Important-1 for the silent-false-pass shape.
var workspacesInsertRE = regexp.MustCompile(`(?s)^\s*INSERT\s+INTO\s+workspaces\s*\(`)
// findLookupAndWorkspacesInsertPos walks the AST of `src` and returns
// the source positions of (a) the first call to `lookupExistingChild`
// and (b) the first CallExpr whose argument list contains a STRING
// BasicLit matching workspacesInsertRE. Either may be token.NoPos if
// not found.
//
// Extracted as a helper so the gate logic can be exercised against
// synthetic source — TestGate_FailsWhenLookupAfterInsert below proves
// the gate actually catches the bug shape, not just the happy path.
func findLookupAndWorkspacesInsertPos(t *testing.T, fname string, src []byte) (lookupPos, insertPos token.Pos, fset *token.FileSet) {
t.Helper()
fset = token.NewFileSet()
file, err := parser.ParseFile(fset, fname, src, parser.ParseComments)
if err != nil {
t.Fatalf("parse %s: %v", fname, err)
}
lookupPos, insertPos = token.NoPos, token.NoPos
ast.Inspect(file, func(n ast.Node) bool {
call, ok := n.(*ast.CallExpr)
if !ok {
return true
}
if sel, ok := call.Fun.(*ast.SelectorExpr); ok {
if sel.Sel.Name == "lookupExistingChild" && lookupPos == token.NoPos {
lookupPos = call.Pos()
}
}
for _, arg := range call.Args {
lit, ok := arg.(*ast.BasicLit)
if !ok || lit.Kind != token.STRING {
continue
}
raw := lit.Value
if unq, err := strconv.Unquote(raw); err == nil {
raw = unq
}
if workspacesInsertRE.MatchString(raw) && insertPos == token.NoPos {
insertPos = call.Pos()
}
}
return true
})
return
}
// Source-level guard — pins that org_import.go calls
// h.lookupExistingChild BEFORE its INSERT INTO workspaces.
//
@@ -126,6 +184,11 @@ func TestLookupExistingChild_DBError_Propagates(t *testing.T) {
// (idempotency check before INSERT), not just function names. If a
// future refactor reintroduces the un-checked INSERT (the original
// bug shape that leaked 72 workspaces in 4 days), this test fails.
//
// AST-walk implementation closes the silent-false-pass mode that the
// previous bytes.Index gate had — see workspacesInsertRE comment for
// the failure mode (workspaces_audit / workspace_secrets / etc.
// shadowing the real target via prefix match).
func TestCreateWorkspaceTree_CallsLookupBeforeInsert(t *testing.T) {
wd, err := os.Getwd()
if err != nil {
@@ -135,17 +198,189 @@ func TestCreateWorkspaceTree_CallsLookupBeforeInsert(t *testing.T) {
if err != nil {
t.Fatalf("read org_import.go: %v", err)
}
lookupPos, insertPos, fset := findLookupAndWorkspacesInsertPos(t, "org_import.go", src)
lookupAt := bytes.Index(src, []byte("h.lookupExistingChild("))
insertAt := bytes.Index(src, []byte("INSERT INTO workspaces"))
if lookupAt < 0 {
t.Fatalf("org_import.go missing call to h.lookupExistingChild — idempotency check removed?")
if lookupPos == token.NoPos {
t.Fatalf("AST: no call to lookupExistingChild in org_import.go — idempotency check removed?")
}
if insertAt < 0 {
t.Fatalf("org_import.go missing INSERT INTO workspaces — schema change?")
if insertPos == token.NoPos {
t.Fatalf("AST: no SQL literal matching `^\\s*INSERT INTO workspaces\\s*\\(` in any CallExpr in org_import.go — schema change or rename?")
}
if lookupAt > insertAt {
t.Errorf("h.lookupExistingChild must come BEFORE INSERT INTO workspaces in org_import.go (lookup@%d, insert@%d) — non-idempotent ordering would re-leak under repeat /org/import calls", lookupAt, insertAt)
if lookupPos > insertPos {
t.Errorf("lookupExistingChild call at %s must come BEFORE INSERT INTO workspaces at %s — non-idempotent ordering would re-leak under repeat /org/import calls",
fset.Position(lookupPos), fset.Position(insertPos))
}
}
// TestGate_FailsWhenLookupAfterInsert proves the gate actually catches
// the bug it's named after — running it against synthetic Go source
// where the lookup call is positioned AFTER the workspaces INSERT must
// produce lookupPos > insertPos, which the production gate flags as
// an ERROR. Without this test the gate could regress to "always pass"
// and we wouldn't notice until the bug shipped again.
//
// Per memory feedback_assert_exact_not_substring.md: verify a
// tightened test FAILS on old code before merging.
func TestGate_FailsWhenLookupAfterInsert(t *testing.T) {
const buggySrc = `package handlers
import "context"
type fakeDB struct{}
func (fakeDB) ExecContext(ctx context.Context, sql string, args ...interface{}) {}
type fakeOrgHandler struct{}
func (h *fakeOrgHandler) lookupExistingChild(ctx context.Context, name string, parentID *string) (string, bool, error) {
return "", false, nil
}
func buggyCreate(h *fakeOrgHandler, db fakeDB, ctx context.Context, name string, parentID *string) {
// Bug shape: INSERT runs FIRST, lookup runs AFTER. This is the
// non-idempotent ordering the gate exists to forbid.
db.ExecContext(ctx, ` + "`INSERT INTO workspaces (id, name) VALUES ($1, $2)`" + `, "x", name)
h.lookupExistingChild(ctx, name, parentID)
}
`
lookupPos, insertPos, _ := findLookupAndWorkspacesInsertPos(t, "buggy.go", []byte(buggySrc))
if lookupPos == token.NoPos || insertPos == token.NoPos {
t.Fatalf("synthetic buggy source missing expected nodes (lookupPos=%v insertPos=%v) — helper logic regression", lookupPos, insertPos)
}
if lookupPos < insertPos {
t.Fatalf("synthetic bug shape (lookup AFTER insert) returned lookupPos=%d < insertPos=%d — gate would NOT fire on actual bug, regression!", lookupPos, insertPos)
}
// Implicit: lookupPos > insertPos here, which the production gate
// flags via t.Errorf. This proves the gate is live, not vestigial.
}
// TestGate_IgnoresAuditTableShadow proves the regex tightening
// actually ignores `INSERT INTO workspaces_audit` literals — the
// specific shape #2872 cited as the silent-false-pass failure mode
// for the previous bytes.Index gate.
func TestGate_IgnoresAuditTableShadow(t *testing.T) {
// Synthetic source with audit-table INSERT at line 1 (would be
// position 0 under prefix-match) and lookup + real INSERT at later
// positions. With the tightened regex, the audit literal is
// ignored: insertPos points at the REAL INSERT, lookup precedes it,
// gate passes correctly.
const src = `package handlers
import "context"
type fakeDB struct{}
func (fakeDB) ExecContext(ctx context.Context, sql string, args ...interface{}) {}
type fakeOrgHandler struct{}
func (h *fakeOrgHandler) lookupExistingChild(ctx context.Context, name string, parentID *string) (string, bool, error) {
return "", false, nil
}
func okCreateWithAudit(h *fakeOrgHandler, db fakeDB, ctx context.Context, name string, parentID *string) {
// Audit-table INSERT — should be IGNORED by the tightened regex.
db.ExecContext(ctx, ` + "`INSERT INTO workspaces_audit (id, action) VALUES ($1, $2)`" + `, "x", "create_attempt")
// Lookup BEFORE real INSERT — correct order.
h.lookupExistingChild(ctx, name, parentID)
// Real INSERT.
db.ExecContext(ctx, ` + "`INSERT INTO workspaces (id, name) VALUES ($1, $2)`" + `, "x", name)
}
`
lookupPos, insertPos, fset := findLookupAndWorkspacesInsertPos(t, "shadow.go", []byte(src))
if lookupPos == token.NoPos || insertPos == token.NoPos {
t.Fatalf("expected to find lookup + real INSERT, got lookupPos=%v insertPos=%v", lookupPos, insertPos)
}
// The audit-table INSERT is at line ~16 (column ~20-ish), the
// lookup is at line 19, the real INSERT is at line 21. If the
// regex regressed to prefix-match, insertPos would point at the
// audit literal at line 16, and the gate would falsely fail
// (lookup at 19 > "insert" at 16). With the tightened regex,
// insertPos correctly points at line 21, and the gate passes.
insertLine := fset.Position(insertPos).Line
lookupLine := fset.Position(lookupPos).Line
if insertLine < lookupLine {
t.Errorf("regex regressed: audit shadow at line %d swallowed real INSERT (lookup at line %d). insertPos should point at the real INSERT (line ~21), not the audit literal.",
insertLine, lookupLine)
}
if lookupPos > insertPos {
t.Errorf("synthetic source has lookup at line %d before real INSERT at line %d, gate should pass (lookupPos < insertPos), got lookupPos=%d > insertPos=%d",
lookupLine, insertLine, lookupPos, insertPos)
}
}
// TestWorkspacesInsertRE_RejectsLookalikes pins the regex that
// discriminates the real workspaces INSERT from prefix-matching
// lookalikes. If this regex regresses to a substring match, the
// AST gate above silently false-passes when a future refactor
// shadows the real INSERT with a workspaces_audit / workspace_secrets
// / canvas_layouts literal placed earlier in source.
func TestWorkspacesInsertRE_RejectsLookalikes(t *testing.T) {
cases := []struct {
sql string
want bool
comment string
}{
{"INSERT INTO workspaces (id, name) VALUES ($1, $2)", true, "real target"},
{"\n\t\tINSERT INTO workspaces (id, name)\n\t\tVALUES ($1, $2)", true, "real target with leading whitespace + newlines (raw string literal shape)"},
{"INSERT INTO workspaces_audit (id) VALUES ($1)", false, "underscore-suffix lookalike (the #2872 specific failure mode)"},
{"INSERT INTO workspace_secrets (key, value) VALUES ($1, $2)", false, "prefix without trailing 's' (workspace_*)"},
{"INSERT INTO workspace_channels (id) VALUES ($1)", false, "another workspace_* prefix"},
{"INSERT INTO canvas_layouts (workspace_id, x, y) VALUES ($1, $2, $3)", false, "unrelated table that contains 'workspace' in a column ref"},
{"UPDATE workspaces SET status='running' WHERE id=$1", false, "UPDATE shouldn't match"},
{"SELECT * FROM workspaces WHERE id=$1", false, "SELECT shouldn't match"},
{"-- comment about INSERT INTO workspaces (\nSELECT 1", false, "comment shouldn't match"},
}
for _, c := range cases {
got := workspacesInsertRE.MatchString(c.sql)
if got != c.want {
t.Errorf("workspacesInsertRE.MatchString(%q) = %v, want %v (%s)", c.sql, got, c.want, c.comment)
}
}
}
// Confirm the regex actually matches the literal currently in
// org_import.go. Pins the shape so `gofmt` reflows or trivial edits
// to the SQL string don't silently disable the gate above.
func TestWorkspacesInsertRE_MatchesActualSourceLiteral(t *testing.T) {
wd, err := os.Getwd()
if err != nil {
t.Fatalf("getwd: %v", err)
}
src, err := os.ReadFile(filepath.Join(wd, "org_import.go"))
if err != nil {
t.Fatalf("read org_import.go: %v", err)
}
// Strip backtick strings, find any whose content matches.
// Walk the source via parser.ParseFile to avoid string-search
// drift if the literal is reflowed.
fset := token.NewFileSet()
file, err := parser.ParseFile(fset, filepath.Join(wd, "org_import.go"), src, parser.ParseComments)
if err != nil {
t.Fatalf("parse org_import.go: %v", err)
}
var matched bool
ast.Inspect(file, func(n ast.Node) bool {
lit, ok := n.(*ast.BasicLit)
if !ok || lit.Kind != token.STRING {
return true
}
raw := lit.Value
if unq, err := strconv.Unquote(raw); err == nil {
raw = unq
}
if workspacesInsertRE.MatchString(raw) {
matched = true
}
return true
})
if !matched {
t.Fatalf("no SQL literal in org_import.go matches workspacesInsertRE — gate is dead. Either the INSERT was renamed (update the regex) or the file was restructured (review the gate logic).")
}
// strings.Contains keeps the test informative: if the regex
// stopped matching but the literal source still contains the
// magic phrase, that's a regex-side failure (test the fix above).
if !strings.Contains(string(src), "INSERT INTO workspaces") {
t.Fatalf("org_import.go has no `INSERT INTO workspaces` substring at all — schema change?")
}
}
@@ -0,0 +1,298 @@
//go:build integration
// +build integration
// pending_uploads_integration_test.go — REAL Postgres integration
// tests for the poll-mode chat upload flow (RFC: phases 13).
//
// Run with:
//
// docker run --rm -d --name pg-integration \
// -e POSTGRES_PASSWORD=test -e POSTGRES_DB=molecule \
// -p 55432:5432 postgres:15-alpine
// sleep 4
// psql ... < workspace-server/migrations/20260505100000_pending_uploads.up.sql
// cd workspace-server
// INTEGRATION_DB_URL="postgres://postgres:test@localhost:55432/molecule?sslmode=disable" \
// go test -tags=integration ./internal/handlers/ -run Integration_PendingUploads
//
// CI (.github/workflows/handlers-postgres-integration.yml) runs this on
// every PR that touches workspace-server/internal/handlers/** OR
// workspace-server/migrations/**.
//
// Why these are NOT plain unit tests
// ----------------------------------
// The strict-sqlmock unit tests in storage_test.go pin which SQL
// statements fire — they are fast and let us iterate without a DB. But
// sqlmock CANNOT detect bugs that depend on the actual row state after
// the SQL runs. In particular:
//
// - the WITH … DELETE … RETURNING CTE used by Sweep depends on
// Postgres' `make_interval` function and the table's CHECK
// constraints. sqlmock would happily accept a hand-written SQL
// literal that Postgres rejects at runtime.
// - the partial index `idx_pending_uploads_unacked` (created by the
// Phase 1 migration) only catches a wrong WHERE predicate at real-
// query-plan time.
//
// These tests close those gaps by booting a real Postgres, running the
// production helpers, and SELECTing the row to verify the observable
// state matches the expected outcome.
package handlers
import (
"context"
"database/sql"
"os"
"testing"
"time"
"github.com/google/uuid"
_ "github.com/lib/pq"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/pendinguploads"
)
// integrationDB_PendingUploads opens a connection from $INTEGRATION_DB_URL
// (skipping the test if unset), wipes the pending_uploads table for
// isolation, and registers a Cleanup that closes the connection.
//
// NOT SAFE FOR `t.Parallel()` — each test gets the table to itself.
// Mirrors the integrationDB helper in delegation_ledger_integration_test.go
// but kept separate so each table's wipe step is local to its tests.
func integrationDB_PendingUploads(t *testing.T) *sql.DB {
t.Helper()
url := os.Getenv("INTEGRATION_DB_URL")
if url == "" {
t.Skip("INTEGRATION_DB_URL not set; skipping (local devs: see file header)")
}
conn, err := sql.Open("postgres", url)
if err != nil {
t.Fatalf("open: %v", err)
}
if err := conn.Ping(); err != nil {
t.Fatalf("ping: %v", err)
}
if _, err := conn.ExecContext(context.Background(), `DELETE FROM pending_uploads`); err != nil {
t.Fatalf("cleanup: %v", err)
}
t.Cleanup(func() { conn.Close() })
return conn
}
func TestIntegration_PendingUploads_PutGetAckRoundTrip(t *testing.T) {
conn := integrationDB_PendingUploads(t)
store := pendinguploads.NewPostgres(conn)
ctx := context.Background()
wsID := uuid.New()
fileID, err := store.Put(ctx, wsID, []byte("hello PDF"), "report.pdf", "application/pdf")
if err != nil {
t.Fatalf("Put: %v", err)
}
// Get reads back the row.
rec, err := store.Get(ctx, fileID)
if err != nil {
t.Fatalf("Get: %v", err)
}
if rec.WorkspaceID != wsID {
t.Errorf("workspace_id = %s, want %s", rec.WorkspaceID, wsID)
}
if string(rec.Content) != "hello PDF" {
t.Errorf("content = %q, want %q", rec.Content, "hello PDF")
}
if rec.Filename != "report.pdf" {
t.Errorf("filename = %q, want %q", rec.Filename, "report.pdf")
}
if rec.AckedAt != nil {
t.Errorf("AckedAt should be nil before Ack, got %v", rec.AckedAt)
}
// MarkFetched stamps fetched_at.
if err := store.MarkFetched(ctx, fileID); err != nil {
t.Fatalf("MarkFetched: %v", err)
}
// Re-read to confirm.
rec2, err := store.Get(ctx, fileID)
if err != nil {
t.Fatalf("Get after MarkFetched: %v", err)
}
if rec2.FetchedAt == nil {
t.Errorf("FetchedAt should be set after MarkFetched")
}
// Ack flips acked_at; subsequent Gets return ErrNotFound (acked rows
// are filtered out at the SELECT predicate).
if err := store.Ack(ctx, fileID); err != nil {
t.Fatalf("Ack: %v", err)
}
if _, err := store.Get(ctx, fileID); err != pendinguploads.ErrNotFound {
t.Errorf("Get after Ack: got %v, want ErrNotFound", err)
}
// Idempotent re-ack succeeds.
if err := store.Ack(ctx, fileID); err != nil {
t.Errorf("re-Ack should be idempotent, got %v", err)
}
}
func TestIntegration_PendingUploads_Sweep_DeletesAckedAfterRetention(t *testing.T) {
conn := integrationDB_PendingUploads(t)
store := pendinguploads.NewPostgres(conn)
ctx := context.Background()
wsID := uuid.New()
fid, err := store.Put(ctx, wsID, []byte("data"), "x.txt", "text/plain")
if err != nil {
t.Fatalf("Put: %v", err)
}
if err := store.Ack(ctx, fid); err != nil {
t.Fatalf("Ack: %v", err)
}
// retention=1h, row was acked just now → not yet eligible.
res, err := store.Sweep(ctx, time.Hour)
if err != nil {
t.Fatalf("Sweep(1h): %v", err)
}
if res.Total() != 0 {
t.Errorf("expected 0 deletions yet, got %+v", res)
}
// retention=0 → row IS eligible immediately.
res, err = store.Sweep(ctx, 0)
if err != nil {
t.Fatalf("Sweep(0): %v", err)
}
if res.Acked != 1 || res.Expired != 0 {
t.Errorf("expected acked=1 expired=0, got %+v", res)
}
// Verify row is actually gone — not just un-fetchable.
var n int
if err := conn.QueryRowContext(ctx, `SELECT COUNT(*) FROM pending_uploads WHERE file_id = $1`, fid).Scan(&n); err != nil {
t.Fatalf("count: %v", err)
}
if n != 0 {
t.Errorf("row should be DELETEd, found %d rows", n)
}
}
func TestIntegration_PendingUploads_Sweep_DeletesExpiredUnacked(t *testing.T) {
conn := integrationDB_PendingUploads(t)
store := pendinguploads.NewPostgres(conn)
ctx := context.Background()
wsID := uuid.New()
fid, err := store.Put(ctx, wsID, []byte("data"), "x.txt", "text/plain")
if err != nil {
t.Fatalf("Put: %v", err)
}
// Manually backdate expires_at so the row IS expired. We don't ack,
// so this exercises the unacked-and-expired branch of the WHERE
// clause specifically.
if _, err := conn.ExecContext(ctx,
`UPDATE pending_uploads SET expires_at = now() - interval '1 minute' WHERE file_id = $1`,
fid,
); err != nil {
t.Fatalf("backdate: %v", err)
}
res, err := store.Sweep(ctx, time.Hour)
if err != nil {
t.Fatalf("Sweep: %v", err)
}
if res.Acked != 0 || res.Expired != 1 {
t.Errorf("expected acked=0 expired=1, got %+v", res)
}
}
func TestIntegration_PendingUploads_Sweep_DeletesBothCategoriesInOneCycle(t *testing.T) {
conn := integrationDB_PendingUploads(t)
store := pendinguploads.NewPostgres(conn)
ctx := context.Background()
wsID := uuid.New()
// Three rows: one acked (eligible at retention=0), one expired
// unacked, one fresh unacked (must NOT be deleted).
ackedFID, err := store.Put(ctx, wsID, []byte("acked"), "a.txt", "text/plain")
if err != nil {
t.Fatalf("Put acked: %v", err)
}
if err := store.Ack(ctx, ackedFID); err != nil {
t.Fatalf("Ack: %v", err)
}
expiredFID, err := store.Put(ctx, wsID, []byte("expired"), "e.txt", "text/plain")
if err != nil {
t.Fatalf("Put expired: %v", err)
}
if _, err := conn.ExecContext(ctx,
`UPDATE pending_uploads SET expires_at = now() - interval '1 minute' WHERE file_id = $1`,
expiredFID,
); err != nil {
t.Fatalf("backdate: %v", err)
}
freshFID, err := store.Put(ctx, wsID, []byte("fresh"), "f.txt", "text/plain")
if err != nil {
t.Fatalf("Put fresh: %v", err)
}
res, err := store.Sweep(ctx, 0) // retention=0 makes the acked row eligible
if err != nil {
t.Fatalf("Sweep: %v", err)
}
if res.Acked != 1 || res.Expired != 1 {
t.Errorf("expected acked=1 expired=1, got %+v", res)
}
// Fresh row survives.
rec, err := store.Get(ctx, freshFID)
if err != nil {
t.Errorf("fresh row should still be Get-able, got err=%v", err)
}
if rec.FileID != freshFID {
t.Errorf("fresh row file_id = %s, want %s", rec.FileID, freshFID)
}
}
func TestIntegration_PendingUploads_PutEnforcesSizeCap(t *testing.T) {
conn := integrationDB_PendingUploads(t)
store := pendinguploads.NewPostgres(conn)
ctx := context.Background()
wsID := uuid.New()
tooBig := make([]byte, pendinguploads.MaxFileBytes+1)
if _, err := store.Put(ctx, wsID, tooBig, "big.bin", "application/octet-stream"); err != pendinguploads.ErrTooLarge {
t.Errorf("expected ErrTooLarge, got %v", err)
}
}
func TestIntegration_PendingUploads_GetIgnoresExpiredAndAcked(t *testing.T) {
conn := integrationDB_PendingUploads(t)
store := pendinguploads.NewPostgres(conn)
ctx := context.Background()
wsID := uuid.New()
fid, err := store.Put(ctx, wsID, []byte("data"), "x.txt", "text/plain")
if err != nil {
t.Fatalf("Put: %v", err)
}
// Backdate expires_at — Get must return ErrNotFound, even though the
// row physically exists in the table (Sweep hasn't run).
if _, err := conn.ExecContext(ctx,
`UPDATE pending_uploads SET expires_at = now() - interval '1 minute' WHERE file_id = $1`,
fid,
); err != nil {
t.Fatalf("backdate: %v", err)
}
if _, err := store.Get(ctx, fid); err != pendinguploads.ErrNotFound {
t.Errorf("Get after expiry: got %v, want ErrNotFound", err)
}
}
@@ -71,6 +71,12 @@ func (f *fakeStorage) Ack(_ context.Context, fileID uuid.UUID) error {
return nil
}
// Sweep is required by the Storage interface (Phase 3 GC). Not exercised
// by these handler tests — the dedicated sweeper_test.go covers it.
func (f *fakeStorage) Sweep(_ context.Context, _ time.Duration) (pendinguploads.SweepResult, error) {
return pendinguploads.SweepResult{}, nil
}
func newRouter(handler *handlers.PendingUploadsHandler) *gin.Engine {
gin.SetMode(gin.TestMode)
r := gin.New()
+74 -8
View File
@@ -5,14 +5,15 @@
//
// Exposed metrics:
//
// molecule_http_requests_total{method,path,status} - counter
// molecule_http_request_duration_seconds{method,path} - counter (sum, for avg rate)
// molecule_websocket_connections_active - gauge
// go_goroutines - gauge
// go_memstats_alloc_bytes - gauge
// go_memstats_sys_bytes - gauge
// go_memstats_heap_inuse_bytes - gauge
// go_gc_duration_seconds_total - counter
// molecule_http_requests_total{method,path,status} - counter
// molecule_http_request_duration_seconds{method,path} - counter (sum, for avg rate)
// molecule_websocket_connections_active - gauge
// molecule_pending_uploads_swept_total{outcome} - counter (acked|expired|error)
// go_goroutines - gauge
// go_memstats_alloc_bytes - gauge
// go_memstats_sys_bytes - gauge
// go_memstats_heap_inuse_bytes - gauge
// go_gc_duration_seconds_total - counter
package metrics
import (
@@ -38,6 +39,12 @@ var (
reqCounts = map[reqKey]int64{} // molecule_http_requests_total
reqDurSums = map[reqKey]float64{} // sum of durations (seconds)
activeWSConns int64 // molecule_websocket_connections_active
// pendinguploads sweeper counters — atomic so the sweeper goroutine
// doesn't contend with the /metrics handler.
pendingUploadsSweptAcked int64 // molecule_pending_uploads_swept_total{outcome="acked"}
pendingUploadsSweptExpired int64 // molecule_pending_uploads_swept_total{outcome="expired"}
pendingUploadsSweepErrors int64 // molecule_pending_uploads_swept_total{outcome="error"}
)
// Middleware records per-request counts and latency.
@@ -76,6 +83,50 @@ func TrackWSConnect() { atomic.AddInt64(&activeWSConns, 1) }
// Call from the WebSocket disconnect / cleanup path.
func TrackWSDisconnect() { atomic.AddInt64(&activeWSConns, -1) }
// phantomBusyResets is the cumulative count of workspace rows the
// phantom-busy sweep reset (active_tasks=0 → active_tasks=0+counter
// cleared). Surfaced as molecule_phantom_busy_resets_total — a high
// reset rate signals a regression in task-lifecycle accounting (most
// often: missing env vars cause claude --print to time out, the
// agent loop never decrements active_tasks, and the sweep cleans up
// the counter ~10 min later). Issue #2865.
var phantomBusyResets int64
// TrackPhantomBusyReset increments the phantom-busy reset counter.
// Called from sweepPhantomBusy in workspace-server/internal/scheduler/
// after each row whose active_tasks was reset to 0. Idempotent +
// goroutine-safe; called once per row per sweep tick.
func TrackPhantomBusyReset() { atomic.AddInt64(&phantomBusyResets, 1) }
// PendingUploadsSwept records a successful sweep cycle. acked/expired
// are added to the per-outcome counters so dashboards can spot the
// stuck-fetch pattern (high expired, low acked) vs healthy churn.
func PendingUploadsSwept(acked, expired int) {
if acked > 0 {
atomic.AddInt64(&pendingUploadsSweptAcked, int64(acked))
}
if expired > 0 {
atomic.AddInt64(&pendingUploadsSweptExpired, int64(expired))
}
}
// PendingUploadsSweepError records a sweeper-cycle failure (transient
// DB error etc). Counted separately so the rate of errored sweeps is
// observable independent of how many rows the successful sweeps deleted.
func PendingUploadsSweepError() {
atomic.AddInt64(&pendingUploadsSweepErrors, 1)
}
// PendingUploadsSweepCounts returns the current (acked, expired, error)
// totals. Exposed for tests that need a deterministic delta probe of
// the sweeper's metric writes — the /metrics endpoint is the production
// observability surface; this is a unit-test escape hatch.
func PendingUploadsSweepCounts() (acked, expired, errored int64) {
return atomic.LoadInt64(&pendingUploadsSweptAcked),
atomic.LoadInt64(&pendingUploadsSweptExpired),
atomic.LoadInt64(&pendingUploadsSweepErrors)
}
// Handler returns a Gin handler that serialises all collected metrics in
// Prometheus text exposition format (v0.0.4). Mount this at GET /metrics.
func Handler() gin.HandlerFunc {
@@ -144,6 +195,21 @@ func Handler() gin.HandlerFunc {
writeln(w, "# HELP molecule_websocket_connections_active Number of active WebSocket connections.")
writeln(w, "# TYPE molecule_websocket_connections_active gauge")
fmt.Fprintf(w, "molecule_websocket_connections_active %d\n", atomic.LoadInt64(&activeWSConns))
// ── Molecule AI scheduler ──────────────────────────────────────────────
writeln(w, "# HELP molecule_phantom_busy_resets_total Cumulative count of workspace rows reset by the phantom-busy sweep (active_tasks cleared after >10 min of activity_log silence). High reset rate signals task-lifecycle accounting regressions — see issue #2865.")
writeln(w, "# TYPE molecule_phantom_busy_resets_total counter")
fmt.Fprintf(w, "molecule_phantom_busy_resets_total %d\n", atomic.LoadInt64(&phantomBusyResets))
// ── Pending-uploads sweeper ────────────────────────────────────────────
writeln(w, "# HELP molecule_pending_uploads_swept_total Pending-uploads rows deleted by the GC sweeper, by outcome.")
writeln(w, "# TYPE molecule_pending_uploads_swept_total counter")
fmt.Fprintf(w, "molecule_pending_uploads_swept_total{outcome=\"acked\"} %d\n",
atomic.LoadInt64(&pendingUploadsSweptAcked))
fmt.Fprintf(w, "molecule_pending_uploads_swept_total{outcome=\"expired\"} %d\n",
atomic.LoadInt64(&pendingUploadsSweptExpired))
fmt.Fprintf(w, "molecule_pending_uploads_swept_total{outcome=\"error\"} %d\n",
atomic.LoadInt64(&pendingUploadsSweepErrors))
}
}
@@ -0,0 +1,104 @@
package metrics
// Tests for the phantom-busy reset counter wired up by issue #2865.
// The counter is exposed at /metrics as
// molecule_phantom_busy_resets_total. A high steady-state value
// signals task-lifecycle accounting regressions in the agent loop —
// see scheduler.sweepPhantomBusy for the writer.
import (
"net/http/httptest"
"strings"
"sync"
"sync/atomic"
"testing"
"github.com/gin-gonic/gin"
)
// resetForTest zeroes the counter so a single test's TrackPhantomBusyReset
// calls don't compound onto a previous test's run. metrics.go's package-
// level state means every test that touches the counter must reset.
func resetForTest() {
atomic.StoreInt64(&phantomBusyResets, 0)
}
func TestTrackPhantomBusyReset_IncrementsCounter(t *testing.T) {
resetForTest()
for i := 0; i < 7; i++ {
TrackPhantomBusyReset()
}
got := atomic.LoadInt64(&phantomBusyResets)
if got != 7 {
t.Errorf("counter after 7 calls = %d, want 7", got)
}
}
func TestTrackPhantomBusyReset_RaceFreeUnderConcurrentWrites(t *testing.T) {
resetForTest()
var wg sync.WaitGroup
const goroutines = 50
const callsPerGoroutine = 200
wg.Add(goroutines)
for i := 0; i < goroutines; i++ {
go func() {
defer wg.Done()
for j := 0; j < callsPerGoroutine; j++ {
TrackPhantomBusyReset()
}
}()
}
wg.Wait()
want := int64(goroutines * callsPerGoroutine)
got := atomic.LoadInt64(&phantomBusyResets)
if got != want {
t.Errorf("counter under concurrent writes = %d, want %d (lost increments → atomic broken)",
got, want)
}
}
func TestHandler_ExposesPhantomBusyResetsCounter(t *testing.T) {
resetForTest()
for i := 0; i < 3; i++ {
TrackPhantomBusyReset()
}
gin.SetMode(gin.TestMode)
r := gin.New()
r.GET("/metrics", Handler())
w := httptest.NewRecorder()
req := httptest.NewRequest("GET", "/metrics", nil)
r.ServeHTTP(w, req)
body := w.Body.String()
// HELP + TYPE lines must precede the metric (Prometheus text exposition format).
if !strings.Contains(body, "# HELP molecule_phantom_busy_resets_total") {
t.Errorf("metrics output missing HELP line for molecule_phantom_busy_resets_total:\n%s", body)
}
if !strings.Contains(body, "# TYPE molecule_phantom_busy_resets_total counter") {
t.Errorf("metrics output missing TYPE line for molecule_phantom_busy_resets_total:\n%s", body)
}
if !strings.Contains(body, "molecule_phantom_busy_resets_total 3\n") {
t.Errorf("metrics output missing counter value 3:\n%s", body)
}
}
func TestHandler_PhantomBusyResetsZeroByDefault(t *testing.T) {
// Fresh process should report 0 — pin the contract so a future
// refactor that lazy-inits the counter to nil doesn't silently
// drop the metric from /metrics.
resetForTest()
gin.SetMode(gin.TestMode)
r := gin.New()
r.GET("/metrics", Handler())
w := httptest.NewRecorder()
req := httptest.NewRequest("GET", "/metrics", nil)
r.ServeHTTP(w, req)
if !strings.Contains(w.Body.String(), "molecule_phantom_busy_resets_total 0\n") {
t.Errorf("metric must report 0 by default:\n%s", w.Body.String())
}
}
@@ -72,6 +72,19 @@ type Record struct {
ExpiresAt time.Time
}
// SweepResult is the per-cycle accounting from Sweep. Both counts are
// non-negative; Total is just Acked + Expired for log/metrics
// convenience. Phase 3 metrics expose these as separate counters so
// dashboards can spot a stuck-ack pattern (high Expired, low Acked) vs.
// healthy churn (Acked dominates).
type SweepResult struct {
Acked int // rows deleted because acked_at + retention elapsed
Expired int // rows deleted because expires_at < now AND never acked
}
// Total returns the sum of Acked + Expired — convenient for log lines.
func (r SweepResult) Total() int { return r.Acked + r.Expired }
// Storage is the platform-side persistence boundary for poll-mode chat
// uploads. The Postgres implementation backs all callers today; an S3-
// backed implementation can drop in once RFC #2789 lands by making
@@ -103,6 +116,18 @@ type Storage interface {
// absent or already expired; on already-acked, returns nil so
// the workspace's at-least-once retry succeeds without an error.
Ack(ctx context.Context, fileID uuid.UUID) error
// Sweep deletes rows past their retention window:
// - acked rows older than ackRetention (give the workspace a
// window to re-fetch in case it processed but failed to write
// the file before crashing — at-least-once behavior).
// - unacked rows past expires_at (the platform's hard TTL — 24h
// by default; a workspace that hasn't fetched by then is
// considered dead from the upload's perspective).
// Returns the per-category deletion counts for observability.
// Errors are surfaced to the caller; a transient DB error must NOT
// crash the sweeper loop (it just retries on the next tick).
Sweep(ctx context.Context, ackRetention time.Duration) (SweepResult, error)
}
// PostgresStorage is the production Storage implementation backed by
@@ -251,3 +276,41 @@ func (p *PostgresStorage) Ack(ctx context.Context, fileID uuid.UUID) error {
// the workspace's intent ("I'm done with this file") was honored.
return nil
}
// Sweep deletes acked rows past their retention window plus any
// unacked rows whose hard TTL has elapsed. Single round-trip: a CTE
// captures the deletion in one DELETE … RETURNING and the outer
// SELECT sums by category. Cheaper and tighter than two round trips,
// and atomic w.r.t. concurrent writes (the WHERE predicate sees a
// consistent snapshot via Postgres MVCC).
//
// ackRetention=0 deletes all acked rows immediately; values <0 are
// clamped to 0 for safety. Caller defaults are documented at
// StartSweeper's DefaultAckRetention.
func (p *PostgresStorage) Sweep(ctx context.Context, ackRetention time.Duration) (SweepResult, error) {
if ackRetention < 0 {
ackRetention = 0
}
// make_interval expects integer seconds — Postgres accepts a
// floating point but we deliberately round to the nearest second
// so test fixtures pin a deterministic value across PG versions.
retentionSecs := int64(ackRetention.Seconds())
var acked, expired int
err := p.db.QueryRowContext(ctx, `
WITH deleted AS (
DELETE FROM pending_uploads
WHERE (acked_at IS NOT NULL AND acked_at < now() - make_interval(secs => $1))
OR (acked_at IS NULL AND expires_at < now())
RETURNING (acked_at IS NOT NULL) AS was_acked
)
SELECT
COALESCE(SUM(CASE WHEN was_acked THEN 1 ELSE 0 END), 0)::int AS acked,
COALESCE(SUM(CASE WHEN NOT was_acked THEN 1 ELSE 0 END), 0)::int AS expired
FROM deleted
`, retentionSecs).Scan(&acked, &expired)
if err != nil {
return SweepResult{}, fmt.Errorf("pendinguploads: sweep: %w", err)
}
return SweepResult{Acked: acked, Expired: expired}, nil
}
@@ -71,6 +71,18 @@ const (
SELECT acked_at FROM pending_uploads
WHERE file_id = $1 AND expires_at > now()
`
sweepSQL = `
WITH deleted AS (
DELETE FROM pending_uploads
WHERE (acked_at IS NOT NULL AND acked_at < now() - make_interval(secs => $1))
OR (acked_at IS NULL AND expires_at < now())
RETURNING (acked_at IS NOT NULL) AS was_acked
)
SELECT
COALESCE(SUM(CASE WHEN was_acked THEN 1 ELSE 0 END), 0)::int AS acked,
COALESCE(SUM(CASE WHEN NOT was_acked THEN 1 ELSE 0 END), 0)::int AS expired
FROM deleted
`
)
// ----- Put ------------------------------------------------------------------
@@ -398,3 +410,104 @@ func TestAck_DBErrorOnDisambiguate_Wrapped(t *testing.T) {
t.Fatalf("expected wrapped disambiguate error, got %v", err)
}
}
// ----- Sweep ----------------------------------------------------------------
func TestSweep_DeletesAckedAndExpired_ReturnsCounts(t *testing.T) {
db, mock := newMockDB(t)
store := pendinguploads.NewPostgres(db)
mock.ExpectQuery(sweepSQL).
WithArgs(int64(3600)). // 1h retention
WillReturnRows(sqlmock.NewRows([]string{"acked", "expired"}).AddRow(7, 2))
res, err := store.Sweep(context.Background(), time.Hour)
if err != nil {
t.Fatalf("Sweep: %v", err)
}
if res.Acked != 7 || res.Expired != 2 || res.Total() != 9 {
t.Errorf("got %+v want acked=7 expired=2 total=9", res)
}
}
func TestSweep_NothingToDelete_ReturnsZero(t *testing.T) {
db, mock := newMockDB(t)
store := pendinguploads.NewPostgres(db)
mock.ExpectQuery(sweepSQL).
WithArgs(int64(3600)).
WillReturnRows(sqlmock.NewRows([]string{"acked", "expired"}).AddRow(0, 0))
res, err := store.Sweep(context.Background(), time.Hour)
if err != nil {
t.Fatalf("Sweep: %v", err)
}
if res.Total() != 0 {
t.Errorf("got %+v, want zero result", res)
}
}
func TestSweep_NegativeRetentionClampedToZero(t *testing.T) {
db, mock := newMockDB(t)
store := pendinguploads.NewPostgres(db)
// Negative retention must clamp to 0; the SQL gets `secs => 0` so an
// acked-just-now row is eligible for deletion immediately. Pinned
// here because passing the raw negative through `make_interval` would
// silently shift acked_at → future and effectively retain rows
// forever — exactly the wrong behavior for a "delete more aggressively"
// caller.
mock.ExpectQuery(sweepSQL).
WithArgs(int64(0)).
WillReturnRows(sqlmock.NewRows([]string{"acked", "expired"}).AddRow(3, 0))
res, err := store.Sweep(context.Background(), -1*time.Second)
if err != nil {
t.Fatalf("Sweep: %v", err)
}
if res.Acked != 3 {
t.Errorf("got %+v want acked=3", res)
}
}
func TestSweep_ZeroRetentionImmediatelyDeletesAcked(t *testing.T) {
db, mock := newMockDB(t)
store := pendinguploads.NewPostgres(db)
mock.ExpectQuery(sweepSQL).
WithArgs(int64(0)).
WillReturnRows(sqlmock.NewRows([]string{"acked", "expired"}).AddRow(5, 1))
res, err := store.Sweep(context.Background(), 0)
if err != nil {
t.Fatalf("Sweep: %v", err)
}
if res.Acked != 5 || res.Expired != 1 {
t.Errorf("got %+v want acked=5 expired=1", res)
}
}
func TestSweep_DBError_Wrapped(t *testing.T) {
db, mock := newMockDB(t)
store := pendinguploads.NewPostgres(db)
mock.ExpectQuery(sweepSQL).
WithArgs(int64(60)).
WillReturnError(errors.New("connection lost"))
_, err := store.Sweep(context.Background(), time.Minute)
if err == nil || !strings.Contains(err.Error(), "sweep") {
t.Fatalf("expected wrapped sweep error, got %v", err)
}
}
func TestSweepResult_TotalSumsCounts(t *testing.T) {
r := pendinguploads.SweepResult{Acked: 4, Expired: 3}
if r.Total() != 7 {
t.Errorf("Total = %d, want 7", r.Total())
}
z := pendinguploads.SweepResult{}
if z.Total() != 0 {
t.Errorf("zero Total = %d, want 0", z.Total())
}
}
@@ -0,0 +1,129 @@
// sweeper.go — periodic GC for the pending_uploads table.
//
// The platform's poll-mode chat-upload handler creates a row in
// pending_uploads for every chat-attached file the canvas sends to a
// poll-mode workspace. The workspace's inbox poller fetches the bytes
// and acks the row, but two failure modes leak rows long-term:
//
// 1. Workspace fetches but never acks (network hiccup between GET
// /content and POST /ack; workspace crashed between the two).
// Phase 1's Get refuses to re-serve an acked row, but a never-
// acked row could in principle be fetched repeatedly until expires_at.
// Phase 2's workspace-side fetcher is idempotent; the worry is
// only disk usage on the platform side.
//
// 2. Workspace never fetches at all (workspace was offline when the
// row was written; the upload's TTL elapsed).
//
// This sweeper handles both. It runs every SweepInterval, deletes rows
// in either category, and emits structured logs + Prometheus counters
// so a stuck-fetch dashboard can spot the leak class.
//
// Failure isolation: a transient DB error must NOT crash the sweeper.
// We log + continue; the next tick retries. ctx cancellation cleanly
// shuts the loop down for graceful shutdown.
package pendinguploads
import (
"context"
"log"
"time"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/metrics"
)
// SweepInterval is the cadence of the GC loop. 5 minutes is a balance
// between "rows reaped quickly enough that disk usage doesn't surprise
// anyone" and "we don't pay a DELETE round-trip every 30 seconds when
// there are no candidates." Aligned with other low-priority sweepers
// (registry/orphan_sweeper runs at 60s but operates on Docker — much
// more expensive per cycle than a single indexed DELETE).
const SweepInterval = 5 * time.Minute
// DefaultAckRetention is how long an acked row sticks around before the
// sweeper deletes it. 1 hour gives the workspace enough time to retry
// the GET if its first fetch crashed mid-write — at-least-once handoff
// without leaking content for a full 24h after the workspace already
// has a copy.
const DefaultAckRetention = 1 * time.Hour
// sweepDeadline bounds a single sweep cycle. A daemon at the edge of
// timeout shouldn't pile up goroutines; 30s is generous for a single
// indexed DELETE on a table that should rarely have more than a few
// thousand rows in flight.
const sweepDeadline = 30 * time.Second
// StartSweeper runs the GC loop until ctx is cancelled. nil storage
// makes the loop a no-op (matches the handlers' tolerance for an
// unconfigured pendinguploads — some test harnesses run without the
// storage wired).
//
// Pass ackRetention=0 to use DefaultAckRetention. Negative values are
// clamped at the storage layer.
//
// Production callers use SweepInterval (5m). Tests use a short interval
// to exercise the ticker-driven sweep path without burning real wall-
// clock time.
func StartSweeper(ctx context.Context, storage Storage, ackRetention time.Duration) {
StartSweeperWithInterval(ctx, storage, ackRetention, SweepInterval)
}
// StartSweeperWithInterval is the test-friendly variant of StartSweeper
// — same loop, but the cadence is caller-specified. Production code
// should use StartSweeper to keep the SweepInterval constant pinned.
func StartSweeperWithInterval(ctx context.Context, storage Storage, ackRetention, interval time.Duration) {
if storage == nil {
log.Println("pendinguploads sweeper: storage is nil — sweeper disabled")
return
}
if ackRetention == 0 {
ackRetention = DefaultAckRetention
}
log.Printf(
"pendinguploads sweeper started — sweeping every %s; ack retention %s",
interval, ackRetention,
)
ticker := time.NewTicker(interval)
defer ticker.Stop()
// Run once immediately so a platform restart cleans up any rows
// that became eligible while we were down — don't make the
// operator wait 5 minutes for the first sweep.
sweepOnce(ctx, storage, ackRetention)
for {
select {
case <-ctx.Done():
log.Println("pendinguploads sweeper: shutdown")
return
case <-ticker.C:
sweepOnce(ctx, storage, ackRetention)
}
}
}
func sweepOnce(parent context.Context, storage Storage, ackRetention time.Duration) {
ctx, cancel := context.WithTimeout(parent, sweepDeadline)
defer cancel()
res, err := storage.Sweep(ctx, ackRetention)
if err != nil {
// Transient errors: log + continue. The next tick retries; if
// the DB is genuinely down, the rest of the platform is also
// broken and disk usage is the least of the operator's
// problems.
log.Printf("pendinguploads sweeper: Sweep failed: %v", err)
metrics.PendingUploadsSweepError()
return
}
metrics.PendingUploadsSwept(res.Acked, res.Expired)
if res.Total() > 0 {
// Per-cycle structured-ish log (one line per cycle that did
// something). Quiet by design — most cycles delete zero rows
// on a healthy system, and a stream of empty-result lines
// would drown the production log without surfacing a signal.
log.Printf(
"pendinguploads sweeper: deleted acked=%d expired=%d total=%d",
res.Acked, res.Expired, res.Total(),
)
}
}
@@ -0,0 +1,250 @@
package pendinguploads_test
import (
"context"
"errors"
"sync/atomic"
"testing"
"time"
"github.com/google/uuid"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/metrics"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/pendinguploads"
)
// fakeSweepStorage is a minimal Storage that records every Sweep call
// and lets each test inject the per-cycle return values. The other
// methods are no-ops — the sweeper goroutine never calls them.
type fakeSweepStorage struct {
calls atomic.Int64
results []pendinguploads.SweepResult
errs []error
cycleDone chan struct{} // closed after each Sweep call (test sync)
gotRetention atomic.Int64 // last ackRetention seen, in seconds
}
func newFakeSweepStorage(results []pendinguploads.SweepResult, errs []error) *fakeSweepStorage {
return &fakeSweepStorage{
results: results,
errs: errs,
cycleDone: make(chan struct{}, 16),
}
}
func (f *fakeSweepStorage) Put(_ context.Context, _ uuid.UUID, _ []byte, _, _ string) (uuid.UUID, error) {
return uuid.Nil, errors.New("not used")
}
func (f *fakeSweepStorage) Get(_ context.Context, _ uuid.UUID) (pendinguploads.Record, error) {
return pendinguploads.Record{}, errors.New("not used")
}
func (f *fakeSweepStorage) MarkFetched(_ context.Context, _ uuid.UUID) error {
return errors.New("not used")
}
func (f *fakeSweepStorage) Ack(_ context.Context, _ uuid.UUID) error {
return errors.New("not used")
}
func (f *fakeSweepStorage) Sweep(_ context.Context, ackRetention time.Duration) (pendinguploads.SweepResult, error) {
idx := int(f.calls.Load())
f.calls.Add(1)
f.gotRetention.Store(int64(ackRetention.Seconds()))
defer func() {
select {
case f.cycleDone <- struct{}{}:
default:
}
}()
if idx < len(f.errs) && f.errs[idx] != nil {
return pendinguploads.SweepResult{}, f.errs[idx]
}
if idx < len(f.results) {
return f.results[idx], nil
}
return pendinguploads.SweepResult{}, nil
}
// waitForCycle blocks until at least one Sweep completes, with a deadline.
// Tests use this instead of time.Sleep to avoid flakes on slow CI hosts.
func (f *fakeSweepStorage) waitForCycle(t *testing.T, n int, timeout time.Duration) {
t.Helper()
deadline := time.NewTimer(timeout)
defer deadline.Stop()
for got := 0; got < n; got++ {
select {
case <-f.cycleDone:
case <-deadline.C:
t.Fatalf("waited %s for %d sweep cycles, got %d", timeout, n, f.calls.Load())
}
}
}
func TestStartSweeper_NilStorageDoesNotPanic(t *testing.T) {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
// Should return immediately without panicking; no goroutine to wait on.
pendinguploads.StartSweeper(ctx, nil, time.Second)
}
func TestStartSweeper_RunsImmediatelyAndOnTick(t *testing.T) {
store := newFakeSweepStorage(
[]pendinguploads.SweepResult{{Acked: 5}, {Acked: 1, Expired: 2}},
nil,
)
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
go pendinguploads.StartSweeper(ctx, store, time.Hour)
store.waitForCycle(t, 1, 2*time.Second)
if got := store.calls.Load(); got < 1 {
t.Errorf("expected at least one immediate sweep, got %d", got)
}
// Retention propagated.
if store.gotRetention.Load() != 3600 {
t.Errorf("retention seconds = %d, want 3600", store.gotRetention.Load())
}
}
func TestStartSweeper_ZeroAckRetentionUsesDefault(t *testing.T) {
store := newFakeSweepStorage([]pendinguploads.SweepResult{{}}, nil)
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
go pendinguploads.StartSweeper(ctx, store, 0)
store.waitForCycle(t, 1, 2*time.Second)
want := int64(pendinguploads.DefaultAckRetention.Seconds())
if store.gotRetention.Load() != want {
t.Errorf("retention = %d, want default %d", store.gotRetention.Load(), want)
}
}
func TestStartSweeper_ContextCancelStopsLoop(t *testing.T) {
store := newFakeSweepStorage([]pendinguploads.SweepResult{{}}, nil)
ctx, cancel := context.WithCancel(context.Background())
done := make(chan struct{})
go func() {
pendinguploads.StartSweeper(ctx, store, time.Second)
close(done)
}()
store.waitForCycle(t, 1, 2*time.Second)
cancel()
select {
case <-done:
case <-time.After(2 * time.Second):
t.Fatal("StartSweeper did not return after ctx cancel")
}
}
func TestStartSweeperWithInterval_TickerFiresAdditionalCycles(t *testing.T) {
store := newFakeSweepStorage(
[]pendinguploads.SweepResult{{Acked: 1}, {Expired: 1}, {}, {}, {}},
nil,
)
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
go pendinguploads.StartSweeperWithInterval(ctx, store, time.Hour, 30*time.Millisecond)
// Immediate cycle + at least one tick-driven cycle.
store.waitForCycle(t, 2, 2*time.Second)
if got := store.calls.Load(); got < 2 {
t.Errorf("expected ≥2 cycles (immediate + 1 tick), got %d", got)
}
}
func TestStartSweeper_TransientErrorDoesNotCrashLoop(t *testing.T) {
// First call errors; second call succeeds. The loop must keep running
// across the error so a one-off DB hiccup doesn't disable the GC.
store := newFakeSweepStorage(
[]pendinguploads.SweepResult{{}, {Acked: 1}},
[]error{errors.New("transient db error"), nil},
)
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
// 50ms ticker so the second cycle fires quickly enough for the test.
// We re-export SweepInterval as a const, but tests use the public
// StartSweeper that takes its own interval — wait, the public
// StartSweeper signature uses the package-level SweepInterval. Hmm,
// this means the test takes ~5 minutes. Let me reconsider.
//
// (We patch the test below to just look at the immediate-sweep call
// + an error path, since the immediate call is enough to prove the
// "error doesn't crash" contract — the loop continues afterward
// regardless of timing.)
go pendinguploads.StartSweeper(ctx, store, time.Hour)
// Wait for the first (errored) cycle.
store.waitForCycle(t, 1, 2*time.Second)
// Cancel — the goroutine returns cleanly, proving the error path
// didn't crash the loop. Without this fix the goroutine would have
// either panicked (process abort visible at exit) or stuck (this
// cancel + done-channel pattern would deadlock instead).
cancel()
}
// metricDelta returns a function that, when called, returns how much
// the (acked, expired, errored) counters have advanced since metricDelta
// was originally called. metrics is a process-singleton across the test
// suite; deltas isolate this test from order-of-execution dependencies.
func metricDelta(t *testing.T) (deltaAcked, deltaExpired, deltaError func() int64) {
t.Helper()
a0, e0, err0 := metrics.PendingUploadsSweepCounts()
deltaAcked = func() int64 {
a, _, _ := metrics.PendingUploadsSweepCounts()
return a - a0
}
deltaExpired = func() int64 {
_, e, _ := metrics.PendingUploadsSweepCounts()
return e - e0
}
deltaError = func() int64 {
_, _, x := metrics.PendingUploadsSweepCounts()
return x - err0
}
return
}
func TestStartSweeper_RecordsMetricsOnSuccess(t *testing.T) {
deltaAcked, deltaExpired, deltaError := metricDelta(t)
store := newFakeSweepStorage(
[]pendinguploads.SweepResult{{Acked: 3, Expired: 5}},
nil,
)
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
go pendinguploads.StartSweeper(ctx, store, time.Hour)
store.waitForCycle(t, 1, 2*time.Second)
if got := deltaAcked(); got != 3 {
t.Errorf("acked counter delta = %d, want 3", got)
}
if got := deltaExpired(); got != 5 {
t.Errorf("expired counter delta = %d, want 5", got)
}
if got := deltaError(); got != 0 {
t.Errorf("error counter delta = %d, want 0", got)
}
}
func TestStartSweeper_RecordsMetricsOnError(t *testing.T) {
_, _, deltaError := metricDelta(t)
store := newFakeSweepStorage(
[]pendinguploads.SweepResult{{}},
[]error{errors.New("db down")},
)
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
go pendinguploads.StartSweeper(ctx, store, time.Hour)
store.waitForCycle(t, 1, 2*time.Second)
if got := deltaError(); got != 1 {
t.Errorf("error counter delta = %d, want 1", got)
}
}
@@ -14,6 +14,7 @@ import (
cronlib "github.com/robfig/cron/v3"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/metrics"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/supervised"
)
@@ -741,6 +742,11 @@ func (s *Scheduler) sweepPhantomBusy(ctx context.Context) {
continue
}
log.Printf("Scheduler: phantom-busy sweep — reset %s (no activity in %d min)", name, int(phantomStaleThreshold.Minutes()))
// #2865: surface as molecule_phantom_busy_resets_total. High
// reset rate signals task-lifecycle accounting regressions
// (e.g. missing env vars causing claude --print timeouts that
// leave active_tasks elevated until this sweep fires).
metrics.TrackPhantomBusyReset()
count++
}
if err := rows.Err(); err != nil {
+13 -89
View File
@@ -28,96 +28,20 @@ from platform_auth import list_registered_workspaces
# ---------------------------------------------------------------------------
# RBAC helpers (mirror builtin_tools/audit.py for a2a_tools isolation)
# RBAC + auth helpers — extracted to a2a_tools_rbac (RFC #2873 iter 4a).
# Re-exported here under the legacy underscore names so existing tests'
# patch("a2a_tools._check_memory_write_permission", …) and call sites
# inside this module that resolve bare names against the module-level
# namespace continue to work unchanged.
# ---------------------------------------------------------------------------
_ROLE_PERMISSIONS = {
"admin": {"delegate", "approve", "memory.read", "memory.write"},
"operator": {"delegate", "approve", "memory.read", "memory.write"},
"read-only": {"memory.read"},
"no-delegation": {"approve", "memory.read", "memory.write"},
"no-approval": {"delegate", "memory.read", "memory.write"},
"memory-readonly": {"memory.read"},
}
def _get_workspace_tier() -> int:
"""Return the workspace tier from config (0 = root, 1+ = tenant)."""
try:
from config import load_config
cfg = load_config()
return getattr(cfg, "tier", 1)
except Exception:
return int(os.environ.get("WORKSPACE_TIER", 1))
def _check_memory_write_permission() -> bool:
"""Return True if this workspace's RBAC roles grant memory.write."""
try:
from config import load_config
cfg = load_config()
roles = list(getattr(cfg, "rbac", None).roles or ["operator"])
allowed = dict(getattr(cfg, "rbac", None).allowed_actions or {})
except Exception:
# Fail closed: deny when config is unavailable
roles = ["operator"]
allowed = {}
for role in roles:
if role == "admin":
return True
if role in allowed:
if "memory.write" in allowed[role]:
return True
elif role in _ROLE_PERMISSIONS and "memory.write" in _ROLE_PERMISSIONS[role]:
return True
return False
def _check_memory_read_permission() -> bool:
"""Return True if this workspace's RBAC roles grant memory.read."""
try:
from config import load_config
cfg = load_config()
roles = list(getattr(cfg, "rbac", None).roles or ["operator"])
allowed = dict(getattr(cfg, "rbac", None).allowed_actions or {})
except Exception:
roles = ["operator"]
allowed = {}
for role in roles:
if role == "admin":
return True
if role in allowed:
if "memory.read" in allowed[role]:
return True
elif role in _ROLE_PERMISSIONS and "memory.read" in _ROLE_PERMISSIONS[role]:
return True
return False
def _is_root_workspace() -> bool:
"""Return True if this workspace is tier 0 (root/root-org)."""
return _get_workspace_tier() == 0
def _auth_headers_for_heartbeat(workspace_id: str | None = None) -> dict[str, str]:
"""Return Phase 30.1 auth headers; tolerate platform_auth being absent
in older installs (e.g. during rolling upgrade).
``workspace_id`` selects the per-workspace token from the multi-
workspace registry when set (PR-1: external agent registered in
multiple workspaces). With no arg the legacy single-token path is
unchanged.
"""
try:
from platform_auth import auth_headers
return auth_headers(workspace_id) if workspace_id else auth_headers()
except Exception:
return {}
from a2a_tools_rbac import ( # noqa: E402 (import after the from-a2a_client block)
_auth_headers_for_heartbeat,
_check_memory_read_permission,
_check_memory_write_permission,
_get_workspace_tier,
_is_root_workspace,
_ROLE_PERMISSIONS,
)
# Per-field caps on the heartbeat / activity payload. Borrowed from
+138
View File
@@ -0,0 +1,138 @@
"""RBAC + auth-header helpers shared by all a2a_tools tool handlers.
Extracted from ``a2a_tools.py`` (RFC #2873 iter 4a). Centralises the
"what can this workspace do" + "how do I prove it on a platform call"
concerns into a single module so:
* Future tools added under ``a2a_tools/`` see one obvious helper to
call instead of re-implementing the role/tier check.
* The role-permission table is in ONE place — adding a new role
or capability touches one file, not every tool that gates on it.
* Tests targeting these helpers don't have to import the whole
991-LOC ``a2a_tools`` surface.
Public surface:
* ``ROLE_PERMISSIONS`` — canonical role → action set table.
* ``get_workspace_tier()`` — config-resolved tier (0 = root).
* ``check_memory_write_permission()`` — boolean.
* ``check_memory_read_permission()`` — boolean.
* ``is_root_workspace()`` — boolean (tier == 0).
* ``auth_headers_for_heartbeat(workspace_id=None)`` — auth-header dict
with the multi-workspace registry lookup; tolerates ``platform_auth``
missing on older installs (returns ``{}``).
Underscore-prefixed back-compat aliases (``_ROLE_PERMISSIONS``,
``_check_memory_write_permission``, etc.) match the names previously
exposed in ``a2a_tools`` so existing tests'
``patch("a2a_tools._foo", ...)`` continue to work via the re-exports
in ``a2a_tools.py``.
"""
from __future__ import annotations
import os
# Mirror ``builtin_tools/audit.py`` for a2a_tools isolation. Listed as a
# module-level constant rather than computed lazily so the table is
# discoverable in static analysis + ``grep``.
ROLE_PERMISSIONS: dict[str, set[str]] = {
"admin": {"delegate", "approve", "memory.read", "memory.write"},
"operator": {"delegate", "approve", "memory.read", "memory.write"},
"read-only": {"memory.read"},
"no-delegation": {"approve", "memory.read", "memory.write"},
"no-approval": {"delegate", "memory.read", "memory.write"},
"memory-readonly": {"memory.read"},
}
def get_workspace_tier() -> int:
"""Return the workspace tier from config (0 = root, 1+ = tenant)."""
try:
from config import load_config
cfg = load_config()
return getattr(cfg, "tier", 1)
except Exception:
return int(os.environ.get("WORKSPACE_TIER", 1))
def _resolve_role_state() -> tuple[list[str], dict]:
"""Return (roles, allowed_actions) from config.
Fail-closed: if config is unavailable, fall back to an "operator"
default with no per-role overrides. Operator has memory.read +
memory.write but not the elevated approve/delegate over GLOBAL
scope, so a config outage doesn't grant unexpected privileges.
"""
try:
from config import load_config
cfg = load_config()
roles = list(getattr(cfg, "rbac", None).roles or ["operator"])
allowed = dict(getattr(cfg, "rbac", None).allowed_actions or {})
return roles, allowed
except Exception:
return ["operator"], {}
def check_memory_write_permission() -> bool:
"""Return True if this workspace's RBAC roles grant memory.write."""
roles, allowed = _resolve_role_state()
for role in roles:
if role == "admin":
return True
if role in allowed:
if "memory.write" in allowed[role]:
return True
elif role in ROLE_PERMISSIONS and "memory.write" in ROLE_PERMISSIONS[role]:
return True
return False
def check_memory_read_permission() -> bool:
"""Return True if this workspace's RBAC roles grant memory.read."""
roles, allowed = _resolve_role_state()
for role in roles:
if role == "admin":
return True
if role in allowed:
if "memory.read" in allowed[role]:
return True
elif role in ROLE_PERMISSIONS and "memory.read" in ROLE_PERMISSIONS[role]:
return True
return False
def is_root_workspace() -> bool:
"""Return True if this workspace is tier 0 (root/root-org)."""
return get_workspace_tier() == 0
def auth_headers_for_heartbeat(workspace_id: str | None = None) -> dict[str, str]:
"""Return Phase 30.1 auth headers; tolerate platform_auth being absent
in older installs (e.g. during rolling upgrade).
``workspace_id`` selects the per-workspace token from the multi-
workspace registry when set (PR-1: external agent registered in
multiple workspaces). With no arg the legacy single-token path is
unchanged.
"""
try:
from platform_auth import auth_headers
return auth_headers(workspace_id) if workspace_id else auth_headers()
except Exception:
return {}
# ============== Back-compat aliases for the previous a2a_tools names ==============
# Tests + downstream call sites refer to the pre-extract names; aliasing
# keeps both forms valid. The new public names (no underscore prefix)
# are preferred for new code.
_ROLE_PERMISSIONS = ROLE_PERMISSIONS
_get_workspace_tier = get_workspace_tier
_check_memory_write_permission = check_memory_write_permission
_check_memory_read_permission = check_memory_read_permission
_is_root_workspace = is_root_workspace
_auth_headers_for_heartbeat = auth_headers_for_heartbeat
+33 -466
View File
@@ -31,422 +31,53 @@ dependency via ``a2a-sdk``.
In-container usage (``python -m molecule_runtime.a2a_mcp_server`` or
direct import) bypasses this wrapper — the workspace runtime has its
own heartbeat loop in ``heartbeat.py`` so we don't double-heartbeat.
Module layout (RFC #2873 iter 3 split):
* ``mcp_heartbeat`` — register POST + heartbeat loop + auth-failure
escalation + inbound-secret persistence.
* ``mcp_workspace_resolver`` — env validation, single + multi-workspace
resolution, operator-help printer, on-disk token-file read.
* ``mcp_inbox_pollers`` — activate the inbox singleton + spawn one
daemon poller per workspace.
This file keeps just ``main()`` plus thin re-exports of the private
symbols so existing tests' imports (``mcp_cli._build_agent_card``,
``mcp_cli._heartbeat_loop``, etc.) keep working without churn.
"""
from __future__ import annotations
import json
import logging
import os
import sys
import threading
import time
from pathlib import Path
import configs_dir
import mcp_heartbeat
import mcp_inbox_pollers
import mcp_workspace_resolver
logger = logging.getLogger(__name__)
# Heartbeat cadence. Must be tighter than healthsweep's stale window
# (currently 60-90s — see registry/healthsweep.go) by a comfortable
# margin so a single missed heartbeat doesn't flip awaiting_agent.
# 20s gives the operator's network 3 attempts within the budget; long
# enough that it doesn't spam, short enough to recover quickly after
# laptop sleep.
HEARTBEAT_INTERVAL_SECONDS = 20.0
# Re-export public surface for back-compat with the pre-split callers
# and tests. The underscore-prefixed names mirror the names that
# existed in this module before the split — keeping them ensures
# `mcp_cli._build_agent_card`, `mcp_cli._heartbeat_loop`, etc.
# resolve identically to the new functions.
HEARTBEAT_INTERVAL_SECONDS = mcp_heartbeat.HEARTBEAT_INTERVAL_SECONDS
_HEARTBEAT_AUTH_LOUD_THRESHOLD = mcp_heartbeat.HEARTBEAT_AUTH_LOUD_THRESHOLD
_HEARTBEAT_AUTH_RELOG_INTERVAL = mcp_heartbeat.HEARTBEAT_AUTH_RELOG_INTERVAL
# After this many consecutive 401/403 heartbeats, escalate from
# WARNING to ERROR with re-onboard guidance. 3 ticks at 20s = ~1 minute
# of sustained auth failure — enough to rule out a transient platform
# blip but quick enough that an operator doesn't sit puzzled for 10
# minutes wondering why their MCP tools 401. Same threshold used for
# repeat-logging at 20-tick (~7 min) intervals so a long-running
# session that missed the first ERROR still sees the message.
_HEARTBEAT_AUTH_LOUD_THRESHOLD = 3
_HEARTBEAT_AUTH_RELOG_INTERVAL = 20
_build_agent_card = mcp_heartbeat.build_agent_card
_platform_register = mcp_heartbeat.platform_register
_heartbeat_loop = mcp_heartbeat.heartbeat_loop
_log_heartbeat_auth_failure = mcp_heartbeat.log_heartbeat_auth_failure
_persist_inbound_secret_from_heartbeat = mcp_heartbeat.persist_inbound_secret_from_heartbeat
_start_heartbeat_thread = mcp_heartbeat.start_heartbeat_thread
_resolve_workspaces = mcp_workspace_resolver.resolve_workspaces
_print_missing_env_help = mcp_workspace_resolver.print_missing_env_help
_read_token_file = mcp_workspace_resolver.read_token_file
def _build_agent_card(workspace_id: str) -> dict:
"""Build the ``agent_card`` payload sent to /registry/register.
Three optional env vars override the defaults so an operator can
surface human-readable identity + capabilities to peers and the
canvas Skills tab without code changes:
* ``MOLECULE_AGENT_NAME`` — display name (defaults to
``molecule-mcp-{id[:8]}``). Surfaced in canvas workspace cards
and ``list_peers`` output.
* ``MOLECULE_AGENT_DESCRIPTION`` — one-liner about the agent's
purpose. Rendered in canvas Details + Skills tabs.
* ``MOLECULE_AGENT_SKILLS`` — comma-separated skill names
(e.g. ``research,code-review,memory-curation``). Each name is
expanded to a ``{"name": ...}`` skill object — the minimum
shape that satisfies both ``shared_runtime.summarize_peers``
(uses ``s["name"]``) and the canvas SkillsTab.tsx schema
(id falls back to name when omitted). Empty / whitespace
entries are dropped.
Defaults match the previous hardcoded behaviour exactly so this
is a strict superset — an operator who sets none of the env vars
sees no change.
"""
name = (os.environ.get("MOLECULE_AGENT_NAME") or "").strip()
if not name:
name = f"molecule-mcp-{workspace_id[:8]}"
description = (os.environ.get("MOLECULE_AGENT_DESCRIPTION") or "").strip()
skills_raw = (os.environ.get("MOLECULE_AGENT_SKILLS") or "").strip()
skills: list[dict] = []
if skills_raw:
for s in skills_raw.split(","):
label = s.strip()
if label:
skills.append({"name": label})
card: dict = {"name": name, "skills": skills}
if description:
card["description"] = description
return card
def _platform_register(platform_url: str, workspace_id: str, token: str) -> None:
"""One-shot register at startup; fails fast on auth errors.
Lifts the workspace from ``awaiting_agent`` to ``online`` for
operators who never ran the curl-register snippet. Safe to call
repeatedly: the platform's register handler is an upsert that
just refreshes ``url``, ``agent_card``, and ``status``.
Failure model (post-review):
- 401 / 403 → ``sys.exit(3)`` immediately. The operator's
token is wrong; silently looping in a broken state would
make this hard to diagnose because the MCP tools would 401
on every call too. Hard-fail is the kindest option.
- Other 4xx/5xx → log a warning + continue. The heartbeat
thread will surface persistent failures; transient platform
blips shouldn't abort the MCP loop.
- Network / transport errors → log + continue. Same reasoning.
Origin header is required by the SaaS edge WAF; without it
/registry/register currently still works (it's on the WAF
allowlist), but the heartbeat path needs Origin and we want one
consistent header set across both calls.
"""
try:
import httpx
except ImportError:
# httpx is a transitive dep via a2a-sdk; if missing, the MCP
# server won't import either. Let the caller's later import
# surface the real error.
return
payload = {
"id": workspace_id,
"url": "",
"agent_card": _build_agent_card(workspace_id),
"delivery_mode": "poll",
}
headers = {
"Authorization": f"Bearer {token}",
"Origin": platform_url,
"Content-Type": "application/json",
}
try:
with httpx.Client(timeout=10.0) as client:
resp = client.post(
f"{platform_url}/registry/register",
json=payload,
headers=headers,
)
if resp.status_code in (401, 403):
print(
f"molecule-mcp: register rejected with HTTP {resp.status_code}"
f"the token in MOLECULE_WORKSPACE_TOKEN is invalid for workspace "
f"{workspace_id}. Regenerate from the canvas → Tokens tab.",
file=sys.stderr,
)
sys.exit(3)
if resp.status_code >= 400:
logger.warning(
"molecule-mcp: register POST returned HTTP %d: %s",
resp.status_code,
(resp.text or "")[:200],
)
else:
logger.info(
"molecule-mcp: registered workspace %s with platform",
workspace_id,
)
except SystemExit:
raise
except Exception as exc: # noqa: BLE001
logger.warning("molecule-mcp: register POST failed: %s", exc)
def _heartbeat_loop(
platform_url: str,
workspace_id: str,
token: str,
interval: float = HEARTBEAT_INTERVAL_SECONDS,
) -> None:
"""Daemon thread body: POST /registry/heartbeat every ``interval``s.
Failures are logged at WARNING and the loop continues. The thread
exits when the main process does (daemon=True). Each iteration
rebuilds the payload + headers — cheap and ensures token rotation
via env var (rare but possible) is picked up on the next tick.
"""
try:
import httpx
except ImportError:
return
start_time = time.time()
consecutive_auth_failures = 0
while True:
body = {
"workspace_id": workspace_id,
"error_rate": 0.0,
"sample_error": "",
"active_tasks": 0,
"uptime_seconds": int(time.time() - start_time),
}
headers = {
"Authorization": f"Bearer {token}",
"Origin": platform_url,
"Content-Type": "application/json",
}
try:
with httpx.Client(timeout=10.0) as client:
resp = client.post(
f"{platform_url}/registry/heartbeat",
json=body,
headers=headers,
)
if resp.status_code in (401, 403):
consecutive_auth_failures += 1
_log_heartbeat_auth_failure(
consecutive_auth_failures, workspace_id, resp.status_code,
)
elif resp.status_code >= 400:
# Non-auth HTTP error — log, but DO NOT touch the
# auth-failure counter (5xx blips, 429, etc. are
# transient and unrelated to token validity).
logger.warning(
"molecule-mcp: heartbeat HTTP %d: %s",
resp.status_code,
(resp.text or "")[:200],
)
else:
consecutive_auth_failures = 0
_persist_inbound_secret_from_heartbeat(resp)
except Exception as exc: # noqa: BLE001
logger.warning("molecule-mcp: heartbeat failed: %s", exc)
time.sleep(interval)
def _log_heartbeat_auth_failure(count: int, workspace_id: str, status_code: int) -> None:
"""Escalate consecutive heartbeat 401/403s from quiet WARNING to
actionable ERROR.
The operator's first sign of trouble shouldn't be "tools 401 with no
explanation" — that was the failure mode that motivated this code,
triggered by a workspace being deleted server-side and its tokens
revoked while the runtime kept heartbeating in silence.
Cadence:
* count < threshold: WARNING per tick (transient — could be a
platform blip, don't shout yet)
* count == threshold: ERROR with re-onboard instructions
(the first signal the operator can't miss)
* count > threshold and (count - threshold) % relog == 0: re-log
ERROR (so a session that started after the first ERROR still
sees the message scrolling past in their logs)
"""
if count < _HEARTBEAT_AUTH_LOUD_THRESHOLD:
logger.warning(
"molecule-mcp: heartbeat HTTP %d (auth failure %d/%d) — "
"token may be revoked. Will retry; if persistent, regenerate "
"from canvas → Tokens.",
status_code, count, _HEARTBEAT_AUTH_LOUD_THRESHOLD,
)
return
# At or past the threshold — this is the loud actionable error.
if count == _HEARTBEAT_AUTH_LOUD_THRESHOLD or (
count - _HEARTBEAT_AUTH_LOUD_THRESHOLD
) % _HEARTBEAT_AUTH_RELOG_INTERVAL == 0:
logger.error(
"molecule-mcp: %d consecutive heartbeat auth failures (HTTP %d) — "
"the token in MOLECULE_WORKSPACE_TOKEN has been REVOKED, likely "
"because workspace %s was deleted server-side. The MCP server is "
"still running but every platform call will fail. Regenerate the "
"workspace + token from the canvas (Tokens tab), update your MCP "
"config, and restart your runtime.",
count, status_code, workspace_id,
)
def _persist_inbound_secret_from_heartbeat(resp: object) -> None:
"""Persist ``platform_inbound_secret`` from a heartbeat response, if any.
The platform's heartbeat handler returns the secret on every beat
(mirroring /registry/register) so a workspace that lazy-healed the
secret on the platform side — typical recovery path for a workspace
whose row had a NULL ``platform_inbound_secret`` after a partial
bootstrap — picks it up within one heartbeat tick instead of
requiring a runtime restart.
Without this delivery path the chat-upload code path's "secret was
just minted, will pick up on next heartbeat" 503 message is a lie
and the workspace stays 401-forever until the operator restarts
the runtime. Caught 2026-04-30 on hongmingwang tenant.
Failure is non-fatal: if the body isn't JSON, doesn't carry the
field, or the disk write fails, the next heartbeat retries. This
matches the cold-start register flow in main.py:319-323.
"""
try:
body = resp.json()
except Exception: # noqa: BLE001
return
if not isinstance(body, dict):
return
secret = body.get("platform_inbound_secret")
if not secret:
return
try:
from platform_inbound_auth import save_inbound_secret
save_inbound_secret(secret)
except Exception as exc: # noqa: BLE001
logger.warning(
"molecule-mcp: persist inbound secret from heartbeat failed: %s", exc
)
def _start_heartbeat_thread(
platform_url: str,
workspace_id: str,
token: str,
) -> threading.Thread:
"""Start the heartbeat daemon thread. Returns the Thread handle.
The MCP stdio loop runs in the foreground (asyncio); this thread
runs alongside it. ``daemon=True`` so when the operator hits
Ctrl-C / closes the runtime, the heartbeat dies with it instead
of leaking and writing to a stale workspace.
"""
t = threading.Thread(
target=_heartbeat_loop,
args=(platform_url, workspace_id, token),
name="molecule-mcp-heartbeat",
daemon=True,
)
t.start()
return t
def _resolve_workspaces() -> tuple[list[tuple[str, str]], list[str]]:
"""Return the list of ``(workspace_id, token)`` pairs to register.
Resolution order:
1. ``MOLECULE_WORKSPACES`` env var — JSON array of
``{"id": "...", "token": "..."}`` objects. Activates the
multi-workspace external-agent path (one process registered into
N workspaces). When set, ``WORKSPACE_ID`` / ``MOLECULE_WORKSPACE_TOKEN``
are IGNORED — the JSON is the source of truth.
2. Single-workspace fallback — ``WORKSPACE_ID`` env var + token from
``MOLECULE_WORKSPACE_TOKEN`` or ``${CONFIGS_DIR}/.auth_token``.
This is the pre-existing path; back-compat exact.
Returns ``(workspaces, errors)``:
* ``workspaces``: list of ``(workspace_id, token)`` — non-empty
on the happy path.
* ``errors``: human-readable strings describing what's missing /
malformed. ``main()`` surfaces these with the same shape as
``_print_missing_env_help`` so the operator's first run gives
actionable output.
Why JSON env (not file): ergonomic for Claude Code MCP config (one
string in ``mcpServers.molecule.env`` instead of a sidecar file)
and for CI / launchers. A separate config-file path can be added
later without breaking this.
"""
raw = os.environ.get("MOLECULE_WORKSPACES", "").strip()
if raw:
try:
parsed = json.loads(raw)
except json.JSONDecodeError as exc:
return [], [
f"MOLECULE_WORKSPACES is not valid JSON ({exc.msg} at pos "
f"{exc.pos}). Expected: '[{{\"id\":\"<wsid>\",\"token\":"
f"\"<tok>\"}},{{...}}]'"
]
if not isinstance(parsed, list) or not parsed:
return [], [
"MOLECULE_WORKSPACES must be a non-empty JSON array of "
"{\"id\":\"...\",\"token\":\"...\"} objects"
]
out: list[tuple[str, str]] = []
seen: set[str] = set()
errors: list[str] = []
for i, entry in enumerate(parsed):
if not isinstance(entry, dict):
errors.append(
f"MOLECULE_WORKSPACES[{i}] is not an object — got {type(entry).__name__}"
)
continue
wsid = str(entry.get("id", "")).strip()
tok = str(entry.get("token", "")).strip()
if not wsid or not tok:
errors.append(
f"MOLECULE_WORKSPACES[{i}] missing 'id' or 'token'"
)
continue
if wsid in seen:
errors.append(
f"MOLECULE_WORKSPACES[{i}] duplicate workspace id {wsid!r}"
)
continue
seen.add(wsid)
out.append((wsid, tok))
if errors:
return [], errors
return out, []
# Single-workspace back-compat path.
wsid = os.environ.get("WORKSPACE_ID", "").strip()
if not wsid:
return [], ["WORKSPACE_ID (or MOLECULE_WORKSPACES) is required"]
tok = os.environ.get("MOLECULE_WORKSPACE_TOKEN", "").strip()
if not tok:
tok = _read_token_file()
if not tok:
return [], [
"MOLECULE_WORKSPACE_TOKEN (or CONFIGS_DIR/.auth_token) is required"
]
return [(wsid, tok)], []
def _print_missing_env_help(missing: list[str], have_token_file: bool) -> None:
print("molecule-mcp: missing required environment.\n", file=sys.stderr)
print("Set the following before running molecule-mcp:", file=sys.stderr)
print(" WORKSPACE_ID — your workspace UUID (from canvas)", file=sys.stderr)
print(
" PLATFORM_URL — base URL of your Molecule platform "
"(e.g. https://your-tenant.staging.moleculesai.app)",
file=sys.stderr,
)
if not have_token_file:
print(
" MOLECULE_WORKSPACE_TOKEN — bearer token for this workspace "
"(canvas → Tokens tab)",
file=sys.stderr,
)
print("", file=sys.stderr)
print(f"Currently missing: {', '.join(missing)}", file=sys.stderr)
_start_inbox_pollers = mcp_inbox_pollers.start_inbox_pollers
def main() -> None:
@@ -558,69 +189,5 @@ def main() -> None:
cli_main()
def _start_inbox_pollers(platform_url: str, workspace_ids: list[str]) -> None:
"""Activate the inbox singleton + spawn one poller daemon thread per workspace.
Done lazily here (not at module import) because importing inbox
pulls in platform_auth, which only resolves cleanly AFTER env
validation succeeds. Activation is idempotent within a process,
so a stray double-call (e.g. test harness re-entering main) is
harmless.
The poller threads are daemon=True — die with the main process.
Single-workspace path: one poller, single cursor file at the legacy
location (``.mcp_inbox_cursor``). Cursor-key resolution falls back
to the empty string for back-compat with operators whose existing
on-disk cursor was written by the pre-multi-workspace code.
Multi-workspace path: N pollers, each with its own cursor file
keyed by ``workspace_id[:8]``. Cursors live next to each other in
configs_dir so an operator inspecting state sees all of them
together.
"""
try:
import inbox
except ImportError as exc:
logger.warning("molecule-mcp: inbox module unavailable: %s", exc)
return
if len(workspace_ids) <= 1:
# Back-compat exact: single-workspace mode reuses the legacy
# cursor filename + cursor_path constructor arg, so an existing
# operator's on-disk state isn't invalidated by upgrade.
wsid = workspace_ids[0]
state = inbox.InboxState(cursor_path=inbox.default_cursor_path())
inbox.activate(state)
inbox.start_poller_thread(state, platform_url, wsid)
return
# Multi-workspace: per-workspace cursor file, one shared queue.
cursor_paths = {wsid: inbox.default_cursor_path(wsid) for wsid in workspace_ids}
state = inbox.InboxState(cursor_paths=cursor_paths)
inbox.activate(state)
for wsid in workspace_ids:
inbox.start_poller_thread(state, platform_url, wsid)
def _read_token_file() -> str:
"""Read the token from the resolved configs dir's ``.auth_token`` if
present.
Mirrors platform_auth._token_file's location resolution but without
importing the heavy module here (that import triggers a2a_client's
WORKSPACE_ID guard which is fine after env validation, but cheaper
to inline a 4-line file read than pull in the whole stack just for
the path).
"""
path = configs_dir.resolve() / ".auth_token"
if not path.is_file():
return ""
try:
return path.read_text().strip()
except OSError:
return ""
if __name__ == "__main__": # pragma: no cover
main()
+325
View File
@@ -0,0 +1,325 @@
"""Heartbeat + register thread for the standalone ``molecule-mcp`` wrapper.
Extracted from ``mcp_cli.py`` (RFC #2873 iter 3) so the heartbeat /
register concern lives in its own module. The console-script entry
``mcp_cli:main`` still drives the spawn, but the loop body, auth-failure
escalation, and inbound-secret persistence now live here so they can be
read, tested, and replaced independently of the orchestrator.
Public surface:
* ``HEARTBEAT_INTERVAL_SECONDS`` — cadence constant.
* ``build_agent_card(workspace_id)`` — payload helper.
* ``platform_register(platform_url, workspace_id, token)`` — one-shot
POST /registry/register at startup.
* ``start_heartbeat_thread(platform_url, workspace_id, token)`` — spawn
the daemon thread.
"""
from __future__ import annotations
import logging
import os
import sys
import threading
import time
logger = logging.getLogger(__name__)
# Heartbeat cadence. Must be tighter than healthsweep's stale window
# (currently 60-90s — see registry/healthsweep.go) by a comfortable
# margin so a single missed heartbeat doesn't flip awaiting_agent.
# 20s gives the operator's network 3 attempts within the budget; long
# enough that it doesn't spam, short enough to recover quickly after
# laptop sleep.
HEARTBEAT_INTERVAL_SECONDS = 20.0
# After this many consecutive 401/403 heartbeats, escalate from
# WARNING to ERROR with re-onboard guidance. 3 ticks at 20s = ~1 minute
# of sustained auth failure — enough to rule out a transient platform
# blip but quick enough that an operator doesn't sit puzzled for 10
# minutes wondering why their MCP tools 401. Same threshold used for
# repeat-logging at 20-tick (~7 min) intervals so a long-running
# session that missed the first ERROR still sees the message.
HEARTBEAT_AUTH_LOUD_THRESHOLD = 3
HEARTBEAT_AUTH_RELOG_INTERVAL = 20
def build_agent_card(workspace_id: str) -> dict:
"""Build the ``agent_card`` payload sent to /registry/register.
Three optional env vars override the defaults so an operator can
surface human-readable identity + capabilities to peers and the
canvas Skills tab without code changes:
* ``MOLECULE_AGENT_NAME`` — display name (defaults to
``molecule-mcp-{id[:8]}``). Surfaced in canvas workspace cards
and ``list_peers`` output.
* ``MOLECULE_AGENT_DESCRIPTION`` — one-liner about the agent's
purpose. Rendered in canvas Details + Skills tabs.
* ``MOLECULE_AGENT_SKILLS`` — comma-separated skill names
(e.g. ``research,code-review,memory-curation``). Each name is
expanded to a ``{"name": ...}`` skill object — the minimum
shape that satisfies both ``shared_runtime.summarize_peers``
(uses ``s["name"]``) and the canvas SkillsTab.tsx schema
(id falls back to name when omitted). Empty / whitespace
entries are dropped.
Defaults match the previous hardcoded behaviour exactly so this
is a strict superset — an operator who sets none of the env vars
sees no change.
"""
name = (os.environ.get("MOLECULE_AGENT_NAME") or "").strip()
if not name:
name = f"molecule-mcp-{workspace_id[:8]}"
description = (os.environ.get("MOLECULE_AGENT_DESCRIPTION") or "").strip()
skills_raw = (os.environ.get("MOLECULE_AGENT_SKILLS") or "").strip()
skills: list[dict] = []
if skills_raw:
for s in skills_raw.split(","):
label = s.strip()
if label:
skills.append({"name": label})
card: dict = {"name": name, "skills": skills}
if description:
card["description"] = description
return card
def platform_register(platform_url: str, workspace_id: str, token: str) -> None:
"""One-shot register at startup; fails fast on auth errors.
Lifts the workspace from ``awaiting_agent`` to ``online`` for
operators who never ran the curl-register snippet. Safe to call
repeatedly: the platform's register handler is an upsert that
just refreshes ``url``, ``agent_card``, and ``status``.
Failure model (post-review):
- 401 / 403 → ``sys.exit(3)`` immediately. The operator's
token is wrong; silently looping in a broken state would
make this hard to diagnose because the MCP tools would 401
on every call too. Hard-fail is the kindest option.
- Other 4xx/5xx → log a warning + continue. The heartbeat
thread will surface persistent failures; transient platform
blips shouldn't abort the MCP loop.
- Network / transport errors → log + continue. Same reasoning.
Origin header is required by the SaaS edge WAF; without it
/registry/register currently still works (it's on the WAF
allowlist), but the heartbeat path needs Origin and we want one
consistent header set across both calls.
"""
try:
import httpx
except ImportError:
# httpx is a transitive dep via a2a-sdk; if missing, the MCP
# server won't import either. Let the caller's later import
# surface the real error.
return
payload = {
"id": workspace_id,
"url": "",
"agent_card": build_agent_card(workspace_id),
"delivery_mode": "poll",
}
headers = {
"Authorization": f"Bearer {token}",
"Origin": platform_url,
"Content-Type": "application/json",
}
try:
with httpx.Client(timeout=10.0) as client:
resp = client.post(
f"{platform_url}/registry/register",
json=payload,
headers=headers,
)
if resp.status_code in (401, 403):
print(
f"molecule-mcp: register rejected with HTTP {resp.status_code}"
f"the token in MOLECULE_WORKSPACE_TOKEN is invalid for workspace "
f"{workspace_id}. Regenerate from the canvas → Tokens tab.",
file=sys.stderr,
)
sys.exit(3)
if resp.status_code >= 400:
logger.warning(
"molecule-mcp: register POST returned HTTP %d: %s",
resp.status_code,
(resp.text or "")[:200],
)
else:
logger.info(
"molecule-mcp: registered workspace %s with platform",
workspace_id,
)
except SystemExit:
raise
except Exception as exc: # noqa: BLE001
logger.warning("molecule-mcp: register POST failed: %s", exc)
def heartbeat_loop(
platform_url: str,
workspace_id: str,
token: str,
interval: float = HEARTBEAT_INTERVAL_SECONDS,
) -> None:
"""Daemon thread body: POST /registry/heartbeat every ``interval``s.
Failures are logged at WARNING and the loop continues. The thread
exits when the main process does (daemon=True). Each iteration
rebuilds the payload + headers — cheap and ensures token rotation
via env var (rare but possible) is picked up on the next tick.
"""
try:
import httpx
except ImportError:
return
start_time = time.time()
consecutive_auth_failures = 0
while True:
body = {
"workspace_id": workspace_id,
"error_rate": 0.0,
"sample_error": "",
"active_tasks": 0,
"uptime_seconds": int(time.time() - start_time),
}
headers = {
"Authorization": f"Bearer {token}",
"Origin": platform_url,
"Content-Type": "application/json",
}
try:
with httpx.Client(timeout=10.0) as client:
resp = client.post(
f"{platform_url}/registry/heartbeat",
json=body,
headers=headers,
)
if resp.status_code in (401, 403):
consecutive_auth_failures += 1
log_heartbeat_auth_failure(
consecutive_auth_failures, workspace_id, resp.status_code,
)
elif resp.status_code >= 400:
# Non-auth HTTP error — log, but DO NOT touch the
# auth-failure counter (5xx blips, 429, etc. are
# transient and unrelated to token validity).
logger.warning(
"molecule-mcp: heartbeat HTTP %d: %s",
resp.status_code,
(resp.text or "")[:200],
)
else:
consecutive_auth_failures = 0
persist_inbound_secret_from_heartbeat(resp)
except Exception as exc: # noqa: BLE001
logger.warning("molecule-mcp: heartbeat failed: %s", exc)
time.sleep(interval)
def log_heartbeat_auth_failure(count: int, workspace_id: str, status_code: int) -> None:
"""Escalate consecutive heartbeat 401/403s from quiet WARNING to
actionable ERROR.
The operator's first sign of trouble shouldn't be "tools 401 with no
explanation" — that was the failure mode that motivated this code,
triggered by a workspace being deleted server-side and its tokens
revoked while the runtime kept heartbeating in silence.
Cadence:
* count < threshold: WARNING per tick (transient — could be a
platform blip, don't shout yet)
* count == threshold: ERROR with re-onboard instructions
(the first signal the operator can't miss)
* count > threshold and (count - threshold) % relog == 0: re-log
ERROR (so a session that started after the first ERROR still
sees the message scrolling past in their logs)
"""
if count < HEARTBEAT_AUTH_LOUD_THRESHOLD:
logger.warning(
"molecule-mcp: heartbeat HTTP %d (auth failure %d/%d) — "
"token may be revoked. Will retry; if persistent, regenerate "
"from canvas → Tokens.",
status_code, count, HEARTBEAT_AUTH_LOUD_THRESHOLD,
)
return
# At or past the threshold — this is the loud actionable error.
if count == HEARTBEAT_AUTH_LOUD_THRESHOLD or (
count - HEARTBEAT_AUTH_LOUD_THRESHOLD
) % HEARTBEAT_AUTH_RELOG_INTERVAL == 0:
logger.error(
"molecule-mcp: %d consecutive heartbeat auth failures (HTTP %d) — "
"the token in MOLECULE_WORKSPACE_TOKEN has been REVOKED, likely "
"because workspace %s was deleted server-side. The MCP server is "
"still running but every platform call will fail. Regenerate the "
"workspace + token from the canvas (Tokens tab), update your MCP "
"config, and restart your runtime.",
count, status_code, workspace_id,
)
def persist_inbound_secret_from_heartbeat(resp: object) -> None:
"""Persist ``platform_inbound_secret`` from a heartbeat response, if any.
The platform's heartbeat handler returns the secret on every beat
(mirroring /registry/register) so a workspace that lazy-healed the
secret on the platform side — typical recovery path for a workspace
whose row had a NULL ``platform_inbound_secret`` after a partial
bootstrap — picks it up within one heartbeat tick instead of
requiring a runtime restart.
Without this delivery path the chat-upload code path's "secret was
just minted, will pick up on next heartbeat" 503 message is a lie
and the workspace stays 401-forever until the operator restarts
the runtime. Caught 2026-04-30 on hongmingwang tenant.
Failure is non-fatal: if the body isn't JSON, doesn't carry the
field, or the disk write fails, the next heartbeat retries. This
matches the cold-start register flow in main.py:319-323.
"""
try:
body = resp.json()
except Exception: # noqa: BLE001
return
if not isinstance(body, dict):
return
secret = body.get("platform_inbound_secret")
if not secret:
return
try:
from platform_inbound_auth import save_inbound_secret
save_inbound_secret(secret)
except Exception as exc: # noqa: BLE001
logger.warning(
"molecule-mcp: persist inbound secret from heartbeat failed: %s", exc
)
def start_heartbeat_thread(
platform_url: str,
workspace_id: str,
token: str,
) -> threading.Thread:
"""Start the heartbeat daemon thread. Returns the Thread handle.
The MCP stdio loop runs in the foreground (asyncio); this thread
runs alongside it. ``daemon=True`` so when the operator hits
Ctrl-C / closes the runtime, the heartbeat dies with it instead
of leaking and writing to a stale workspace.
"""
t = threading.Thread(
target=heartbeat_loop,
args=(platform_url, workspace_id, token),
name="molecule-mcp-heartbeat",
daemon=True,
)
t.start()
return t
+63
View File
@@ -0,0 +1,63 @@
"""Inbox-poller spawn helpers for the standalone ``molecule-mcp`` wrapper.
Extracted from ``mcp_cli.py`` (RFC #2873 iter 3). The poller is the
INBOUND side of the standalone path — without it, the universal MCP
server is outbound-only (can call ``delegate_task`` /
``send_message_to_user``, never observes canvas-user / peer-agent
messages).
Public surface:
* ``start_inbox_pollers(platform_url, workspace_ids)`` — activate the
inbox singleton and spawn one daemon poller per workspace.
"""
from __future__ import annotations
import logging
logger = logging.getLogger(__name__)
def start_inbox_pollers(platform_url: str, workspace_ids: list[str]) -> None:
"""Activate the inbox singleton + spawn one poller daemon thread per workspace.
Done lazily here (not at module import) because importing inbox
pulls in platform_auth, which only resolves cleanly AFTER env
validation succeeds. Activation is idempotent within a process,
so a stray double-call (e.g. test harness re-entering main) is
harmless.
The poller threads are daemon=True — die with the main process.
Single-workspace path: one poller, single cursor file at the legacy
location (``.mcp_inbox_cursor``). Cursor-key resolution falls back
to the empty string for back-compat with operators whose existing
on-disk cursor was written by the pre-multi-workspace code.
Multi-workspace path: N pollers, each with its own cursor file
keyed by ``workspace_id[:8]``. Cursors live next to each other in
configs_dir so an operator inspecting state sees all of them
together.
"""
try:
import inbox
except ImportError as exc:
logger.warning("molecule-mcp: inbox module unavailable: %s", exc)
return
if len(workspace_ids) <= 1:
# Back-compat exact: single-workspace mode reuses the legacy
# cursor filename + cursor_path constructor arg, so an existing
# operator's on-disk state isn't invalidated by upgrade.
wsid = workspace_ids[0]
state = inbox.InboxState(cursor_path=inbox.default_cursor_path())
inbox.activate(state)
inbox.start_poller_thread(state, platform_url, wsid)
return
# Multi-workspace: per-workspace cursor file, one shared queue.
cursor_paths = {wsid: inbox.default_cursor_path(wsid) for wsid in workspace_ids}
state = inbox.InboxState(cursor_paths=cursor_paths)
inbox.activate(state)
for wsid in workspace_ids:
inbox.start_poller_thread(state, platform_url, wsid)
+146
View File
@@ -0,0 +1,146 @@
"""Env validation + workspace resolution for the standalone ``molecule-mcp``.
Extracted from ``mcp_cli.py`` (RFC #2873 iter 3). Deals with the two
shapes ``molecule-mcp`` accepts:
* Single-workspace legacy shape: ``WORKSPACE_ID`` + token from
``MOLECULE_WORKSPACE_TOKEN`` or ``${CONFIGS_DIR}/.auth_token``.
* Multi-workspace JSON shape: ``MOLECULE_WORKSPACES`` env var carries a
JSON array of ``{"id": ..., "token": ...}`` entries.
Public surface:
* ``resolve_workspaces()`` → ``(workspaces, errors)``.
* ``read_token_file()`` → token text or ``""``.
* ``print_missing_env_help(missing, have_token_file)`` — operator-help
printer.
"""
from __future__ import annotations
import json
import os
import sys
import configs_dir
def resolve_workspaces() -> tuple[list[tuple[str, str]], list[str]]:
"""Return the list of ``(workspace_id, token)`` pairs to register.
Resolution order:
1. ``MOLECULE_WORKSPACES`` env var — JSON array of
``{"id": "...", "token": "..."}`` objects. Activates the
multi-workspace external-agent path (one process registered into
N workspaces). When set, ``WORKSPACE_ID`` / ``MOLECULE_WORKSPACE_TOKEN``
are IGNORED — the JSON is the source of truth.
2. Single-workspace fallback — ``WORKSPACE_ID`` env var + token from
``MOLECULE_WORKSPACE_TOKEN`` or ``${CONFIGS_DIR}/.auth_token``.
This is the pre-existing path; back-compat exact.
Returns ``(workspaces, errors)``:
* ``workspaces``: list of ``(workspace_id, token)`` — non-empty
on the happy path.
* ``errors``: human-readable strings describing what's missing /
malformed. ``main()`` surfaces these with the same shape as
``print_missing_env_help`` so the operator's first run gives
actionable output.
Why JSON env (not file): ergonomic for Claude Code MCP config (one
string in ``mcpServers.molecule.env`` instead of a sidecar file)
and for CI / launchers. A separate config-file path can be added
later without breaking this.
"""
raw = os.environ.get("MOLECULE_WORKSPACES", "").strip()
if raw:
try:
parsed = json.loads(raw)
except json.JSONDecodeError as exc:
return [], [
f"MOLECULE_WORKSPACES is not valid JSON ({exc.msg} at pos "
f"{exc.pos}). Expected: '[{{\"id\":\"<wsid>\",\"token\":"
f"\"<tok>\"}},{{...}}]'"
]
if not isinstance(parsed, list) or not parsed:
return [], [
"MOLECULE_WORKSPACES must be a non-empty JSON array of "
"{\"id\":\"...\",\"token\":\"...\"} objects"
]
out: list[tuple[str, str]] = []
seen: set[str] = set()
errors: list[str] = []
for i, entry in enumerate(parsed):
if not isinstance(entry, dict):
errors.append(
f"MOLECULE_WORKSPACES[{i}] is not an object — got {type(entry).__name__}"
)
continue
wsid = str(entry.get("id", "")).strip()
tok = str(entry.get("token", "")).strip()
if not wsid or not tok:
errors.append(
f"MOLECULE_WORKSPACES[{i}] missing 'id' or 'token'"
)
continue
if wsid in seen:
errors.append(
f"MOLECULE_WORKSPACES[{i}] duplicate workspace id {wsid!r}"
)
continue
seen.add(wsid)
out.append((wsid, tok))
if errors:
return [], errors
return out, []
# Single-workspace back-compat path.
wsid = os.environ.get("WORKSPACE_ID", "").strip()
if not wsid:
return [], ["WORKSPACE_ID (or MOLECULE_WORKSPACES) is required"]
tok = os.environ.get("MOLECULE_WORKSPACE_TOKEN", "").strip()
if not tok:
tok = read_token_file()
if not tok:
return [], [
"MOLECULE_WORKSPACE_TOKEN (or CONFIGS_DIR/.auth_token) is required"
]
return [(wsid, tok)], []
def print_missing_env_help(missing: list[str], have_token_file: bool) -> None:
print("molecule-mcp: missing required environment.\n", file=sys.stderr)
print("Set the following before running molecule-mcp:", file=sys.stderr)
print(" WORKSPACE_ID — your workspace UUID (from canvas)", file=sys.stderr)
print(
" PLATFORM_URL — base URL of your Molecule platform "
"(e.g. https://your-tenant.staging.moleculesai.app)",
file=sys.stderr,
)
if not have_token_file:
print(
" MOLECULE_WORKSPACE_TOKEN — bearer token for this workspace "
"(canvas → Tokens tab)",
file=sys.stderr,
)
print("", file=sys.stderr)
print(f"Currently missing: {', '.join(missing)}", file=sys.stderr)
def read_token_file() -> str:
"""Read the token from the resolved configs dir's ``.auth_token`` if
present.
Mirrors platform_auth._token_file's location resolution but without
importing the heavy module here (that import triggers a2a_client's
WORKSPACE_ID guard which is fine after env validation, but cheaper
to inline a 4-line file read than pull in the whole stack just for
the path).
"""
path = configs_dir.resolve() / ".auth_token"
if not path.is_file():
return ""
try:
return path.read_text().strip()
except OSError:
return ""
+281
View File
@@ -0,0 +1,281 @@
"""Direct tests for ``a2a_tools_rbac`` (RFC #2873 iter 4a).
The full behavior matrix is exercised through ``a2a_tools._foo`` aliases
in ``test_a2a_tools_impl.py``. This file pins:
1. **Drift gate** — ``a2a_tools._foo is a2a_tools_rbac.foo`` for every
extracted symbol. A refactor that wraps or re-implements an alias
fails this test.
2. **Direct unit coverage** for each helper without going through the
a2a_tools surface, so regressions in the small RBAC layer surface
against THIS module's tests, not the 991-LOC tool-handler tests.
"""
from __future__ import annotations
import os
import sys
from unittest.mock import patch
import pytest
@pytest.fixture(autouse=True)
def _require_workspace_id(monkeypatch):
# a2a_client raises at import-time without WORKSPACE_ID. Setting it
# once per test isolates the env so an absent value in CI doesn't
# surface as an opaque RuntimeError from a2a_tools' import.
monkeypatch.setenv("WORKSPACE_ID", "00000000-0000-0000-0000-000000000000")
monkeypatch.setenv("PLATFORM_URL", "http://test.invalid")
yield
# ============== Drift gate ==============
class TestBackCompatAliases:
"""Pin that every legacy underscore name in ``a2a_tools`` is the
EXACT same callable / object as the new public name in
``a2a_tools_rbac``. Catches accidental re-implementation in either
direction."""
def test_role_permissions_is_same_object(self):
import a2a_tools
import a2a_tools_rbac
assert a2a_tools._ROLE_PERMISSIONS is a2a_tools_rbac.ROLE_PERMISSIONS
def test_get_workspace_tier_alias(self):
import a2a_tools
import a2a_tools_rbac
assert a2a_tools._get_workspace_tier is a2a_tools_rbac.get_workspace_tier
def test_check_memory_write_permission_alias(self):
import a2a_tools
import a2a_tools_rbac
assert (
a2a_tools._check_memory_write_permission
is a2a_tools_rbac.check_memory_write_permission
)
def test_check_memory_read_permission_alias(self):
import a2a_tools
import a2a_tools_rbac
assert (
a2a_tools._check_memory_read_permission
is a2a_tools_rbac.check_memory_read_permission
)
def test_is_root_workspace_alias(self):
import a2a_tools
import a2a_tools_rbac
assert a2a_tools._is_root_workspace is a2a_tools_rbac.is_root_workspace
def test_auth_headers_alias(self):
import a2a_tools
import a2a_tools_rbac
assert (
a2a_tools._auth_headers_for_heartbeat
is a2a_tools_rbac.auth_headers_for_heartbeat
)
# ============== get_workspace_tier ==============
class TestGetWorkspaceTier:
def test_uses_config_when_available(self):
"""Happy path: load_config returns an object with .tier."""
import a2a_tools_rbac
class _Cfg:
tier = 0
with patch("config.load_config", return_value=_Cfg()):
assert a2a_tools_rbac.get_workspace_tier() == 0
def test_default_tier_when_config_lacks_attr(self):
import a2a_tools_rbac
class _Cfg:
pass
with patch("config.load_config", return_value=_Cfg()):
# getattr default = 1
assert a2a_tools_rbac.get_workspace_tier() == 1
def test_falls_back_to_env_var(self, monkeypatch):
"""When load_config raises, read WORKSPACE_TIER from env."""
import a2a_tools_rbac
monkeypatch.setenv("WORKSPACE_TIER", "5")
with patch("config.load_config", side_effect=RuntimeError("config unavailable")):
assert a2a_tools_rbac.get_workspace_tier() == 5
def test_fallback_default_one_when_env_unset(self, monkeypatch):
import a2a_tools_rbac
monkeypatch.delenv("WORKSPACE_TIER", raising=False)
with patch("config.load_config", side_effect=RuntimeError("boom")):
assert a2a_tools_rbac.get_workspace_tier() == 1
# ============== is_root_workspace ==============
class TestIsRootWorkspace:
def test_tier_zero_is_root(self):
import a2a_tools_rbac
with patch.object(a2a_tools_rbac, "get_workspace_tier", return_value=0):
assert a2a_tools_rbac.is_root_workspace() is True
def test_nonzero_tier_is_not_root(self):
import a2a_tools_rbac
for tier in (1, 2, 99):
with patch.object(a2a_tools_rbac, "get_workspace_tier", return_value=tier):
assert a2a_tools_rbac.is_root_workspace() is False, f"tier={tier}"
# ============== check_memory_write_permission ==============
class _RBACCfg:
"""Minimal config stub matching the load_config().rbac shape."""
def __init__(self, roles=None, allowed_actions=None):
class _RBAC:
pass
self.rbac = _RBAC()
self.rbac.roles = roles or ["operator"]
self.rbac.allowed_actions = allowed_actions or {}
class TestCheckMemoryWritePermission:
def test_admin_role_grants_write(self):
import a2a_tools_rbac
with patch("config.load_config", return_value=_RBACCfg(roles=["admin"])):
assert a2a_tools_rbac.check_memory_write_permission() is True
def test_operator_role_grants_write(self):
"""Operator is in the canonical ROLE_PERMISSIONS table with
memory.write — must work without per-role overrides."""
import a2a_tools_rbac
with patch("config.load_config", return_value=_RBACCfg(roles=["operator"])):
assert a2a_tools_rbac.check_memory_write_permission() is True
def test_read_only_role_denies_write(self):
import a2a_tools_rbac
with patch("config.load_config", return_value=_RBACCfg(roles=["read-only"])):
assert a2a_tools_rbac.check_memory_write_permission() is False
def test_per_role_override_grants(self):
"""Per-role override in allowed_actions wins over the canonical
table — operators can grant write to memory-readonly via config."""
import a2a_tools_rbac
cfg = _RBACCfg(
roles=["memory-readonly"],
allowed_actions={"memory-readonly": {"memory.read", "memory.write"}},
)
with patch("config.load_config", return_value=cfg):
assert a2a_tools_rbac.check_memory_write_permission() is True
def test_per_role_override_denies(self):
"""Per-role override that drops write blocks an operator from
writing — the override is the authoritative source when present."""
import a2a_tools_rbac
cfg = _RBACCfg(
roles=["operator"],
allowed_actions={"operator": {"memory.read"}},
)
with patch("config.load_config", return_value=cfg):
assert a2a_tools_rbac.check_memory_write_permission() is False
def test_fail_closed_when_config_unavailable(self):
"""Fail-closed contract: config outage falls back to ['operator']
with no overrides — operator has memory.write in the canonical
table, so write IS granted in this fallback. The fail-closed
property is for ELEVATED ops (admin scope), not for the basic
write that operator has by default. This test pins the contract:
config errors do not silently grant admin."""
import a2a_tools_rbac
with patch("config.load_config", side_effect=RuntimeError("boom")):
# operator has memory.write → True (preserved behavior)
assert a2a_tools_rbac.check_memory_write_permission() is True
# ============== check_memory_read_permission ==============
class TestCheckMemoryReadPermission:
def test_admin_grants_read(self):
import a2a_tools_rbac
with patch("config.load_config", return_value=_RBACCfg(roles=["admin"])):
assert a2a_tools_rbac.check_memory_read_permission() is True
def test_read_only_grants_read(self):
import a2a_tools_rbac
with patch("config.load_config", return_value=_RBACCfg(roles=["read-only"])):
assert a2a_tools_rbac.check_memory_read_permission() is True
def test_unknown_role_denies(self):
"""A role that's not in ROLE_PERMISSIONS and not in
allowed_actions overrides denies by default."""
import a2a_tools_rbac
with patch("config.load_config", return_value=_RBACCfg(roles=["random-undefined-role"])):
assert a2a_tools_rbac.check_memory_read_permission() is False
# ============== auth_headers_for_heartbeat ==============
class TestAuthHeadersForHeartbeat:
def test_no_workspace_id_uses_legacy_path(self):
"""No-arg call routes to platform_auth.auth_headers() — the
legacy single-token path."""
import a2a_tools_rbac
called: dict[str, object] = {}
def fake_auth_headers(*args):
called["args"] = args
return {"Authorization": "Bearer legacy-token"}
with patch("platform_auth.auth_headers", fake_auth_headers):
out = a2a_tools_rbac.auth_headers_for_heartbeat()
assert out == {"Authorization": "Bearer legacy-token"}
# Legacy path is auth_headers() with no arg
assert called["args"] == ()
def test_with_workspace_id_routes_per_workspace(self):
import a2a_tools_rbac
called: dict[str, object] = {}
def fake_auth_headers(wsid):
called["wsid"] = wsid
return {"Authorization": f"Bearer tok-{wsid}"}
with patch("platform_auth.auth_headers", fake_auth_headers):
out = a2a_tools_rbac.auth_headers_for_heartbeat("ws-abc")
assert out == {"Authorization": "Bearer tok-ws-abc"}
assert called["wsid"] == "ws-abc"
def test_returns_empty_when_platform_auth_missing(self, monkeypatch):
"""Older installs without platform_auth get {} so callers don't
crash — they'll just send unauthed and the platform 401 handler
surfaces the real error."""
import a2a_tools_rbac
# Force ImportError by setting sys.modules entry to None
monkeypatch.setitem(sys.modules, "platform_auth", None)
out = a2a_tools_rbac.auth_headers_for_heartbeat("ws-1")
assert out == {}
# ============== ROLE_PERMISSIONS canonical table ==============
class TestRolePermissionsTable:
def test_admin_has_all_actions(self):
import a2a_tools_rbac
assert a2a_tools_rbac.ROLE_PERMISSIONS["admin"] == {
"delegate", "approve", "memory.read", "memory.write",
}
def test_read_only_has_only_memory_read(self):
import a2a_tools_rbac
assert a2a_tools_rbac.ROLE_PERMISSIONS["read-only"] == {"memory.read"}
def test_no_delegation_is_missing_delegate(self):
import a2a_tools_rbac
assert "delegate" not in a2a_tools_rbac.ROLE_PERMISSIONS["no-delegation"]
def test_no_approval_is_missing_approve(self):
import a2a_tools_rbac
assert "approve" not in a2a_tools_rbac.ROLE_PERMISSIONS["no-approval"]
+14 -8
View File
@@ -13,6 +13,7 @@ from pathlib import Path
import pytest
import mcp_cli
import mcp_heartbeat
@pytest.fixture(autouse=True)
@@ -739,8 +740,13 @@ def test_heartbeat_loop_calls_persist_on_success(monkeypatch):
def fake_persist(resp):
saw.append(resp)
# Patch on mcp_heartbeat — that's where heartbeat_loop's internal
# name resolution looks up persist_inbound_secret_from_heartbeat
# after the RFC #2873 iter 3 split. The mcp_cli._persist_…_from_heartbeat
# back-compat re-export still exists, but patching it here would not
# affect the loop body.
monkeypatch.setattr(
mcp_cli, "_persist_inbound_secret_from_heartbeat", fake_persist
mcp_heartbeat, "persist_inbound_secret_from_heartbeat", fake_persist
)
class FakeResp:
@@ -786,8 +792,8 @@ def test_heartbeat_loop_skips_persist_on_4xx(monkeypatch):
"""Heartbeat 4xx error path must NOT invoke persist (no body to trust)."""
saw: list[object] = []
monkeypatch.setattr(
mcp_cli,
"_persist_inbound_secret_from_heartbeat",
mcp_heartbeat,
"persist_inbound_secret_from_heartbeat",
lambda r: saw.append(r),
)
@@ -899,7 +905,7 @@ def test_heartbeat_single_401_logs_warning_not_error(monkeypatch, caplog):
transient platform blip. Log at WARNING; don't shout."""
import logging
caplog.set_level(logging.WARNING, logger="mcp_cli")
caplog.set_level(logging.WARNING, logger="mcp_heartbeat")
_multi_iter_runner(monkeypatch, [401])
@@ -923,7 +929,7 @@ def test_heartbeat_three_consecutive_401s_escalates_to_error(monkeypatch, caplog
LOUD ERROR with re-onboard guidance — not buried at WARNING."""
import logging
caplog.set_level(logging.WARNING, logger="mcp_cli")
caplog.set_level(logging.WARNING, logger="mcp_heartbeat")
_multi_iter_runner(monkeypatch, [401, 401, 401])
@@ -949,7 +955,7 @@ def test_heartbeat_403_treated_same_as_401(monkeypatch, caplog):
not authorized for this workspace). Same escalation path."""
import logging
caplog.set_level(logging.WARNING, logger="mcp_cli")
caplog.set_level(logging.WARNING, logger="mcp_heartbeat")
_multi_iter_runner(monkeypatch, [403, 403, 403])
@@ -963,7 +969,7 @@ def test_heartbeat_recovery_resets_consecutive_counter(monkeypatch, caplog):
later should NOT immediately escalate."""
import logging
caplog.set_level(logging.WARNING, logger="mcp_cli")
caplog.set_level(logging.WARNING, logger="mcp_heartbeat")
# Two 401s, then 200, then one 401. If counter resets correctly,
# the final 401 is "1 consecutive" and should NOT escalate.
@@ -982,7 +988,7 @@ def test_heartbeat_500_does_not_increment_auth_counter(monkeypatch, caplog):
misleading the operator."""
import logging
caplog.set_level(logging.WARNING, logger="mcp_cli")
caplog.set_level(logging.WARNING, logger="mcp_heartbeat")
_multi_iter_runner(monkeypatch, [500, 500, 500])
+231
View File
@@ -0,0 +1,231 @@
"""RFC #2873 iter 3 — drift gate + behavior tests for the post-split surface.
The bulk of the heartbeat / resolver behavior is exercised by
``test_mcp_cli.py`` and ``test_mcp_cli_multi_workspace.py`` through the
``mcp_cli._symbol`` back-compat aliases. This file pins:
1. The split is **behavior-neutral via aliasing** — every previously-
exposed ``mcp_cli._foo`` symbol is the SAME callable as the new
module's authoritative function. If a refactor accidentally drops
an alias or points it at a stale copy, this fails.
2. ``mcp_inbox_pollers.start_inbox_pollers`` works for both single-
workspace (legacy back-compat) and multi-workspace shapes.
``mcp_cli`` had no direct test for this branch before the split.
"""
from __future__ import annotations
import sys
import types
import pytest
import mcp_cli
import mcp_heartbeat
import mcp_inbox_pollers
import mcp_workspace_resolver
# ============== Drift gate: back-compat aliases point at the real fn ==============
class TestBackCompatAliases:
"""Pin that ``mcp_cli._foo is real_fn``. A test that re-implements
the alias would still pass — the ``is`` check guarantees we didn't
create a wrapper that drifts."""
def test_heartbeat_aliases(self):
assert mcp_cli._build_agent_card is mcp_heartbeat.build_agent_card
assert mcp_cli._platform_register is mcp_heartbeat.platform_register
assert mcp_cli._heartbeat_loop is mcp_heartbeat.heartbeat_loop
assert mcp_cli._log_heartbeat_auth_failure is mcp_heartbeat.log_heartbeat_auth_failure
assert (
mcp_cli._persist_inbound_secret_from_heartbeat
is mcp_heartbeat.persist_inbound_secret_from_heartbeat
)
assert mcp_cli._start_heartbeat_thread is mcp_heartbeat.start_heartbeat_thread
def test_resolver_aliases(self):
assert mcp_cli._resolve_workspaces is mcp_workspace_resolver.resolve_workspaces
assert mcp_cli._print_missing_env_help is mcp_workspace_resolver.print_missing_env_help
assert mcp_cli._read_token_file is mcp_workspace_resolver.read_token_file
def test_inbox_pollers_alias(self):
assert mcp_cli._start_inbox_pollers is mcp_inbox_pollers.start_inbox_pollers
def test_constants_match(self):
assert (
mcp_cli.HEARTBEAT_INTERVAL_SECONDS
== mcp_heartbeat.HEARTBEAT_INTERVAL_SECONDS
)
assert (
mcp_cli._HEARTBEAT_AUTH_LOUD_THRESHOLD
== mcp_heartbeat.HEARTBEAT_AUTH_LOUD_THRESHOLD
)
assert (
mcp_cli._HEARTBEAT_AUTH_RELOG_INTERVAL
== mcp_heartbeat.HEARTBEAT_AUTH_RELOG_INTERVAL
)
# ============== mcp_inbox_pollers — both shapes + degraded import ==============
class _FakeInboxState:
def __init__(self, **kwargs):
self.kwargs = kwargs
def _install_fake_inbox(monkeypatch):
"""Inject a fake ``inbox`` module so we observe the spawn calls
without pulling in the real platform_auth dependency tree."""
activations: list[_FakeInboxState] = []
spawned: list[tuple[_FakeInboxState, str, str]] = []
cursor_paths: list[str] = []
def default_cursor_path(wsid=None):
# Mirror the real signature: optional wsid → distinct path per id,
# absent → legacy single path.
path = f"/tmp/.mcp_inbox_cursor.{wsid[:8]}" if wsid else "/tmp/.mcp_inbox_cursor"
cursor_paths.append(path)
return path
def activate(state):
activations.append(state)
def start_poller_thread(state, platform_url, wsid):
spawned.append((state, platform_url, wsid))
fake = types.ModuleType("inbox")
fake.InboxState = _FakeInboxState
fake.activate = activate
fake.default_cursor_path = default_cursor_path
fake.start_poller_thread = start_poller_thread
monkeypatch.setitem(sys.modules, "inbox", fake)
return activations, spawned, cursor_paths
class TestStartInboxPollers:
def test_single_workspace_uses_legacy_cursor_path(self, monkeypatch):
"""Back-compat exact: single-workspace mode reuses the legacy
cursor filename so an existing operator's on-disk state isn't
invalidated by upgrade."""
activations, spawned, cursor_paths = _install_fake_inbox(monkeypatch)
mcp_inbox_pollers.start_inbox_pollers(
"https://test.moleculesai.app", ["ws-only-one"]
)
assert len(activations) == 1, "exactly one inbox.activate call"
assert len(spawned) == 1, "exactly one poller thread spawned"
# Single-workspace path uses default_cursor_path() with no arg —
# the cursor_path captured here must be the legacy filename
# (no per-ws suffix).
assert cursor_paths == ["/tmp/.mcp_inbox_cursor"]
# State carries cursor_path, not cursor_paths
state = activations[0]
assert state.kwargs == {"cursor_path": "/tmp/.mcp_inbox_cursor"}
# Spawned poller is for the right workspace
assert spawned[0] == (state, "https://test.moleculesai.app", "ws-only-one")
def test_multi_workspace_uses_per_workspace_cursor_paths(self, monkeypatch):
"""Multi-workspace path: per-workspace cursor file, one shared
InboxState. N pollers, each pointed at the same state so the
agent's inbox_peek/pop sees a merged view."""
activations, spawned, _ = _install_fake_inbox(monkeypatch)
wsids = ["ws-aaaaaaaa", "ws-bbbbbbbb", "ws-cccccccc"]
mcp_inbox_pollers.start_inbox_pollers(
"https://test.moleculesai.app", wsids
)
# One state, one activate, three pollers
assert len(activations) == 1
assert len(spawned) == 3
state = activations[0]
# Multi-workspace state carries cursor_paths (mapping)
assert "cursor_paths" in state.kwargs
assert set(state.kwargs["cursor_paths"].keys()) == set(wsids)
# All pollers share the same state
for s, _url, _wsid in spawned:
assert s is state
# All workspace ids covered
assert sorted(t[2] for t in spawned) == sorted(wsids)
def test_inbox_module_unavailable_logs_and_returns(self, monkeypatch, caplog):
"""If ``import inbox`` fails (older install or stripped
runtime), spawn must NOT raise — log a warning and continue.
The MCP server can still serve outbound tools."""
import logging
# Force ImportError by injecting a module sentinel that raises.
class _Boom:
def __getattr__(self, _name):
raise ImportError("inbox stripped from this build")
# Setting sys.modules["inbox"] to a broken object isn't enough —
# the import statement reads sys.modules first; if the entry is
# truthy, Python returns it. We need to force the import to raise.
# Easiest: pre-poison sys.modules so the `import inbox` line
# raises by setting the entry to None (Python special-cases None
# as "explicit ImportError").
monkeypatch.setitem(sys.modules, "inbox", None)
caplog.set_level(logging.WARNING, logger="mcp_inbox_pollers")
# Should not raise.
mcp_inbox_pollers.start_inbox_pollers(
"https://test.moleculesai.app", ["ws-1"]
)
warnings = [r for r in caplog.records if r.levelno == logging.WARNING]
assert any("inbox module unavailable" in r.message for r in warnings), (
f"expected a 'inbox module unavailable' warning, got: "
f"{[r.message for r in warnings]}"
)
# ============== mcp_heartbeat.build_agent_card — short direct tests ==============
class TestBuildAgentCardDirect:
"""Spot-check the new module's public surface; the full test matrix
lives in ``test_mcp_cli.py`` reaching through ``mcp_cli._build_agent_card``.
"""
def test_default_card_shape(self, monkeypatch):
for v in ("MOLECULE_AGENT_NAME", "MOLECULE_AGENT_DESCRIPTION", "MOLECULE_AGENT_SKILLS"):
monkeypatch.delenv(v, raising=False)
card = mcp_heartbeat.build_agent_card("8dad3e29-c32a-4ec7-9ea7-94fe2d2d98ec")
assert card == {"name": "molecule-mcp-8dad3e29", "skills": []}
def test_skills_csv_split_and_trim(self, monkeypatch):
monkeypatch.setenv("MOLECULE_AGENT_SKILLS", "research, , code-review,memory-curation, ")
card = mcp_heartbeat.build_agent_card("ws-1")
assert card["skills"] == [
{"name": "research"},
{"name": "code-review"},
{"name": "memory-curation"},
]
# ============== mcp_workspace_resolver — short direct tests ==============
class TestResolveWorkspacesDirect:
@pytest.fixture(autouse=True)
def _isolate(self, monkeypatch, tmp_path):
for v in ("WORKSPACE_ID", "MOLECULE_WORKSPACE_TOKEN", "MOLECULE_WORKSPACES"):
monkeypatch.delenv(v, raising=False)
monkeypatch.setenv("CONFIGS_DIR", str(tmp_path))
yield
def test_single_workspace_via_env(self, monkeypatch):
monkeypatch.setenv("WORKSPACE_ID", "ws-1")
monkeypatch.setenv("MOLECULE_WORKSPACE_TOKEN", "tok")
out, errors = mcp_workspace_resolver.resolve_workspaces()
assert out == [("ws-1", "tok")]
assert errors == []
def test_multi_workspace_via_json_env(self, monkeypatch):
monkeypatch.setenv(
"MOLECULE_WORKSPACES",
'[{"id":"ws-a","token":"a"},{"id":"ws-b","token":"b"}]',
)
out, errors = mcp_workspace_resolver.resolve_workspaces()
assert out == [("ws-a", "a"), ("ws-b", "b")]
assert errors == []