Compare commits

...

11 Commits

Author SHA1 Message Date
claude-ceo-assistant 249e760fbd feat(plugins): hot-reload classifier — skip restart on SKILL-content-only updates
Closes molecule-core#112. Composes with #114 (atomic install).

Before issuing restartFunc, classify the diff between staged and live:
  - skill-content-only: only **/SKILL.md content changed
                        → skip restart (Claude Code re-reads SKILL.md on
                          each Skill invocation; no in-memory cache)
  - cold: anything else
                        → restartFunc as before
                          (hooks/settings load at session start;
                          plugin.yaml is structural; added/removed files
                          require a fresh load)

DETECTION
  - Hash every regular file in staged tree (host filesystem, sha256)
  - Hash every regular file in live tree (in-container via docker exec
    sh -c 'cd <livePath> && find . -type f -print0 | xargs -0 sha256sum')
  - .complete marker dropped from comparison (mtime varies install-to-
    install; including it would force-cold every reinstall)
  - File added/removed → cold
  - File content differs but isn't SKILL.md → cold
  - All differences are SKILL.md basenames → skill-content-only

DEFAULTS COLD
  - First install (no live tree) → cold
  - Live tree read failure → cold (conservative; never hot-reload speculatively)
  - Symlinks skipped during hash (same posture as tar walker)

PHASE 4 SELF-REVIEW
  Correctness: No finding — all error paths default to cold; never
    falsely classify as skill-content-only. The .complete drop is
    a deliberate exception (the marker is bookkeeping, not content).
  Readability: No finding — single-purpose helpers (hashLocalTree,
    hashContainerTree, isSkillMarkdown, shQuote) each do one thing.
    The classifier itself reads as 'compare set, then walk diff with
    isSkillMarkdown gate.'
  Architecture: No finding — composes existing execAsRoot primitive;
    new helpers in plugins_classifier.go don't touch any other
    handler. Old behavior unchanged when live read fails.
  Security: No finding — shQuote single-quotes any non-trivial path,
    pluginName comes from validatePluginName-validated source, and
    the docker exec command takes the path as a single arg (xargs -0
    handles binary-safe path delimiting). Symlinks skipped.
  Performance: No finding — adds two tree walks (host + container)
    per install. Container walk is one docker exec call returning
    sha256 lines; for typical plugins (~10-50 files) round-trip is
    ~100ms. Versus the saved ~5-10s of restart on a hot-reloadable
    update, this is a clear win.

TESTS (4 new, all green; full handler suite green)
  TestIsSkillMarkdown        — basename match, case-sensitive
  TestHashLocalTree_StableHash — re-hash same dir = same map
  TestHashLocalTree_SymlinkSkipped — hostile link doesn't poison classifier
  TestShQuote                — quoting boundary for shell injection safety

REFS
  molecule-core#112 — this issue
  molecule-core#114 — atomic install (.complete marker added there)
  Reno-Stars iteration safety (Hongming 2026-05-08)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 08:26:05 -07:00
claude-ceo-assistant 3e96184d6f Merge pull request 'feat(plugins): atomic install — stage→snapshot→swap→marker (docker path)' (#120) from feat/plugin-atomic-install into main 2026-05-08 15:23:31 +00:00
claude-ceo-assistant d543138bde Merge pull request 'chore: promote 5 staging-only feature PRs to main (Phase 3 of internal#81)' (#108) from chore/promote-staging-features-to-main into main 2026-05-08 15:22:12 +00:00
claude-ceo-assistant bfcb0fc445 Merge branch 'main' into chore/promote-staging-features-to-main 2026-05-08 15:21:18 +00:00
claude-ceo-assistant 2752a217c8 Merge pull request 'fix(pendinguploads): wait for error metric before test exit' (#111) from fix/pendinguploads-test-isolation into main 2026-05-08 15:21:08 +00:00
claude-ceo-assistant c3686a4bb3 Merge branch 'main' into fix/pendinguploads-test-isolation 2026-05-08 15:20:36 +00:00
dev-lead 9e18ab4620 fix(pendinguploads): wait for error metric before test exit
TestStartSweeper_TransientErrorDoesNotCrashLoop leaks an in-flight
metric write across the test boundary: cycleDone fires inside the
fake's Sweep defer (before Sweep returns), waitForCycle returns
immediately after, cancel() lands, but the goroutine still has
metrics.PendingUploadsSweepError() to execute. Whether that write
happens before or after the next test's metricDelta() baseline read
is a coin-flip on slow CI hosts.

Outcome: TestStartSweeper_RecordsMetricsOnSuccess fails with
"error counter delta = 1, want 0" — looks like a real bug, isn't.
Instrumented analysis (per the file's existing waitForMetricDelta
docstring covering the same shape) confirms the metric IS getting
recorded, just AFTER the next test reads its baseline.

The Records* tests already use waitForMetricDelta to close this race
on their own assertions. This change extends the same shape to
TransientErrorDoesNotCrashLoop so it doesn't poison subsequent tests'
baselines.

Verified by running `go test -race -count=20 ./internal/pendinguploads/...`
locally — passes deterministically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 07:37:45 -07:00
claude-ceo-assistant ff8cc48340 ci: retrigger after AUTO_SYNC_TOKEN rotated to devops-engineer (was 401 against any repo) 2026-05-08 14:16:27 +00:00
claude-ceo-assistant c5669aa304 ci: retrigger after operator disk freed (was ENOSPC during harness boot) 2026-05-08 14:00:14 +00:00
claude-ceo-assistant bbfcaedece ci: retrigger after harness-tenant-alpha unhealthy on first run
Harness Replays job failed at "dependency failed to start: container
harness-tenant-alpha-1 is unhealthy" — that is not caused by this
merge (which adds workspace-server/internal/handlers code, not
container infra). Retry to confirm it was a transient environmental
issue (likely operator-host load/disk per internal#78).
2026-05-08 13:31:27 +00:00
claude-ceo-assistant 2597511d7b chore: promote 5 staging-only feature PRs to main (Phase 3 of internal#81)
This was supposed to fast-forward when each PR merged on staging,
but auto-promote-staging.yml has not been firing reliably on Gitea
since the GitHub suspension. Result: main is missing 5 substantive
feature PRs that landed on staging between 2026-04-29 and 2026-05-07:

  - #102: test(org-include) symlink-based subtree composition contract
  - #103: test(local-e2e) dev-department extraction end-to-end
  - #104: fix(provisioner)+test EvalSymlinks templatePath; stage-2 e2e
  - #105: feat(org-import) !external cross-repo subtree resolver (#222)
  - #106: test(org-external) integration + e2e for !external resolver

Each PR was independently reviewed and CI-green at staging-merge time;
this commit promotes the merged state atomically. Use git log on main
after the merge to see the original PR-merge commits preserved.

Sister work: Phase 3 of internal#81 (trunk-based migration). Workflow
trigger updates land in a follow-up PR; staging-branch deletion happens
after a no-op verification deploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:07:22 +00:00
4 changed files with 364 additions and 11 deletions
@@ -0,0 +1,214 @@
package handlers
// plugins_classifier.go — diff classifier for plugin updates.
//
// Closes molecule-core#112. Composes with #114 (atomic install) so the
// platform can decide *before* triggering restartFunc whether the
// update is content-only (SKILL.md text changed; agent re-reads at next
// Skill invocation) or structural (hooks/settings/plugin.yaml/file added
// or removed; agent must restart to pick up the new state).
//
// SKILL.md content is hot-reloadable because Claude Code reads the file
// on each Skill invocation — no in-memory cache. Hooks and settings.json
// are loaded at session start and need a session restart. plugin.yaml
// changes are structural by definition (manifest controls everything
// else).
//
// CLASSIFICATION RULE
// classify(staged, live) → "skill-content-only" if and only if
// every file present in either tree is one of:
// - identical between staged and live, OR
// - a **/SKILL.md file with content change (text body modified)
// AND no files were added or removed.
// Anything else → "cold" (the safe default).
//
// The classifier reads live-tree files from inside the container via
// `docker exec cat`. Comparison is by SHA-256 over file content, not
// mtime — mtime changes on every install regardless of content.
import (
"context"
"crypto/sha256"
"encoding/hex"
"fmt"
"io/fs"
"os"
"path/filepath"
"strings"
)
const (
// classifyKindSkillContentOnly: install can skip restartFunc; the
// only changes are SKILL.md body text.
classifyKindSkillContentOnly = "skill-content-only"
// classifyKindCold: must restart the workspace container; structural
// or hook/settings change.
classifyKindCold = "cold"
)
// classifyInstallChanges compares the staged plugin tree (host filesystem)
// against the currently-live plugin tree inside the container. Returns
// classifyKindSkillContentOnly when the only diff is SKILL.md content
// changes, classifyKindCold otherwise (added/removed files, hooks/
// settings.json edits, plugin.yaml edits, anything else).
//
// `noLive` is the sentinel returned when /configs/plugins/<name> doesn't
// exist (first install for this plugin). Treated as cold — no live state
// to hot-reload into.
func (h *PluginsHandler) classifyInstallChanges(
ctx context.Context, containerName, hostStagedDir, pluginName string,
) (string, error) {
livePath := "/configs/plugins/" + pluginName
// Probe: does live exist? If not, this is a first install — cold.
if _, err := h.execAsRoot(ctx, containerName, []string{
"test", "-d", livePath,
}); err != nil {
return classifyKindCold, nil
}
// Build hash maps for both trees.
stagedHashes, err := hashLocalTree(hostStagedDir)
if err != nil {
return classifyKindCold, fmt.Errorf("classifier: hash staged: %w", err)
}
liveHashes, err := h.hashContainerTree(ctx, containerName, livePath)
if err != nil {
// Live tree read failure: be conservative, cold-restart.
return classifyKindCold, nil
}
// Drop the .complete marker from comparison — its mtime/atime can
// vary across installs but content is empty/trivial; including it
// would force-cold every reinstall.
delete(stagedHashes, ".complete")
delete(liveHashes, ".complete")
// Set difference: any file in one but not the other → cold.
for rel := range stagedHashes {
if _, ok := liveHashes[rel]; !ok {
return classifyKindCold, nil // file added
}
}
for rel := range liveHashes {
if _, ok := stagedHashes[rel]; !ok {
return classifyKindCold, nil // file removed
}
}
// Same set of files. Walk the diff.
for rel, stagedHash := range stagedHashes {
liveHash := liveHashes[rel]
if stagedHash == liveHash {
continue
}
// Content differs. Allow if and only if it's a SKILL.md.
if !isSkillMarkdown(rel) {
return classifyKindCold, nil
}
}
return classifyKindSkillContentOnly, nil
}
// isSkillMarkdown returns true for any path whose basename is SKILL.md
// (case-sensitive, matches Claude Code's skill discovery rule).
func isSkillMarkdown(rel string) bool {
return filepath.Base(rel) == "SKILL.md"
}
// hashLocalTree walks a host directory and returns rel-path → sha256-hex.
// Symlinks are skipped (same posture as the tar walker).
func hashLocalTree(root string) (map[string]string, error) {
out := map[string]string{}
err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
if err != nil {
return err
}
if d.IsDir() {
return nil
}
info, err := d.Info()
if err != nil {
return err
}
if info.Mode()&os.ModeSymlink != 0 {
return nil
}
if !info.Mode().IsRegular() {
return nil
}
rel, err := filepath.Rel(root, p)
if err != nil {
return err
}
body, err := os.ReadFile(p)
if err != nil {
return err
}
sum := sha256.Sum256(body)
out[filepath.ToSlash(rel)] = hex.EncodeToString(sum[:])
return nil
})
if err != nil {
return nil, err
}
return out, nil
}
// hashContainerTree reads every regular file under livePath via docker
// exec sh -c 'cd <livePath> && find . -type f -not -name .complete | xargs -I {} sh -c "echo {}; sha256sum {}"'.
//
// The output is parsed line-by-line into rel-path → sha256-hex.
func (h *PluginsHandler) hashContainerTree(
ctx context.Context, containerName, livePath string,
) (map[string]string, error) {
out, err := h.execAsRoot(ctx, containerName, []string{
"sh", "-c",
// Find regular files, hash each, output `<hex> ./<relpath>`.
// `cd` then `find .` keeps paths relative to livePath.
fmt.Sprintf("cd %s && find . -type f -print0 | xargs -0 -r sha256sum 2>/dev/null", shQuote(livePath)),
})
if err != nil {
return nil, fmt.Errorf("hash container tree: %w", err)
}
hashes := map[string]string{}
for _, line := range strings.Split(out, "\n") {
line = strings.TrimSpace(line)
if line == "" {
continue
}
// sha256sum output: "<hex> <path>" (two spaces). Path starts with "./".
parts := strings.SplitN(line, " ", 2)
if len(parts) != 2 {
continue
}
hash := parts[0]
rel := strings.TrimPrefix(parts[1], "./")
hashes[rel] = hash
}
return hashes, nil
}
// shQuote single-quotes a string for safe insertion into a shell command.
// Returns the input unchanged if it's already shell-safe (alphanumeric +
// /._-). Otherwise wraps in single quotes and escapes inner '.
func shQuote(s string) string {
safe := true
for _, c := range s {
switch {
case c >= 'a' && c <= 'z':
case c >= 'A' && c <= 'Z':
case c >= '0' && c <= '9':
case c == '/' || c == '.' || c == '_' || c == '-':
default:
safe = false
}
if !safe {
break
}
}
if safe {
return s
}
return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}
@@ -0,0 +1,121 @@
package handlers
import (
"os"
"path/filepath"
"testing"
)
// TestIsSkillMarkdown: pin which paths the classifier considers
// hot-reloadable. SKILL.md by basename only — case-sensitive.
func TestIsSkillMarkdown(t *testing.T) {
yes := []string{
"SKILL.md",
"skills/foo/SKILL.md",
"deeply/nested/SKILL.md",
}
no := []string{
"plugin.yaml",
"hooks.json",
"settings.json",
"README.md",
"skill.md", // case-sensitive
"SKILLS.md", // not a skill file
"skills/foo/extra.md",
}
for _, s := range yes {
if !isSkillMarkdown(s) {
t.Errorf("isSkillMarkdown(%q) = false; want true", s)
}
}
for _, s := range no {
if isSkillMarkdown(s) {
t.Errorf("isSkillMarkdown(%q) = true; want false", s)
}
}
}
// TestHashLocalTree_StableHash: hashing the same content twice must
// produce identical maps. Pinned because if hashLocalTree ever picks up
// mtime/inode (e.g. via a refactor to use os.Lstat metadata), every
// install would classify as cold and we'd lose the hot-reload.
func TestHashLocalTree_StableHash(t *testing.T) {
dir := t.TempDir()
if err := os.MkdirAll(filepath.Join(dir, "skills/foo"), 0o755); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(dir, "plugin.yaml"), []byte("name: foo\n"), 0o644); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(dir, "skills/foo/SKILL.md"), []byte("# Foo\n"), 0o644); err != nil {
t.Fatal(err)
}
h1, err := hashLocalTree(dir)
if err != nil {
t.Fatal(err)
}
h2, err := hashLocalTree(dir)
if err != nil {
t.Fatal(err)
}
if len(h1) != len(h2) {
t.Fatalf("hash count differs: %d vs %d", len(h1), len(h2))
}
for k, v := range h1 {
if h2[k] != v {
t.Errorf("hash[%q] differs: %q vs %q", k, v, h2[k])
}
}
}
// TestHashLocalTree_SymlinkSkipped: symlinks should not appear in the
// hash map — same posture as the tar walker. Otherwise a hostile plugin
// could include a symlink whose hash changes when its target changes,
// silently flipping classification.
func TestHashLocalTree_SymlinkSkipped(t *testing.T) {
dir := t.TempDir()
if err := os.WriteFile(filepath.Join(dir, "real.txt"), []byte("ok"), 0o644); err != nil {
t.Fatal(err)
}
target := filepath.Join(t.TempDir(), "target")
if err := os.WriteFile(target, []byte("outside"), 0o644); err != nil {
t.Fatal(err)
}
if err := os.Symlink(target, filepath.Join(dir, "link")); err != nil {
t.Fatal(err)
}
h, err := hashLocalTree(dir)
if err != nil {
t.Fatal(err)
}
if _, exists := h["link"]; exists {
t.Errorf("symlink leaked into hash map: %v", h)
}
if _, exists := h["real.txt"]; !exists {
t.Errorf("real.txt missing from hash map: %v", h)
}
}
// TestShQuote: the classifier injects livePath into a shell command via
// docker exec. Path must be quoted to handle pluginName entries with
// hyphens (which are safe but exercised here) and any future special-
// character edge case. Pin the safe-vs-quoted boundary.
func TestShQuote(t *testing.T) {
cases := []struct {
in, want string
}{
{"foo", "foo"},
{"/configs/plugins/foo-bar", "/configs/plugins/foo-bar"},
{"with space", "'with space'"},
{"with'quote", "'with'\\''quote'"},
{"$envvar", "'$envvar'"},
{"path/with/dots.txt", "path/with/dots.txt"},
}
for _, tc := range cases {
if got := shQuote(tc.in); got != tc.want {
t.Errorf("shQuote(%q) = %q; want %q", tc.in, got, tc.want)
}
}
}
@@ -276,6 +276,15 @@ func (h *PluginsHandler) resolveAndStage(ctx context.Context, req installRequest
// using NewPluginsHandler without a DB; production wires it in router.go.
func (h *PluginsHandler) deliverToContainer(ctx context.Context, workspaceID string, r *stageResult) error {
if containerName := h.findRunningContainer(ctx, workspaceID); containerName != "" {
// Hot-reload classifier (molecule-core#112) — decide BEFORE the
// install whether this update can skip restartFunc. SKILL.md
// content changes are filesystem-visible to Claude Code on the
// next Skill invocation; hooks / settings.json / plugin.yaml /
// added-or-removed files need a container restart.
// Classifier reads live tree from container; on any read error
// it returns kindCold so we never hot-reload speculatively.
kind, _ := h.classifyInstallChanges(ctx, containerName, r.StagedDir, r.PluginName)
// Atomic stage→snapshot→swap→marker (molecule-core#114).
// Replaces the prior single docker.CopyToContainer write that
// left a partially-extracted tree on mid-install failure with
@@ -290,7 +299,11 @@ func (h *PluginsHandler) deliverToContainer(ctx context.Context, workspaceID str
"chown", "-R", "1000:1000", "/configs/plugins/" + r.PluginName,
})
if h.restartFunc != nil {
go h.restartFunc(workspaceID)
if kind == classifyKindSkillContentOnly {
log.Printf("Plugin install: %s → workspace %s — SKILL-content-only update, SKIPPING restart", r.PluginName, workspaceID)
} else {
go h.restartFunc(workspaceID)
}
}
return nil
}
@@ -207,20 +207,25 @@ func TestStartSweeper_TransientErrorDoesNotCrashLoop(t *testing.T) {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
// 50ms ticker so the second cycle fires quickly enough for the test.
// We re-export SweepInterval as a const, but tests use the public
// StartSweeper that takes its own interval — wait, the public
// StartSweeper signature uses the package-level SweepInterval. Hmm,
// this means the test takes ~5 minutes. Let me reconsider.
//
// (We patch the test below to just look at the immediate-sweep call
// + an error path, since the immediate call is enough to prove the
// "error doesn't crash" contract — the loop continues afterward
// regardless of timing.)
// Capture metric baseline so we can wait for the error counter to
// settle before returning — otherwise this test's leaked metric
// write races with the next test's metricDelta() baseline read and
// causes a non-deterministic +1 leak (manifests as
// TestStartSweeper_RecordsMetricsOnSuccess: "error counter delta=1,
// want 0"). cycleDone fires inside the fake's Sweep defer, BEFORE
// sweepOnce records the error metric — so cancel() right after
// waitForCycle is too early.
_, _, deltaError := metricDelta(t)
go pendinguploads.StartSweeper(ctx, store, time.Hour)
// Wait for the first (errored) cycle.
store.waitForCycle(t, 1, 2*time.Second)
// Wait for the goroutine to record the error metric. After this
// returns, sweepOnce has fully completed and a subsequent cancel()
// stops the loop on the next select pass with no in-flight metric
// writes outstanding.
waitForMetricDelta(t, deltaError, 1, 2*time.Second)
// Cancel — the goroutine returns cleanly, proving the error path
// didn't crash the loop. Without this fix the goroutine would have
// either panicked (process abort visible at exit) or stuck (this