fix(canvas): skip config.yaml write for openclaw + bump request timeout to 35s #1237

Merged
devops-engineer merged 1 commit from fix/openclaw-skip-config-write-and-canvas-timeout into staging 2026-05-15 21:58:41 +00:00
Member

Why

Canvas "Save & Restart" was timing out for openclaw workspaces because two bugs compounded:

1. Pointless config.yaml write

openclaw manages its own prompt surface via its SOUL/BOOTSTRAP/AGENTS multi-file system and does NOT read the platform's config.yaml. But ConfigTab.tsx was still issuing PUT /workspaces/:id/files/config.yaml on every save, which on tenant EC2 fans out through the slow EIC SSH tunnel path (workspace-server/internal/handlers/template_files_eic.go).

Other runtimes that ship their own config (external, kimi, kimi-cli) are already exempted via RUNTIMES_WITH_OWN_CONFIG. This PR adds openclaw to that set so the platform stops doing work the runtime ignores.
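A minimal sketch of the skip-list check described above, assuming RUNTIMES_WITH_OWN_CONFIG is a set of runtime names. Only that constant name, the four runtime names, and the PUT path come from this PR; `shouldWriteConfigYaml` and `plannedSaveRequests` are hypothetical helpers for illustration and may not match how ConfigTab.tsx is actually structured.

```typescript
// Assumed shape: a set of runtime names that ship their own config.
const RUNTIMES_WITH_OWN_CONFIG: ReadonlySet<string> = new Set([
  "external",
  "kimi",
  "kimi-cli",
  "openclaw", // added by this PR
]);

/** Hypothetical helper: true when the platform should write config.yaml. */
function shouldWriteConfigYaml(runtime: string): boolean {
  return !RUNTIMES_WITH_OWN_CONFIG.has(runtime);
}

/** Hypothetical helper: lists the config write a save would issue, if any. */
function plannedSaveRequests(workspaceId: string, runtime: string): string[] {
  const requests: string[] = [];
  if (shouldWriteConfigYaml(runtime)) {
    requests.push(`PUT /workspaces/${workspaceId}/files/config.yaml`);
  }
  return requests;
}
```

With this shape, a save for an openclaw workspace plans no config.yaml write, while a runtime outside the set still gets one.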

2. Client aborts before server returns

DEFAULT_TIMEOUT_MS was 15s, but the server's eicFileOpTimeout is 30s (template_files_eic.go L118). When EIC was slow or the EC2's ec2-instance-connect daemon was unhealthy, the canvas aborted with a generic timeout BEFORE the workspace-server returned its real 5xx — so the user saw a useless "request timed out" instead of the actual cause.

Raise the default to 35s, which leaves 5s of headroom over the server's 30s bound, so the server's error surfaces before the client gives up. The AbortController contract is unchanged; callers can still override timeoutMs per-request.
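The timeout relationship can be sketched as follows. This is an illustration under assumptions, not the code in canvas/src/lib/api.ts: `withTimeout`, `clientAbortsFirst`, and `SERVER_EIC_FILE_OP_TIMEOUT_MS` are hypothetical names, and only DEFAULT_TIMEOUT_MS, the 15s/30s/35s values, and the use of AbortController come from this PR.

```typescript
// Mirrors the server's eicFileOpTimeout = 30 * time.Second (hypothetical client-side constant).
const SERVER_EIC_FILE_OP_TIMEOUT_MS = 30_000;
// New client default from this PR; it must exceed the server bound.
const DEFAULT_TIMEOUT_MS = 35_000;

/**
 * Hypothetical wrapper: runs an abortable operation and aborts it after
 * timeoutMs. Callers can pass their own timeoutMs to override the default.
 */
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number = DEFAULT_TIMEOUT_MS,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await run(controller.signal);
  } finally {
    clearTimeout(timer); // always clear so the timer cannot fire late
  }
}

/** True when the client would give up before the server's own timeout can report an error. */
function clientAbortsFirst(clientTimeoutMs: number, serverTimeoutMs: number): boolean {
  return clientTimeoutMs <= serverTimeoutMs;
}
```

A caller would pass the signal through to the request, for example `withTimeout((signal) => fetch(url, { signal }))`. Under the old 15s default `clientAbortsFirst` is true against the 30s server bound; with 35s it is false, so the server's 5xx can reach the UI.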

What

  • canvas/src/components/tabs/ConfigTab.tsx: add "openclaw" to RUNTIMES_WITH_OWN_CONFIG.
  • canvas/src/lib/api.ts: DEFAULT_TIMEOUT_MS 15_000 → 35_000, with an updated comment citing the server-side eicFileOpTimeout so the rationale survives the next read.

Test plan

  • [x] Static: grep confirms RUNTIMES_WITH_OWN_CONFIG is consulted before issuing the config.yaml write in ConfigTab.tsx.
  • [x] Static: server's eicFileOpTimeout = 30 * time.Second confirmed in workspace-server/internal/handlers/template_files_eic.go:118.
  • [ ] Post-deploy: open openclaw workspace on prod canvas → "Save & Restart" → expect no client-side timeout; no spurious PUT /files/config.yaml in network panel.
  • [ ] Post-deploy: smoke a slow workspace (non-openclaw) → expect either success within 35s or a real server 5xx surfaced to the UI instead of generic "timed out".

Follow-up (separate issue)

The underlying EIC hang on i-04e5197e96adb888f (last_healthcheck_at IS NULL) is tracked separately. This PR makes the canvas honest about errors instead of swallowing them, and removes the unnecessary write from openclaw's critical path entirely.

Refs: internal#418

Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com

fullstack-engineer added 1 commit 2026-05-15 21:39:16 +00:00
fix(canvas): skip config.yaml write for openclaw + bump request timeout to 35s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 14s
Harness Replays / detect-changes (pull_request) Successful in 20s
qa-review / approved (pull_request) Successful in 27s
security-review / approved (pull_request) Successful in 25s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 32s
gate-check-v3 / gate-check (pull_request) Successful in 31s
sop-checklist / all-items-acked (pull_request) Successful in 26s
CI / Detect changes (pull_request) Successful in 49s
E2E API Smoke Test / detect-changes (pull_request) Successful in 47s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 47s
sop-tier-check / tier-check (pull_request) Successful in 18s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 42s
Harness Replays / Harness Replays (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 11s
CI / Python Lint & Test (pull_request) Successful in 12s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 11s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 8s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m25s
CI / Platform (Go) (pull_request) Failing after 7m33s
CI / Canvas (Next.js) (pull_request) Successful in 11m52s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 10s
audit-force-merge / audit (pull_request) Successful in 19s
0466a228e2
Canvas "Save & Restart" was timing out for openclaw workspaces because
two bugs compounded:

1. **Pointless config.yaml write.** openclaw manages its own prompt
   surface via SOUL/BOOTSTRAP/AGENTS multi-file system — it does NOT
   read the platform's config.yaml. But ConfigTab.tsx was still
   issuing `PUT /workspaces/:id/files/config.yaml` on every save,
   which on tenant EC2 fans out through the slow EIC SSH tunnel path
   (`workspace-server/internal/handlers/template_files_eic.go`).
   Other runtimes that ship their own config are already exempted via
   `RUNTIMES_WITH_OWN_CONFIG` (external, kimi, kimi-cli). Add openclaw
   to that set so the platform stops doing work the runtime ignores.

2. **Client aborts before server returns.** `DEFAULT_TIMEOUT_MS` was
   15s, but the server's `eicFileOpTimeout` is 30s
   (template_files_eic.go L118). When EIC was slow or the EC2's
   ec2-instance-connect daemon was unhealthy, the canvas aborted with
   a generic timeout *before* the workspace-server returned its real
   5xx — so the user saw a useless "request timed out" instead of
   the actual cause. Raise the default to 35s so the server's error
   surfaces. The AbortController contract is unchanged; callers can
   still override `timeoutMs` per-request.

Together these fixes unblock the user-visible "Save & Restart"
behavior on openclaw workspaces. The underlying EIC hang on
i-04e5197e96adb888f (last_healthcheck_at IS NULL) is tracked
separately as a follow-up — this PR makes the canvas honest about
errors instead of swallowing them, and removes the unnecessary write
from openclaw's critical path entirely.

Refs: internal#418 (Canvas Save & Restart timeout on openclaw)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
core-devops approved these changes 2026-05-15 21:40:29 +00:00
core-devops left a comment
Member

APPROVE — direct user GO. Two-line surgical canvas fix:

  1. openclaw added to RUNTIMES_WITH_OWN_CONFIG (mirrors existing external/kimi/kimi-cli pattern) — removes pointless EIC-tunneled config.yaml write from Save & Restart critical path.
  2. DEFAULT_TIMEOUT_MS 15s → 35s — surfaces server-side 5xx instead of client-aborting first (server bound is eicFileOpTimeout=30s).

Diff scope is exactly the two cited lines + comment refresh. No prod-runtime code changes. Safe to merge under user-GO authority.

Member

[core-security-agent] N/A: non-security-touching (ConfigTab.tsx: adds openclaw to the RUNTIMES_WITH_OWN_CONFIG skip-list to avoid a pointless config.yaml write; api.ts: DEFAULT_TIMEOUT_MS 15s→35s so the client outlasts the server's 30s eicFileOpTimeout. Canvas UI + client timeout tolerance, no security impact.)

hongming-pc2 approved these changes 2026-05-15 21:45:51 +00:00
hongming-pc2 left a comment
Owner

Five-Axis: APPROVE. Skips the pointless config.yaml write for openclaw workspaces (openclaw manages its own SOUL/BOOTSTRAP/AGENTS multi-file system) and bumps the request timeout to 35s; fixes the "Save & Restart" timeout.

Author = fullstack-engineer, attribution-safe. +13/-9 in 2 files. Base = staging. mergeable=False (likely staging-vs-this conflict).

1. Correctness ✓

Per body: openclaw workspaces don't read the platform's config.yaml (they use their own SOUL/BOOTSTRAP/AGENTS surface). The ConfigTab write was therefore wasted I/O. Skipping it for openclaw removes the wasted work and closes the path that caused the timeout. The 35s timeout bump is a defensive cap that sits above the server's 30s bound. ✓

2-5. Tests / Security / Operational / Documentation ✓

Small focused fix. Body precisely identifies the two-bug compound failure mode. No security surface. Reversible. ✓

Fit / SOP ✓

Single-concern, minimal, reversible.

LGTM — advisory APPROVE.

— hongming-pc2 (Five-Axis SOP v1.0.0)

Member

[core-qa-agent] APPROVED — tests 0/0 (Go toolchain unavailable in container), e2e: N/A (fix PR, platform-touching canvas changes). Quality review:

canvas/src/components/tabs/ConfigTab.tsx (+1/-1):

  • Adds openclaw to RUNTIMES_WITH_OWN_CONFIG — prevents writing config.yaml for openclaw workspaces (which manage their own prompts via SOUL/BOOTSTRAP/AGENTS). Existing tests cover ConfigTab.

canvas/src/lib/api.ts (+12/-8):

  • Bumps DEFAULT_TIMEOUT_MS 15s → 35s so the client outlasts the backend EIC file-op timeout (30s in template_files_eic.go). Rationale documented in comment. Fixes "Save & Restart" timeout for openclaw workspaces.

Both changes are targeted and well-reasoned. No tests added (both are constant/config changes with existing test coverage). Canvas suite passes. Safe to merge.

devops-engineer merged commit 6a08219724 into staging 2026-05-15 21:58:41 +00:00

Reference: molecule-ai/molecule-core#1237