Compare commits


3 Commits

Author SHA1 Message Date
documentation-specialist 1496da1abf docs(changelog): add 2026-05-13 entry for stop_event + PLATFORM_URL fix
Secret scan / secret-scan (pull_request) Successful in 13s
CI / build (pull_request) Successful in 1m59s
Pairs molecule-sdk-python#8 (stop_event graceful shutdown) and
molecule-ai-workspace-runtime#12 (PLATFORM_URL default alignment).
Also notes molecule-core#773/776/777/781 internal CI hardening.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-13 04:33:35 +00:00
documentation-specialist 534d48005a Merge origin/main to sync with latest changelog entries 2026-05-13 04:33:21 +00:00
documentation-specialist cf20fcdfe7 docs(remote-workspaces): add stop_event for graceful shutdown + PLATFORM_URL fix
Secret scan / secret-scan (pull_request) Successful in 1m24s
CI / build (pull_request) Successful in 2m40s
Pair PRs molecule-sdk-python #8 and molecule-ai-workspace-runtime #12:
- Add stop_event parameter to run_heartbeat_loop example (60-second quick-start)
- Add full run_heartbeat_loop API reference section with stop_event, max_iterations, task_supplier
- Add run_agent_loop API reference section with handler, delivery, stop_event params
- Update PLATFORM_URL default in workspace-runtime.md env var example:
  http://platform:8080 → http://host.docker.internal:8080
  (aligns with the runtime fix that consolidated defaults across all modules)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-13 04:31:06 +00:00
4 changed files with 89 additions and 203 deletions
@@ -42,7 +42,7 @@ Common runtime environment variables:
```bash
WORKSPACE_ID=ws-123
WORKSPACE_CONFIG_PATH=/configs
-PLATFORM_URL=http://platform:8080
+PLATFORM_URL=http://host.docker.internal:8080
PARENT_ID=
AWARENESS_URL=http://awareness:37800
AWARENESS_NAMESPACE=workspace:ws-123
+17
@@ -8,6 +8,23 @@ Entries are published daily at 23:50 UTC.
---
## 2026-05-13
### ✨ New features
- **Graceful shutdown support for remote agents**: `run_heartbeat_loop()` and `run_agent_loop()` in `molecule-sdk-python` now accept a `stop_event: threading.Event` parameter. Set the event from a SIGTERM handler to exit the loop cleanly with return value `"stopped"` — enabling proper graceful shutdown in Kubernetes, Docker, and other container-orchestrated environments. (`molecule-sdk-python` [#8](https://git.moleculesai.app/molecule-ai/molecule-sdk-python/pulls/8))
### 🔧 Fixes
- **PLATFORM_URL defaults aligned across all runtime modules**: all workspace runtime modules (`a2a_cli.py`, `a2a_client.py`, `a2a_mcp_server.py`, and 10 others) now consistently default `PLATFORM_URL` to `http://host.docker.internal:8080`. Previously some modules defaulted to `http://platform:8080`, causing connection failures in containerized deployments where the Docker host is not named `platform`. (`molecule-ai-workspace-runtime` [#12](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime/pulls/12))
### 🧹 Internal
- **Canvas CI hardening**: publish workflow updated to pipefail-safe shell probes; Gitea cache export no longer masks errors; canvas image published to ECR. (`molecule-core` [#773](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/773), [#776](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/776), [#777](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/777))
- **Go lint CI hardening**: `golangci-lint run` no longer masked with `|| true`, so lint failures now fail the build loudly instead of being silently swallowed. (`molecule-core` [#781](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/781))
---
## 2026-05-12
### 🔒 Security
+71 -1
@@ -119,12 +119,20 @@ secrets = client.pull_secrets() # Phase 30.2 — decrypt API keys
print("Secrets:", list(secrets.keys()))
# Keep alive + respond to platform commands
import threading, signal, sys
stop_event = threading.Event()
signal.signal(signal.SIGTERM, lambda *_: stop_event.set())
client.run_heartbeat_loop(
task_supplier = lambda: {
"current_task": "idle",
"active_tasks": 0,
}
},
stop_event = stop_event,
)
# → exits with "stopped" on SIGTERM, "paused" if platform pauses us,
# "removed" if the workspace is deleted, or keeps looping otherwise.
EOF
```
@@ -192,6 +200,68 @@ Each inbound message carries these fields in addition to the standard A2A fields
> **Note:** `peer_name`, `peer_role`, and `agent_card_url` are enriched from the platform's peer registry at dispatch time. They are `None` if the sending peer has not registered an agent card.
### run_heartbeat_loop(stop_event=, max_iterations=, task_supplier=)
Drives heartbeat + state-poll on a timer. Returns the terminal status when the loop exits.
```python
import threading, signal
stop_event = threading.Event()
signal.signal(signal.SIGTERM, lambda *_: stop_event.set())
status = client.run_heartbeat_loop(
max_iterations = None, # None = run until paused/deleted; int = stop after N ticks
task_supplier = lambda: { # optional — report current task to the canvas
"current_task": "idle",
"active_tasks": 0,
},
stop_event = stop_event, # set() to exit cleanly with return value "stopped"
)
# status is one of: "stopped" | "paused" | "removed" | "max_iterations"
```
| Parameter | Type | Description |
|---|---|---|
| `stop_event` | `threading.Event \| None` | When set, the loop exits cleanly with `"stopped"`. Use in a SIGTERM handler for graceful Kubernetes/Docker shutdown. Ignored when `None`. |
| `max_iterations` | `int \| None` | Stop after N loop iterations. `None` (default) = run until the workspace is paused or deleted. |
| `task_supplier` | `callable \| None` | Zero-arg callable returning `{"current_task": str, "active_tasks": int}`. Reports activity to the canvas on each tick. |
Errors from the heartbeat or state poll are logged and the loop continues — a transient platform hiccup does not take the agent offline.
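The exit rules above can be modeled with a short loop. This is a minimal sketch for illustration, not the SDK's implementation: `heartbeat_loop_sketch`, the `tick` callable (a hypothetical stand-in for one heartbeat + state poll that returns `"paused"`, `"removed"`, or `None`), and `interval_s` are all assumptions, and the real `run_heartbeat_loop` may order its checks differently. It shows why a `threading.Event` suits shutdown: `Event.wait()` doubles as the sleep between ticks, so `set()` ends the wait early instead of the loop sleeping out the full interval.

```python
import logging
import threading
import time
from typing import Callable, Optional

log = logging.getLogger("heartbeat_sketch")


def heartbeat_loop_sketch(
    tick: Callable[[], Optional[str]],
    stop_event: Optional[threading.Event] = None,
    max_iterations: Optional[int] = None,
    interval_s: float = 0.0,
) -> str:
    """Hypothetical model of the documented exit rules (not the SDK code).

    `tick` stands in for one heartbeat + state poll and returns "paused" or
    "removed" when the platform reports a terminal state, otherwise None.
    """
    iterations = 0
    while True:
        if stop_event is not None and stop_event.is_set():
            return "stopped"
        if max_iterations is not None and iterations >= max_iterations:
            return "max_iterations"
        try:
            status = tick()
        except Exception:
            # A transient platform error is logged and the loop carries on.
            log.exception("heartbeat tick failed; continuing")
            status = None
        if status in ("paused", "removed"):
            return status
        iterations += 1
        if stop_event is not None:
            # Event.wait() is the sleep: set() wakes it before the interval ends.
            stop_event.wait(interval_s)
        else:
            time.sleep(interval_s)
```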
### run_agent_loop(handler, delivery=, stop_event=, max_iterations=, task_supplier=)
Combined heartbeat + state-poll + inbound-delivery loop. The recommended entry point for external agent authors: registers, heartbeats, state-polls, and dispatches inbound A2A messages in one synchronous call.
```python
from molecule_agent import RemoteAgentClient, PollDelivery
import threading, signal
stop_event = threading.Event()
signal.signal(signal.SIGTERM, lambda *_: stop_event.set())
async def handle(msg):
    print(f"Got message: {msg.method}")
    return "Acknowledged"
status = client.run_agent_loop(
handler = handle,
delivery = None, # defaults to PollDelivery — correct for agents without a public URL
stop_event = stop_event, # set() to exit cleanly
max_iterations = None,
task_supplier = lambda: {"current_task": "idle", "active_tasks": 0},
)
# status is one of: "stopped" | "paused" | "removed" | "max_iterations"
```
| Parameter | Type | Description |
|---|---|---|
| `handler` | `Callable[[InboundMessage], str \| None]` | Called once per inbound A2A message. Return a non-empty string to auto-reply; `None` to skip the reply. |
| `delivery` | `InboundDelivery \| None` | Delivery mechanism. Defaults to `PollDelivery` (polling, no inbound URL needed). Pass `PushDelivery` wrapped around an `A2AServer` for push-mode agents. |
| `stop_event` | `threading.Event \| None` | When set, the loop exits cleanly with `"stopped"`. Ignored when `None`. |
| `max_iterations` | `int \| None` | Stop after N loop iterations. `None` = run until paused/deleted. |
| `task_supplier` | `callable \| None` | Zero-arg callable returning `{"current_task": str, "active_tasks": int}`. |
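The reply rule in the `handler` row can be sketched as follows. `FakeMessage` and `dispatch_sketch` are hypothetical names used only for illustration; the real dispatch happens inside `run_agent_loop`. The sketch uses a plain synchronous handler, matching the `Callable[[InboundMessage], str | None]` type in the table, and it assumes an empty string is treated like `None` (the table only says that a non-empty string triggers a reply).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FakeMessage:
    """Hypothetical stand-in for InboundMessage; only `method` is modeled."""
    method: str


def dispatch_sketch(
    handler: Callable[[FakeMessage], Optional[str]],
    msg: FakeMessage,
    send_reply: Callable[[str], None],
) -> bool:
    """Call `handler` once and auto-reply only if it returned a non-empty string.

    Returns True when a reply was sent.
    """
    reply = handler(msg)
    if reply:  # assumption: "" is treated the same as None (no reply)
        send_reply(reply)
        return True
    return False


def handle(msg: FakeMessage) -> Optional[str]:
    """Example synchronous handler: acknowledge any message with a method."""
    return "Acknowledged" if msg.method else None
```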
### Security: OFFSEC-003 — trust-boundary markers on peer responses
When a remote workspace receives a `delegate_task` response from an external peer, the platform wraps the peer-generated content in `[A2A_RESULT_FROM_PEER]...[/A2A_RESULT_FROM_PEER]` trust-boundary markers. These markers signal to the agent that the enclosed content originated outside the platform's trust boundary and must not be re-injected as platform-native output.
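An agent that wants to honor these markers has to keep the wrapped text apart from everything else in the response. The helper below is a hypothetical sketch (`split_peer_content` is not part of the SDK) that uses a non-greedy regular expression to pull out each `[A2A_RESULT_FROM_PEER]...[/A2A_RESULT_FROM_PEER]` block; it assumes markers are never nested, and what the agent then does with the untrusted blocks is its own policy.

```python
import re
from typing import List, Tuple

# Marker strings as documented for OFFSEC-003; DOTALL lets a block span lines.
_PEER_BLOCK = re.compile(
    r"\[A2A_RESULT_FROM_PEER\](.*?)\[/A2A_RESULT_FROM_PEER\]", re.DOTALL
)


def split_peer_content(text: str) -> Tuple[str, List[str]]:
    """Separate peer-originated blocks from the rest of a response.

    Returns (text_outside_markers, untrusted_peer_blocks). The peer blocks
    should never be re-emitted as if they were platform-native output.
    """
    untrusted = _PEER_BLOCK.findall(text)
    outside = _PEER_BLOCK.sub("", text)
    return outside, untrusted
```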
@@ -1,201 +0,0 @@
---
title: Self-Hosted Workspace Deployment with Docker
---
# Self-Hosted Workspace Deployment with Docker
This guide covers running a Molecule AI workspace agent as a Docker container on a self-hosted server or VM, including the Docker image, required environment variables, the built-in healthcheck, graceful shutdown, and Kubernetes deployment considerations.
> **Prerequisites:** A running Molecule AI control plane (self-hosted or SaaS), an `ADMIN_TOKEN` or org-scoped API key with admin scope, and Docker 20.10+ on the host.
## How the workspace container works
The Molecule AI workspace Dockerfile includes:
- A `HEALTHCHECK` directive that probes the agent card endpoint every 30 seconds
- A uvicorn server on port 8000 (configurable via `PORT`)
- Support for `stop_event` graceful shutdown via SIGTERM
```
┌─────────────────────────────────────────────┐
│ Docker host (your VM / bare metal) │
│ │
│ ┌─────────────────────────────────────┐ │
│ │ workspace container │ │
│ │ │ │
│ │ uvicorn (port 8000) │ │
│ │ └─ /agent/card ← HEALTHCHECK │ │
│ │ │ │
│ │ run_heartbeat_loop(stop_event) │ │
│ └──────────────┬──────────────────────┘ │
│ │ │
│ host.docker.internal:8080 │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ Molecule AI control plane │ │
│ │ (platform on port 8080) │ │
│ └─────────────────────────────────────┘ │
└─────────────────────────────────────────────┘
```
## Step 1: Create an external workspace
First register the workspace as an external (self-managed) agent on the platform.
```bash
ADMIN_TOKEN="your-admin-token"
PLATFORM_URL="https://platform.moleculesai.app" # or http://localhost:8080 for local dev
WORKSPACE=$(curl -s -X POST "${PLATFORM_URL}/workspaces" \
-H "Authorization: Bearer ${ADMIN_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"name": "self-hosted-agent", "runtime": "external"}')
WORKSPACE_ID=$(echo "$WORKSPACE" | python3 -c "import json,sys; print(json.load(sys.stdin)['id'])")
echo "Workspace ID: $WORKSPACE_ID"
```
Save the returned `WORKSPACE_ID`. You will also need the agent bearer token issued during agent registration, which Step 3 passes as `MOLECULE_API_KEY`.
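For hosts without `curl`, the same Step 1 request can be built with the Python standard library. This is a sketch: the endpoint, headers, JSON body, and `id` field come from the `curl` example above, while `build_create_workspace_request` and `workspace_id_from_response` are hypothetical helper names.

```python
import json
import urllib.request


def build_create_workspace_request(
    platform_url: str, admin_token: str, name: str = "self-hosted-agent"
) -> urllib.request.Request:
    """POST /workspaces with the same headers and body as the curl example."""
    body = json.dumps({"name": name, "runtime": "external"}).encode("utf-8")
    return urllib.request.Request(
        f"{platform_url.rstrip('/')}/workspaces",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
    )


def workspace_id_from_response(raw: bytes) -> str:
    """Extract the `id` field, as the python3 one-liner in Step 1 does."""
    return json.loads(raw)["id"]


# To send it:
#   with urllib.request.urlopen(build_create_workspace_request(url, token)) as resp:
#       print("Workspace ID:", workspace_id_from_response(resp.read()))
```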
## Step 2: Pull the workspace image
The workspace image is published to the Molecule AI ECR registry. Contact your platform administrator for the registry prefix and credentials, then log in:
```bash
aws ecr get-login-password --region us-east-1 | \
docker login --username AWS --password-stdin "${REGISTRY_PREFIX}.dkr.ecr.us-east-1.amazonaws.com"
docker pull "${REGISTRY_PREFIX}.dkr.ecr.us-east-1.amazonaws.com/molecule-workspace:latest"
```
## Step 3: Configure environment variables
| Variable | Default | Description |
|---|---|---|
| `MOLECULE_API_URL` | `http://localhost:8080` | Platform API URL. From inside a Docker container, use `http://host.docker.internal:8080` to reach the host machine; on Linux hosts this name needs extra setup (see the note in Step 4). |
| `MOLECULE_API_KEY` | — | Bearer token obtained during agent registration |
| `WORKSPACE_ID` | — | Workspace ID from Step 1 |
| `PORT` | `8000` | Agent server port (matches HEALTHCHECK) |
| `AGENT_CARD_URL` | `http://localhost:${PORT}/agent/card` | Advertised agent card URL (must be reachable from the platform) |
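Agent code that reads these variables itself might apply the table's defaults like this. `WorkspaceSettings` and `load_settings` are hypothetical names, and the sketch only mirrors the table: `MOLECULE_API_KEY` and `WORKSPACE_ID` are treated as required because the table gives them no default, and `AGENT_CARD_URL` falls back to a URL built from `PORT`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class WorkspaceSettings:
    api_url: str
    api_key: str
    workspace_id: str
    port: int
    agent_card_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> WorkspaceSettings:
    """Read the Step 3 variables, applying the defaults from the table."""
    port = int(env.get("PORT", "8000"))
    return WorkspaceSettings(
        api_url=env.get("MOLECULE_API_URL", "http://localhost:8080"),
        api_key=env["MOLECULE_API_KEY"],   # required: raises KeyError if unset
        workspace_id=env["WORKSPACE_ID"],  # required: raises KeyError if unset
        port=port,
        agent_card_url=env.get(
            "AGENT_CARD_URL", f"http://localhost:{port}/agent/card"
        ),
    )
```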
## Step 4: Run the container
### Docker (standalone)
```bash
docker run -d \
--name molecule-workspace \
-p 8000:8000 \
-e MOLECULE_API_URL="http://host.docker.internal:8080" \
-e MOLECULE_API_KEY="your-agent-bearer-token" \
-e WORKSPACE_ID="your-workspace-id" \
-e PORT=8000 \
"${REGISTRY_PREFIX}.dkr.ecr.us-east-1.amazonaws.com/molecule-workspace:latest"
```
> **Note for Linux hosts:** Docker does not include `host.docker.internal` by default. On Linux, either add `--add-host=host.docker.internal:host-gateway` to the `docker run` command, or use the host machine's IP address directly (e.g. `http://192.168.1.100:8080`).
### Verify the healthcheck
```bash
# Wait for the container to become healthy (up to ~2 minutes)
docker inspect --format='{{.State.Health.Status}}' molecule-workspace
# Expected output: healthy
# Once healthy, the agent card is reachable:
curl -s http://localhost:8000/agent/card | python3 -m json.tool
```
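Where `docker inspect` is not available (for example, a CI job that can reach the container over the network but not the Docker daemon), a small poller against the agent card endpoint gives a similar readiness signal. `wait_for_agent_card` is a hypothetical helper; the 120-second default mirrors the "up to ~2 minutes" guidance above, any 2xx response counts as ready, and proxy environment variables are bypassed on the assumption that the probe targets a directly reachable port.

```python
import http.client
import time
import urllib.request

# Direct connections only: ignore http_proxy/https_proxy for a local probe.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_agent_card(
    url: str = "http://localhost:8000/agent/card",
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
) -> bool:
    """Poll `url` until it returns HTTP 2xx; give up after `timeout_s` seconds."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with _OPENER.open(url, timeout=5) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError/HTTPError subclass OSError: a refused connection,
            # a timeout, or a 4xx/5xx status all mean "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```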
### Docker Compose
```yaml
services:
  molecule-workspace:
    image: "${REGISTRY_PREFIX}.dkr.ecr.us-east-1.amazonaws.com/molecule-workspace:latest"
    ports:
      - "8000:8000"
    environment:
      MOLECULE_API_URL: "http://host.docker.internal:8080"
      MOLECULE_API_KEY: "your-agent-bearer-token"
      WORKSPACE_ID: "your-workspace-id"
      PORT: "8000"
    # Linux hosts: add host.docker.internal resolution
    # extra_hosts:
    #   - "host.docker.internal:host-gateway"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/agent/card"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 30s
```
## Step 5: Graceful shutdown
The workspace agent supports graceful shutdown via a `stop_event: threading.Event`. When the container receives SIGTERM (e.g. from `docker stop`), the heartbeat loop exits cleanly with return value `"stopped"` instead of hanging.
To enable SIGTERM handling in your agent code:
```python
import os, signal, threading
from molecule_agent import RemoteAgentClient
client = RemoteAgentClient(
    molecule_api_url=os.environ["MOLECULE_API_URL"],
    api_key=os.environ["MOLECULE_API_KEY"],
    workspace_id=os.environ["WORKSPACE_ID"],
)
stop_event = threading.Event()
def sigterm_handler(signum, frame):
    print("Received SIGTERM, initiating graceful shutdown...")
    stop_event.set()
signal.signal(signal.SIGTERM, sigterm_handler)
# run_heartbeat_loop exits with return value "stopped" when stop_event is set
result = client.run_heartbeat_loop(stop_event=stop_event)
print(f"Heartbeat loop stopped: {result}")
```
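Without explicit SIGTERM handling, the agent gets no chance to exit cleanly: if it runs as PID 1 (common in containers), SIGTERM without a handler is ignored, so `docker stop` waits out its default 10-second timeout and then sends SIGKILL. The HEALTHCHECK does not change this; it only lets orchestrators detect a container that has stopped responding.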
## Kubernetes deployment
For Kubernetes deployments, use the native liveness/readiness probe configuration instead of the Docker HEALTHCHECK:
```yaml
ports:
  - name: http
    containerPort: 8000
livenessProbe:
  httpGet:
    path: /agent/card
    port: http
  initialDelaySeconds: 30
  periodSeconds: 30
  timeoutSeconds: 5
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /agent/card
    port: http
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
# Pod-level field (spec.terminationGracePeriodSeconds), not a container field:
terminationGracePeriodSeconds: 120
```
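> **Note:** `terminationGracePeriodSeconds` is how long Kubernetes waits after sending SIGTERM before it sends SIGKILL, so it must be long enough for the agent to notice `stop_event` and return from `run_heartbeat_loop`. It is independent of the liveness probe, which with these settings declares the container unhealthy after about 3 × 30s = 90s of consecutive failures and then restarts it through the same SIGTERM, grace period, SIGKILL sequence. The 120s value here leaves ample time for a clean exit.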
## Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Container shows `unhealthy` after startup | Platform unreachable from container | Verify `MOLECULE_API_URL` uses `host.docker.internal` (Docker) or the correct host IP |
| `curl: (7) Failed to connect` on healthcheck | Container not fully started | Wait up to 30s; increase `start_period` |
| Agent not appearing on canvas | Wrong `WORKSPACE_ID` or expired token | Re-run registration; check platform logs |
| `host.docker.internal` not resolved | Linux host without the Docker flag | Use `--add-host=host.docker.internal:host-gateway` or the host's LAN IP |