Gitea Actions default runner timeout is ~15min. Add explicit
timeout-minutes: 30 to prevent false failures on slow/unprovisioned
runner instances. The content builds successfully in <5min locally.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The example showed terminationGracePeriodSeconds: 30, but the accompanying
note says the value "should exceed the healthcheck failure threshold
(3 × 30s = 90s)". With 30s < 90s, Kubernetes would send SIGTERM and
wait only 30s before SIGKILL — potentially killing the pod before the
graceful shutdown (3s via stop_event) completes.
Changed to 120s, which exceeds the 90s threshold and aligns the YAML
example with the documented requirement.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 04:57:26 +00:00
2 changed files with 41 additions and 43 deletions
| `MOLECULE_API_URL` | `http://localhost:8080` | Platform API URL. From Docker on Linux/macOS, use `http://host.docker.internal:8080` to reach the host machine. |
| `PORT` | `8000` | Agent server port (matches HEALTHCHECK) |
| `AGENT_CARD_URL` | `http://localhost:${PORT}/agent/card` | Advertised agent card URL (must be reachable from the platform) |
| `PLATFORM_URL` | `http://localhost:8080` | Platform API URL. Inside a Docker container, use `http://host.docker.internal:8080` to reach the platform on the host machine. |
| `WORKSPACE_ID` | — | Workspace ID from Step 1 (required; no default) |
| `PORT` | `8000` | Agent server port. Must match `containerPort` in Kubernetes and the port mapped with `-p` in Docker. |
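A minimal sketch of how agent code might read these settings. It assumes plain `os.environ` lookups using the defaults from the table above. `load_config` and `WorkspaceConfig` are hypothetical names and are not part of the workspace package:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WorkspaceConfig:
    platform_url: str
    workspace_id: str
    port: int


def load_config(env: Optional[Mapping[str, str]] = None) -> WorkspaceConfig:
    """Read workspace settings from environment variables (hypothetical helper).

    Defaults follow the table: PLATFORM_URL -> http://localhost:8080,
    PORT -> 8000, and WORKSPACE_ID has no default.
    """
    env = os.environ if env is None else env
    workspace_id = env.get("WORKSPACE_ID")
    if not workspace_id:
        # WORKSPACE_ID is required; fail fast instead of starting unregistered.
        raise RuntimeError("WORKSPACE_ID is required (see Step 1)")
    return WorkspaceConfig(
        platform_url=env.get("PLATFORM_URL", "http://localhost:8080"),
        workspace_id=workspace_id,
        port=int(env.get("PORT", "8000")),
    )
```

Passing a plain dict instead of the real environment makes the function easy to test. Raising on a missing `WORKSPACE_ID` surfaces the problem at startup rather than as an agent that never appears on the canvas.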
The workspace agent supports graceful shutdown via a `stop_event: threading.Event`. When the container receives SIGTERM (e.g. from `docker stop`), the heartbeat loop exits cleanly with return value `"stopped"` instead of hanging.
When the container receives SIGTERM (e.g. from `docker stop` or Kubernetes pod deletion), the workspace's uvicorn server initiates graceful shutdown: the heartbeat loop stops, active A2A tasks are given a grace period to complete, and any snapshotable state is persisted before the process exits.
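The `stop_event` pattern can be sketched as follows. This illustrates the idea under stated assumptions and is not the workspace's actual heartbeat code. `heartbeat_loop` and `send_heartbeat` are hypothetical: `send_heartbeat` stands in for whatever call reports liveness to the platform, and the 30s interval is a placeholder.

```python
import threading
from typing import Callable


def heartbeat_loop(
    send_heartbeat: Callable[[], None],
    stop_event: threading.Event,
    interval: float = 30.0,
) -> str:
    """Call send_heartbeat every `interval` seconds until stop_event is set.

    Event.wait(timeout) returns True as soon as the event is set, so a stop
    request interrupts the sleep instead of waiting out the full interval.
    """
    while not stop_event.is_set():
        send_heartbeat()
        if stop_event.wait(interval):
            break
    return "stopped"
```

Because the sleep is an `Event.wait`, shutdown latency is bounded by a single in-flight `send_heartbeat()` call, not by the heartbeat interval.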
To enable SIGTERM handling in your agent code:
To integrate the heartbeat loop into custom agent code:
# SIGTERM is sent by the Docker runtime to the workspace process. The
# workspace (via uvicorn) then initiates graceful shutdown: the heartbeat
# loop is stopped, any active adapter tasks are cancelled, and in-flight
# A2A requests are given a grace period to complete.
#
# For custom integration with the heartbeat loop directly:
import asyncio
import os


async def main():
    heartbeat = HeartbeatLoop(
        platform_url=os.environ["PLATFORM_URL"],
        workspace_id=os.environ["WORKSPACE_ID"],
    )
    heartbeat.start()
    try:
        await asyncio.Event().wait()  # block until the task is cancelled
    finally:
        await heartbeat.stop()
        print("Heartbeat loop stopped.")
```
Without explicit SIGTERM handling, the container will be killed after the Docker default 10-second timeout. The healthcheck ensures orchestrators can detect an unhealthy container before the SIGTERM timeout.
The Docker `stop` command sends SIGTERM and waits up to 10 seconds by default before sending SIGKILL. The healthcheck ensures orchestrators detect an unhealthy container before the SIGTERM timeout.
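To make the most of Docker's 10-second window, the signal handler should do as little as possible. The sketch below assumes a POSIX host and an agent that shares a `threading.Event` with its heartbeat loop. `install_sigterm_handler` is a hypothetical helper:

```python
import signal
import threading


def install_sigterm_handler(stop_event: threading.Event) -> None:
    """Set stop_event when SIGTERM arrives (e.g. from `docker stop`).

    signal.signal() must be called from the main thread. The handler only
    flags the event; cleanup runs afterwards in ordinary code, so it is not
    limited by what is safe to do inside a signal handler.
    """

    def _on_sigterm(signum, frame):
        stop_event.set()

    signal.signal(signal.SIGTERM, _on_sigterm)
```

To use it, call the function once at startup from the main thread and pass the same event to the heartbeat loop. Then run cleanup after `stop_event.wait()` returns, keeping that cleanup well inside the stop timeout.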
## Kubernetes deployment
@@ -172,7 +169,7 @@ ports:
    containerPort: 8000
livenessProbe:
  httpGet:
    path: /agent/card
    path: /.well-known/agent-card.json
    port: http
  initialDelaySeconds: 30
  periodSeconds: 30
@@ -180,22 +177,22 @@ livenessProbe:
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /agent/card
    path: /.well-known/agent-card.json
    port: http
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
terminationGracePeriodSeconds: 30
terminationGracePeriodSeconds: 120
```
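For local testing, the probe's success rule can be approximated in Python. Kubernetes counts an `httpGet` response with a status code from 200 up to, but not including, 400 as a success. The sketch below applies the same rule using only the standard library. `http_probe` is a hypothetical helper, and it does not reproduce kubelet details such as custom headers or kubelet's own redirect handling.

```python
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Approximate a Kubernetes httpGet probe: status 200-399 means success."""
    try:
        # urlopen follows standard redirects itself and reports the final status.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx (and unhandled 3xx); apply the same rule.
        return 200 <= err.code < 400
    except OSError:
        # URLError (refused, DNS failure) and socket timeouts subclass OSError.
        return False
```

For example, `http_probe("http://localhost:8000/.well-known/agent-card.json")` checks the same path and port as the probes above, against a running container whose port 8000 is published.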
> **Note:** The Kubernetes `terminationGracePeriodSeconds` should exceed the healthcheck failure threshold (3 × 30s = 90s) to allow the liveness probe to fail before the pod is killed.
> **Note:** The Kubernetes `terminationGracePeriodSeconds` should exceed the liveness probe failure threshold so that the probe can register a failure before the pod is killed. With `periodSeconds: 30` and `failureThreshold: 3`, the probe registers a failure roughly 3 × 30s = 90s after the container becomes unhealthy. Set `terminationGracePeriodSeconds: 120` or higher.
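The rule of thumb in this note reduces to simple arithmetic. The sketch below uses a simplified model that ignores `timeoutSeconds` and where in the probe cycle the failure begins. Both function names are hypothetical.

```python
def liveness_detection_seconds(period_seconds: int, failure_threshold: int) -> int:
    """Rough time for the probe to record `failure_threshold` consecutive failures."""
    return period_seconds * failure_threshold


def grace_period_ok(grace_seconds: int, period_seconds: int, failure_threshold: int) -> bool:
    """True if the grace period outlasts the simplified detection window."""
    return grace_seconds > liveness_detection_seconds(period_seconds, failure_threshold)


# Compare the old and new values against periodSeconds: 30, failureThreshold: 3.
for grace in (30, 120):
    print(grace, grace_period_ok(grace, period_seconds=30, failure_threshold=3))
# 30 False
# 120 True
```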
## Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Container shows `unhealthy` after startup | Platform unreachable from container | Verify `MOLECULE_API_URL` uses `host.docker.internal` (Docker) or the correct host IP |
| Container shows `unhealthy` after startup | Platform unreachable from container | Verify `PLATFORM_URL` uses `host.docker.internal` (Docker) or the correct host IP |
| `curl: (7) Failed to connect` on healthcheck | Container not fully started | Wait up to 30s; increase `start_period` |
| Agent not appearing on canvas | Wrong `WORKSPACE_ID` or expired token | Re-run registration; check platform logs |
| `host.docker.internal` not resolved | Linux host without the Docker flag | Use `--add-host=host.docker.internal:host-gateway` or the host's LAN IP |
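From inside the container, a name-resolution failure can be told apart from an unreachable port by checking each separately. A minimal standard-library sketch follows. `diagnose_platform` is a hypothetical helper; run it with the same platform URL the agent uses.

```python
import socket
from urllib.parse import urlsplit


def diagnose_platform(url: str, timeout: float = 3.0) -> str:
    """Separate DNS failures from connection failures for a platform URL."""
    parts = urlsplit(url)
    host = parts.hostname
    if not host:
        return f"no host in {url!r}"
    # Fall back to the scheme's default port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        addr = socket.gethostbyname(host)
    except socket.gaierror:
        # e.g. host.docker.internal on a Linux host without --add-host.
        return f"cannot resolve {host}"
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return f"ok: {host} ({addr}) port {port} reachable"
    except OSError:
        return f"resolved {host} to {addr}, but port {port} refused or timed out"
```

A "cannot resolve" result points to the `--add-host=host.docker.internal:host-gateway` fix. A resolved host with a refused or timed-out port suggests the platform is not listening on that address or port.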