Compare commits
9 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| a3fa07f476 | |||
| cd323240cb | |||
| 13ca8a0b81 | |||
| e1455eafc4 | |||
| 90df616fa4 | |||
| f96235f32a | |||
| e7a23338bf | |||
| 7c1ac608d3 | |||
| 4e40da7fc2 |
@@ -6,12 +6,7 @@ on:
|
||||
branches: [main]
|
||||
jobs:
|
||||
build:
|
||||
# Self-hosted Mac mini — this repo is private and the org's
|
||||
# GitHub-hosted minute budget is exhausted (every ubuntu-latest job
|
||||
# dies in 2s with no step output). Per the 2026-04-22 carve-out:
|
||||
# private repos run on self-hosted; public repos use ubuntu-latest
|
||||
# (still free).
|
||||
runs-on: self-hosted
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/setup-node@v4
|
||||
|
||||
+40
-40
@@ -11,66 +11,66 @@ Entries are published daily at 23:50 UTC.
|
||||
|
||||
### ✨ New features
|
||||
|
||||
- **SaaS Federation v2 tutorial**: a clean, self-contained walkthrough for platform operators who want to run multi-tenant workspaces from a single control plane. Covers org onboarding via `POST /cp/orgs`, workspace provisioning per tenant, fleet inspection, quota controls, and suspension/teardown. (`molecule-core` [#1700](https://github.com/Molecule-AI/molecule-core/pull/1700))
|
||||
- **External workspace quickstart**: a 5-minute guide to running any HTTP-speaking agent (Python, Node, Go, Rust) on your own machine and having it appear on the canvas alongside platform-provisioned agents. Covers tunnel setup, `POST /workspaces` registration, and a working echo agent. (`molecule-core` [#1760](https://github.com/Molecule-AI/molecule-core/pull/1760))
|
||||
- **SaaS Federation v2 tutorial**: a clean, self-contained walkthrough for platform operators who want to run multi-tenant workspaces from a single control plane. Covers org onboarding via `POST /cp/orgs`, workspace provisioning per tenant, fleet inspection, quota controls, and suspension/teardown. (`molecule-core` [#1700](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1700))
|
||||
- **External workspace quickstart**: a 5-minute guide to running any HTTP-speaking agent (Python, Node, Go, Rust) on your own machine and having it appear on the canvas alongside platform-provisioned agents. Covers tunnel setup, `POST /workspaces` registration, and a working echo agent. (`molecule-core` [#1760](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1760))
|
||||
|
||||
### 🔧 Fixes
|
||||
|
||||
- **SSRF guard in SaaS mode**: previously the SSRF protection was blocking all RFC-1918 private IP ranges (`10/8`, `172.16/12`, `192.168/16`) even in SaaS mode — this was a regression from the earlier SaaS-mode work. The fix wires up the `saasMode` flag correctly so private IPs are allowed in SaaS deployments (for internal service calls), while metadata ranges (`169.254/16`), CGNAT, loopback, and link-local remain blocked in every mode. IPv6 ULA (`fd00::/8`) handling is also now correct. (`molecule-core` [#1692](https://github.com/Molecule-AI/molecule-core/pull/1692))
|
||||
- **PUT `/workspaces/:id/files/*path` on SaaS (EC2) workspaces**: fixed a 500 error (`docker not available`) that occurred when saving files from Canvas on SaaS workspaces. The handler now detects non-Docker workspaces via `workspaces.instance_id` and routes writes via EC2 Instance Connect (SSH-backed write with an ephemeral key pair) instead of trying to `docker cp`. (`molecule-core` [#1702](https://github.com/Molecule-AI/molecule-core/pull/1702))
|
||||
- **SSRF guard in SaaS mode**: previously the SSRF protection was blocking all RFC-1918 private IP ranges (`10/8`, `172.16/12`, `192.168/16`) even in SaaS mode — this was a regression from the earlier SaaS-mode work. The fix wires up the `saasMode` flag correctly so private IPs are allowed in SaaS deployments (for internal service calls), while metadata ranges (`169.254/16`), CGNAT, loopback, and link-local remain blocked in every mode. IPv6 ULA (`fd00::/8`) handling is also now correct. (`molecule-core` [#1692](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1692))
|
||||
- **PUT `/workspaces/:id/files/*path` on SaaS (EC2) workspaces**: fixed a 500 error (`docker not available`) that occurred when saving files from Canvas on SaaS workspaces. The handler now detects non-Docker workspaces via `workspaces.instance_id` and routes writes via EC2 Instance Connect (SSH-backed write with an ephemeral key pair) instead of trying to `docker cp`. (`molecule-core` [#1702](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1702))
|
||||
|
||||
### 📚 Docs
|
||||
|
||||
- **molecli shell completion**: tab completion for `molecule` CLI in bash, zsh, fish, and PowerShell — covers all subcommands and flags. (`docs` [#79](https://github.com/Molecule-AI/docs/pull/79))
|
||||
- **MCP server structured logging**: `LOG_LEVEL` env var, pino JSON output with AsyncLocalStorage context on every tool call. (`docs` [#78](https://github.com/Molecule-AI/docs/pull/78))
|
||||
- **molecli shell completion**: tab completion for `molecule` CLI in bash, zsh, fish, and PowerShell — covers all subcommands and flags. (`docs` [#79](https://git.moleculesai.app/molecule-ai/docs/pull/79))
|
||||
- **MCP server structured logging**: `LOG_LEVEL` env var, pino JSON output with AsyncLocalStorage context on every tool call. (`docs` [#78](https://git.moleculesai.app/molecule-ai/docs/pull/78))
|
||||
|
||||
### 🧹 Internal
|
||||
|
||||
- SaaS Federation v2 tutorial published — clean rewrite of #1613, now with correct HTTP status codes, fleet metrics endpoint, and security model table (`molecule-core` [#1700](https://github.com/Molecule-AI/molecule-core/pull/1700)); Files API SSH-backed write path for SaaS EC2 workspaces — fixes 500 on PUT `/workspaces/:id/files/*path` for SaaS users (`molecule-core` [#1702](https://github.com/Molecule-AI/molecule-core/pull/1702)); Canvas create-workspace dialog now requires hermes runtime model (`molecule-core` [#1714](https://github.com/Molecule-AI/molecule-core/pull/1714)).
|
||||
- EC2 Instance Connect SSH tutorial published (`molecule-core` [#1617](https://github.com/Molecule-AI/molecule-core/pull/1617)); AI agent org-scoped key credential model blog published (`molecule-core` [#1614](https://github.com/Molecule-AI/molecule-core/pull/1614)); Phase 30 Day 2 social package ready (`molecule-core` [#1662](https://github.com/Molecule-AI/molecule-core/pull/1662)).
|
||||
- SaaS Federation v2 tutorial published — clean rewrite of #1613, now with correct HTTP status codes, fleet metrics endpoint, and security model table (`molecule-core` [#1700](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1700)); Files API SSH-backed write path for SaaS EC2 workspaces — fixes 500 on PUT `/workspaces/:id/files/*path` for SaaS users (`molecule-core` [#1702](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1702)); Canvas create-workspace dialog now requires hermes runtime model (`molecule-core` [#1714](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1714)).
|
||||
- EC2 Instance Connect SSH tutorial published (`molecule-core` [#1617](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1617)); AI agent org-scoped key credential model blog published (`molecule-core` [#1614](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1614)); Phase 30 Day 2 social package ready (`molecule-core` [#1662](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1662)).
|
||||
|
||||
### 🌅 Late-day updates (17:30–23:50 UTC)
|
||||
|
||||
#### 🔒 Security
|
||||
|
||||
- **Cross-tenant memory poisoning fix** (`molecule-core` [#1791](https://github.com/Molecule-AI/molecule-core/pull/1791)): fixes a bug where `commit_memory` with `scope=TEAM` could write to a sibling workspace's memory store under high concurrency. `commit_memory` now validates `target_workspace_id` against the caller's known peer set before any write.
|
||||
- **CWE-78 shell injection hardening** (`molecule-core` [#1885](https://github.com/Molecule-AI/molecule-core/pull/1885)): `shellQuote` now uses `strconv.Quote` for all shell-delimited paths in the EC2 Instance Connect and bastion SSH paths. Defense-in-depth layer hardened; primary protection remains path-validation logic upstream.
|
||||
- **Cross-tenant memory poisoning fix** (`molecule-core` [#1791](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1791)): fixes a bug where `commit_memory` with `scope=TEAM` could write to a sibling workspace's memory store under high concurrency. `commit_memory` now validates `target_workspace_id` against the caller's known peer set before any write.
|
||||
- **CWE-78 shell injection hardening** (`molecule-core` [#1885](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1885)): `shellQuote` now uses `strconv.Quote` for all shell-delimited paths in the EC2 Instance Connect and bastion SSH paths. Defense-in-depth layer hardened; primary protection remains path-validation logic upstream.
|
||||
|
||||
#### ✨ New features
|
||||
|
||||
- **A2A priority queue — Phase 1** (`molecule-core` [#1892](https://github.com/Molecule-AI/molecule-core/pull/1892)): task dispatch now supports a `priority` field (`low` / `normal` / `high` / `urgent`). High/urgent tasks bypass the normal FIFO queue and are dispatched immediately. Phase 2 (priority inversion deadlock prevention) on the roadmap.
|
||||
- **A2A priority queue — Phase 1** (`molecule-core` [#1892](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1892)): task dispatch now supports a `priority` field (`low` / `normal` / `high` / `urgent`). High/urgent tasks bypass the normal FIFO queue and are dispatched immediately. Phase 2 (priority inversion deadlock prevention) on the roadmap.
|
||||
|
||||
#### 🔧 Fixes
|
||||
|
||||
- **A2A queue nil-safe drain** (`molecule-core` [#1893](https://github.com/Molecule-AI/molecule-core/pull/1893), [#1896](https://github.com/Molecule-AI/molecule-core/pull/1896)): `DequeueTask` no longer panics when the in-memory queue map is uninitialized — graceful empty-result returned instead.
|
||||
- **Workspaces stuck in `provisioning` after失败** (`molecule-core` [#1794](https://github.com/Molecule-AI/molecule-core/pull/1794)): provisioner now transitions workspaces to `failed` state with a descriptive error message instead of leaving them orphaned in `provisioning`.
|
||||
- **Dedup settings hooks double-fire** (`molecule-core` [#1797](https://github.com/Molecule-AI/molecule-core/pull/1797)): the `dedup_settings_hooks` registry now correctly unsubscribes after one fire — eliminates the 3–4× duplicate hook execution observed in CI.
|
||||
- **Semantic memory search returning stale results** (`molecule-core` [#1778](https://github.com/Molecule-AI/molecule-core/pull/1778)): pgvector index now refreshes synchronously on `commit_memory` write instead of on a 5-minute background cycle.
|
||||
- **pgvector migration race in E2E CI** (`molecule-core` [#1777](https://github.com/Molecule-AI/molecule-core/pull/1777)): `CREATE EXTENSION` wrapped in `IF NOT EXISTS` inside a `DO` block — eliminates E2E CI flakiness on fresh DB spin-up.
|
||||
- **EC2 Instance Connect endpoint not found in us-west-2** (`molecule-core` [#1779](https://github.com/Molecule-AI/molecule-core/pull/1779)): Instance Connect endpoint SDK call now falls back gracefully to direct SSM session when the EIC endpoint is unavailable in a region.
|
||||
- **Canvas topology overlay edge labels clipped** (`molecule-core` [#1802](https://github.com/Molecule-AI/molecule-core/pull/1802)): SVG edge labels now respect viewport bounds; labels that would render off-screen are repositioned.
|
||||
- **Audit trail panel not loading for large workspaces** (`molecule-core` [#1854](https://github.com/Molecule-AI/molecule-core/pull/1854)): audit log fetch now uses cursor-based pagination (100 events per page) instead of returning all events at once.
|
||||
- **Hermes `response_format` not forwarded to MiniMax** (`molecule-core` [#1861](https://github.com/Molecule-AI/molecule-core/pull/1861)): `response_format=json_schema` now propagates through the model config passthrough for hermes/MiniMax-M2.7-highspeed workspaces.
|
||||
- **Memory Inspector panel memory leak** (`molecule-core` [#1871](https://github.com/Molecule-AI/molecule-core/pull/1871)): `useMemoryStore` hook now correctly cancels the SSE subscription on panel unmount.
|
||||
- **Token revocation cache stale-read window** (`molecule-core` [#1888](https://github.com/Molecule-AI/molecule-core/pull/1888)): revoked-token invalidation now propagates within 5 s (down from 60 s) — closes the window where a revoked token could still authenticate.
|
||||
- **TenantGuard same-origin bypass (regression)** (`molecule-core` [#1898](https://github.com/Molecule-AI/molecule-core/pull/1898)): fixes a regression introduced in the Phase 33 cloudflare-removal change that re-opened the TenantGuard same-origin bypass for EC2 tenant Canvas deployments.
|
||||
- **A2A queue nil-safe drain** (`molecule-core` [#1893](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1893), [#1896](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1896)): `DequeueTask` no longer panics when the in-memory queue map is uninitialized — graceful empty-result returned instead.
|
||||
- **Workspaces stuck in `provisioning` after失败** (`molecule-core` [#1794](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1794)): provisioner now transitions workspaces to `failed` state with a descriptive error message instead of leaving them orphaned in `provisioning`.
|
||||
- **Dedup settings hooks double-fire** (`molecule-core` [#1797](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1797)): the `dedup_settings_hooks` registry now correctly unsubscribes after one fire — eliminates the 3–4× duplicate hook execution observed in CI.
|
||||
- **Semantic memory search returning stale results** (`molecule-core` [#1778](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1778)): pgvector index now refreshes synchronously on `commit_memory` write instead of on a 5-minute background cycle.
|
||||
- **pgvector migration race in E2E CI** (`molecule-core` [#1777](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1777)): `CREATE EXTENSION` wrapped in `IF NOT EXISTS` inside a `DO` block — eliminates E2E CI flakiness on fresh DB spin-up.
|
||||
- **EC2 Instance Connect endpoint not found in us-west-2** (`molecule-core` [#1779](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1779)): Instance Connect endpoint SDK call now falls back gracefully to direct SSM session when the EIC endpoint is unavailable in a region.
|
||||
- **Canvas topology overlay edge labels clipped** (`molecule-core` [#1802](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1802)): SVG edge labels now respect viewport bounds; labels that would render off-screen are repositioned.
|
||||
- **Audit trail panel not loading for large workspaces** (`molecule-core` [#1854](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1854)): audit log fetch now uses cursor-based pagination (100 events per page) instead of returning all events at once.
|
||||
- **Hermes `response_format` not forwarded to MiniMax** (`molecule-core` [#1861](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1861)): `response_format=json_schema` now propagates through the model config passthrough for hermes/MiniMax-M2.7-highspeed workspaces.
|
||||
- **Memory Inspector panel memory leak** (`molecule-core` [#1871](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1871)): `useMemoryStore` hook now correctly cancels the SSE subscription on panel unmount.
|
||||
- **Token revocation cache stale-read window** (`molecule-core` [#1888](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1888)): revoked-token invalidation now propagates within 5 s (down from 60 s) — closes the window where a revoked token could still authenticate.
|
||||
- **TenantGuard same-origin bypass (regression)** (`molecule-core` [#1898](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1898)): fixes a regression introduced in the Phase 33 cloudflare-removal change that re-opened the TenantGuard same-origin bypass for EC2 tenant Canvas deployments.
|
||||
|
||||
#### 📚 Docs
|
||||
|
||||
- **Chrome DevTools MCP tutorial** (`docs` [#1798](https://github.com/Molecule-AI/docs/pull/1798)): hands-on guide for debugging Molecule AI agents in-browser using Chrome's built-in MCP inspector.
|
||||
- **Phase 34 launch page** (`docs` [#1799](https://github.com/Molecule-AI/docs/pull/1799)): public-facing launch collateral for GA scheduled 2026-04-30.
|
||||
- **Tool Trace demo environment** (`docs` [#1844](https://github.com/Molecule-AI/docs/pull/1844)): interactive demo showing the tool trace inspector in action, with sample run data.
|
||||
- **Enterprise battlecard** (`docs` [#1864](https://github.com/Molecule-AI/docs/pull/1864)): competitive positioning doc for sales and enterprise evaluation teams.
|
||||
- **Chrome DevTools MCP tutorial** (`docs` [#1798](https://git.moleculesai.app/molecule-ai/docs/pull/1798)): hands-on guide for debugging Molecule AI agents in-browser using Chrome's built-in MCP inspector.
|
||||
- **Phase 34 launch page** (`docs` [#1799](https://git.moleculesai.app/molecule-ai/docs/pull/1799)): public-facing launch collateral for GA scheduled 2026-04-30.
|
||||
- **Tool Trace demo environment** (`docs` [#1844](https://git.moleculesai.app/molecule-ai/docs/pull/1844)): interactive demo showing the tool trace inspector in action, with sample run data.
|
||||
- **Enterprise battlecard** (`docs` [#1864](https://git.moleculesai.app/molecule-ai/docs/pull/1864)): competitive positioning doc for sales and enterprise evaluation teams.
|
||||
|
||||
#### 🧹 Internal
|
||||
|
||||
- `a2a-sdk` hot-pinned to `0.3.x` across all workspace template repos (`molecule-core` [#1890](https://github.com/Molecule-AI/molecule-core/pull/1890)); SDK upgrade path documented in `KI-009` (`internal` [#1631](https://github.com/Molecule-AI/internal/issues/1631)).
|
||||
- `a2a-sdk` hot-pinned to `0.3.x` across all workspace template repos (`molecule-core` [#1890](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1890)); SDK upgrade path documented in `KI-009` (`internal` [#1631](https://git.moleculesai.app/molecule-ai/internal/issues/1631)).
|
||||
- Phase 34 CI matrix expanded to cover Node 22 and Go 1.24 (`molecule-ci`).
|
||||
|
||||
#### 🔧 Runtime fixes
|
||||
|
||||
- **Heartbeat 401 retry** (`molecule-ai-workspace-runtime` [#40](https://github.com/Molecule-AI/molecule-ai-workspace-runtime/pull/40)): heartbeat worker now retries with fresh token on 401 before declaring the workspace unreachable — eliminates false `disconnected` status during token rotation.
|
||||
- **LLM token auto-detect** (`molecule-ai-workspace-runtime` [#38](https://github.com/Molecule-AI/molecule-ai-workspace-runtime/pull/38)): hermes runtime now auto-detects `max_tokens` from model context window and request timeout when not explicitly configured.
|
||||
- **Heartbeat 401 retry** (`molecule-ai-workspace-runtime` [#40](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime/pull/40)): heartbeat worker now retries with fresh token on 401 before declaring the workspace unreachable — eliminates false `disconnected` status during token rotation.
|
||||
- **LLM token auto-detect** (`molecule-ai-workspace-runtime` [#38](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime/pull/38)): hermes runtime now auto-detects `max_tokens` from model context window and request timeout when not explicitly configured.
|
||||
|
||||
---
|
||||
|
||||
@@ -84,7 +84,7 @@ Customer selects `model=minimax/MiniMax-M2.7-highspeed` in Canvas → the model
|
||||
API key now propagate correctly into the runtime environment instead of being dropped
|
||||
on the floor at provisioning time. Works for hermes workspaces in both hosted SaaS
|
||||
and self-hosted EC2 deployments.
|
||||
(`molecule-core` [#1685](https://github.com/Molecule-AI/molecule-core/pull/1685))
|
||||
(`molecule-core` [#1685](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1685))
|
||||
|
||||
#### EC2 Instance Connect Endpoint — one-click shell from Canvas
|
||||
Canvas Terminal tab now uses AWS EC2 Instance Connect Endpoint to open a PTY inside
|
||||
@@ -92,7 +92,7 @@ any workspace EC2 instance — no SSH keys to manage, no IP to copy, no security
|
||||
rules to configure. IAM policy gates access, STS pushes a short-lived key that
|
||||
auto-expires, and every tunnel open is recorded in CloudTrail.
|
||||
See the [EC2 Instance Connect guide](/docs/infra/workspace-terminal).
|
||||
(`molecule-core` [#1554](https://github.com/Molecule-AI/molecule-core/pull/1554))
|
||||
(`molecule-core` [#1554](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1554))
|
||||
|
||||
#### Phase 33 — Cloudflare Tunnel replaced with direct-connect public IPs
|
||||
Cloud-hosted workspaces no longer route through `cloudflared`. Each workspace gets
|
||||
@@ -101,32 +101,32 @@ TLS on port 443. Reduces latency by ~20–40 ms (region-dependent), removes the
|
||||
Cloudflare egress cost dependency, and enables direct `curl` debugging without
|
||||
the tunnel path.
|
||||
See the [migration blog post](/blog/cloudflare-tunnel-migration).
|
||||
(`molecule-core` [#1612](https://github.com/Molecule-AI/molecule-core/pull/1612))
|
||||
(`molecule-core` [#1612](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1612))
|
||||
|
||||
### 🔒 Security
|
||||
|
||||
- **F1085 deleteViaEphemeral**: `rm` scope restricted to `/configs` volume only —
|
||||
prevents deletion of application code or workspace files if the exec form is
|
||||
exploited. Applied to both `main` and `staging`. (`molecule-core` [#1682](https://github.com/Molecule-AI/molecule-core/pull/1682), [#1616](https://github.com/Molecule-AI/molecule-core/pull/1616))
|
||||
exploited. Applied to both `main` and `staging`. (`molecule-core` [#1682](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1682), [#1616](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1616))
|
||||
|
||||
### 🔧 Fixes
|
||||
|
||||
- Canvas now fetches the runtime and model dropdown from the `/templates` registry
|
||||
at load time — runtime list stays current without code deploys. (`molecule-core` [#1666](https://github.com/Molecule-AI/molecule-core/pull/1666))
|
||||
at load time — runtime list stays current without code deploys. (`molecule-core` [#1666](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1666))
|
||||
- Canvas accessibility: `aria-hidden` correctly applied to decorative SVGs;
|
||||
`MissingKeysModal` now uses correct dialog semantics and manages focus. (`molecule-core` [#1594](https://github.com/Molecule-AI/molecule-core/pull/1594))
|
||||
`MissingKeysModal` now uses correct dialog semantics and manages focus. (`molecule-core` [#1594](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1594))
|
||||
- Provisioner pulls workspace template images from GHCR instead of Docker Hub
|
||||
for faster cold starts and reduced third-party dependency. (`molecule-core` [#1624](https://github.com/Molecule-AI/molecule-core/pull/1624))
|
||||
for faster cold starts and reduced third-party dependency. (`molecule-core` [#1624](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1624))
|
||||
- Shared runtime heartbeat no longer leaves workspaces in a phantom-busy state after
|
||||
task completion. (`molecule-ai-workspace-runtime` [#37](https://github.com/Molecule-AI/molecule-ai-workspace-runtime/pull/37))
|
||||
task completion. (`molecule-ai-workspace-runtime` [#37](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime/pull/37))
|
||||
|
||||
### 📚 Docs
|
||||
|
||||
- **MCP server structured logging**: `LOG_LEVEL` env var (`trace`/`debug`/`info`/`warn`/`error`/`fatal`),
|
||||
pino JSON output in production, pretty-print in development, AsyncLocalStorage
|
||||
context on every log entry (tool name, request ID, workspace ID). (`docs` [#78](https://github.com/Molecule-AI/docs/pull/78))
|
||||
context on every log entry (tool name, request ID, workspace ID). (`docs` [#78](https://git.moleculesai.app/molecule-ai/docs/pull/78))
|
||||
- **molecli shell completion**: tab completion for `molecule` CLI in bash, zsh, fish,
|
||||
and PowerShell — covers all subcommands and flags. (`docs` [#79](https://github.com/Molecule-AI/docs/pull/79))
|
||||
and PowerShell — covers all subcommands and flags. (`docs` [#79](https://git.moleculesai.app/molecule-ai/docs/pull/79))
|
||||
|
||||
### 🧹 Internal
|
||||
|
||||
|
||||
@@ -158,7 +158,7 @@ The `id` field is your workspace ID — remember it.
|
||||
|---|---|
|
||||
| "Failed to send message — agent may be unreachable" | The tenant couldn't POST to your URL. Verify `curl https://<your-tunnel>/health` returns 200 from another machine. |
|
||||
| Response takes > 30s | Canvas times out around 30s. Keep initial implementations simple. For long-running work, return a placeholder and use [polling mode](#next-step-polling-mode-preview) (once available). |
|
||||
| Agent duplicated in chat | Known canvas bug where WebSocket + HTTP responses both render. Fixed in [molecule-core #1517](https://github.com/Molecule-AI/molecule-core/pull/1517). |
|
||||
| Agent duplicated in chat | Known canvas bug where WebSocket + HTTP responses both render. Fixed in [molecule-core #1517](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1517). |
|
||||
| Agent replies but canvas shows "Agent unreachable" | Check the tenant can reach your URL. Cloudflare quick tunnels rotate — the URL in your canvas may point at a dead tunnel after restart. |
|
||||
| Getting 404 when POSTing to tenant | Add `X-Molecule-Org-Id` header. The tenant's security layer 404s unmatched origin requests by design. |
|
||||
|
||||
@@ -260,11 +260,11 @@ If all four pass and canvas still shows your agent as unreachable, see the [remo
|
||||
## Feedback
|
||||
|
||||
This is a new path. Tell us what broke:
|
||||
- Open an issue: https://github.com/Molecule-AI/molecule-core/issues/new?labels=external-workspace
|
||||
- Open an issue: https://git.moleculesai.app/molecule-ai/molecule-core/issues/new?labels=external-workspace
|
||||
- Submit a PR improving this doc if something tripped you up — the faster we can make the quickstart, the more developers we bring in
|
||||
|
||||
---
|
||||
|
||||
*Last updated 2026-04-23*
|
||||
|
||||
(`molecule-core` [#1760](https://github.com/Molecule-AI/molecule-core/pull/1760))
|
||||
(`molecule-core` [#1760](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1760))
|
||||
@@ -78,7 +78,7 @@ Every log entry automatically includes MCP request context (tool name, request I
|
||||
|
||||
Set `LOG_LEVEL=debug` (level 20) to trace all tool calls and request IDs. Set `LOG_LEVEL=error` (level 50) in CI to suppress informational output.
|
||||
|
||||
See [`molecule-mcp-server` PR #6](https://github.com/Molecule-AI/molecule-mcp-server/pull/6) for implementation details.
|
||||
See [`molecule-mcp-server` PR #6](https://git.moleculesai.app/molecule-ai/molecule-mcp-server/pull/6) for implementation details.
|
||||
|
||||
## Tool Reference
|
||||
|
||||
|
||||
@@ -90,4 +90,4 @@ molecule completion [bash|zsh|fish|powershell]
|
||||
- `fish` — Fish shell completions (~/.config/fish/completions)
|
||||
- `powershell` — PowerShell completions ($PROFILE)
|
||||
|
||||
See [`molecule-cli` PR #5](https://github.com/Molecule-AI/molecule-cli/pull/5) for implementation details.
|
||||
See [`molecule-cli` PR #5](https://git.moleculesai.app/molecule-ai/molecule-cli/pull/5) for implementation details.
|
||||
|
||||
@@ -339,7 +339,7 @@ If you are routing a Gemini model through a key that triggers the compat shim (e
|
||||
- [Concepts — Workspaces](/docs/concepts#workspaces)
|
||||
- [API Reference — POST /workspaces](/docs/api-reference#post-workspaces)
|
||||
- [Google ADK Runtime](/docs/google-adk) — Gemini-native alternative to Hermes for ADK-first workflows
|
||||
- PR #240: [Phase 2a — native Anthropic dispatch](https://github.com/Molecule-AI/molecule-core/pull/240)
|
||||
- PR #255: [Phase 2b — native Gemini dispatch](https://github.com/Molecule-AI/molecule-core/pull/255)
|
||||
- PR #267: [Phase 2c — multi-turn history on all paths](https://github.com/Molecule-AI/molecule-core/pull/267)
|
||||
- Issue [#513](https://github.com/Molecule-AI/molecule-core/issues/513)
|
||||
- PR #240: [Phase 2a — native Anthropic dispatch](https://git.moleculesai.app/molecule-ai/molecule-core/pull/240)
|
||||
- PR #255: [Phase 2b — native Gemini dispatch](https://git.moleculesai.app/molecule-ai/molecule-core/pull/255)
|
||||
- PR #267: [Phase 2c — multi-turn history on all paths](https://git.moleculesai.app/molecule-ai/molecule-core/pull/267)
|
||||
- Issue [#513](https://git.moleculesai.app/molecule-ai/molecule-core/issues/513)
|
||||
|
||||
@@ -165,14 +165,14 @@ ticket if a future revival of this BFG procedure is needed.
|
||||
|
||||
**Step 2 — Clean origin/main:**
|
||||
```bash
|
||||
git clone --mirror https://github.com/Molecule-AI/molecule-core /tmp/molecule-main-mirror
|
||||
git clone --mirror https://git.moleculesai.app/molecule-ai/molecule-core /tmp/molecule-main-mirror
|
||||
java -jar bfgr.jar --replace-text creds.txt --rewrite-not-committed-by-oss --no-blob-protection /tmp/molecule-main-mirror
|
||||
cd /tmp/molecule-main-mirror && git push --mirror
|
||||
```
|
||||
|
||||
**Step 3 — Clean origin/staging:**
|
||||
```bash
|
||||
git clone --mirror https://github.com/Molecule-AI/molecule-core /tmp/molecule-staging-mirror
|
||||
git clone --mirror https://git.moleculesai.app/molecule-ai/molecule-core /tmp/molecule-staging-mirror
|
||||
java -jar bfgr.jar --replace-text creds.txt --rewrite-not-committed-by-oss --no-blob-protection /tmp/molecule-staging-mirror
|
||||
cd /tmp/molecule-staging-mirror && git push --mirror
|
||||
```
|
||||
@@ -584,7 +584,7 @@ Core-BE — delegated to Dev Lead (A2A failed). Core-BE sub-team: please pick up
|
||||
|
||||
### Fix PR
|
||||
|
||||
[PR #1336](https://github.com/Molecule-AI/molecule-core/pull/1336) filed — `fix(orchestrator): fail-fast if WORKSPACE_ID env var is unset/empty`. Targets staging. Labels: bug, needs-work, area:backend-engineer, area:dev-lead.
|
||||
[PR #1336](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1336) filed — `fix(orchestrator): fail-fast if WORKSPACE_ID env var is unset/empty`. Targets staging. Labels: bug, needs-work, area:backend-engineer, area:dev-lead.
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -163,11 +163,11 @@ not expose.
|
||||
| `molecule-skill-update-docs` | `[claude_code]` | `[claude_code, hermes]` |
|
||||
|
||||
Companion PRs:
|
||||
- [molecule-ai-plugin-ecc#2](https://github.com/Molecule-AI/molecule-ai-plugin-ecc/pull/2)
|
||||
- [molecule-ai-plugin-superpowers#2](https://github.com/Molecule-AI/molecule-ai-plugin-superpowers/pull/2)
|
||||
- [molecule-ai-plugin-molecule-dev#2](https://github.com/Molecule-AI/molecule-ai-plugin-molecule-dev/pull/2)
|
||||
- [molecule-ai-plugin-molecule-skill-cron-learnings#2](https://github.com/Molecule-AI/molecule-ai-plugin-molecule-skill-cron-learnings/pull/2)
|
||||
- [molecule-ai-plugin-molecule-skill-update-docs#2](https://github.com/Molecule-AI/molecule-ai-plugin-molecule-skill-update-docs/pull/2)
|
||||
- [molecule-ai-plugin-ecc#2](https://git.moleculesai.app/molecule-ai/molecule-ai-plugin-ecc/pull/2)
|
||||
- [molecule-ai-plugin-superpowers#2](https://git.moleculesai.app/molecule-ai/molecule-ai-plugin-superpowers/pull/2)
|
||||
- [molecule-ai-plugin-molecule-dev#2](https://git.moleculesai.app/molecule-ai/molecule-ai-plugin-molecule-dev/pull/2)
|
||||
- [molecule-ai-plugin-molecule-skill-cron-learnings#2](https://git.moleculesai.app/molecule-ai/molecule-ai-plugin-molecule-skill-cron-learnings/pull/2)
|
||||
- [molecule-ai-plugin-molecule-skill-update-docs#2](https://git.moleculesai.app/molecule-ai/molecule-ai-plugin-molecule-skill-update-docs/pull/2)
|
||||
|
||||
Security note: Security Auditor was offline at time of change. Self-assessed
|
||||
as non-security-impacting — adding `hermes` to a string list in `plugin.yaml`
|
||||
|
||||
@@ -274,7 +274,7 @@ MCP config and restart your runtime.
|
||||
|
||||
### `Workspace <id> was deleted on the platform...` from `get_workspace_info`
|
||||
|
||||
Since [#2429](https://github.com/Molecule-AI/molecule-core/pull/2449),
|
||||
Since [#2429](https://git.moleculesai.app/molecule-ai/molecule-core/pull/2449),
|
||||
`GET /workspaces/:id` returns **410 Gone** (not 200 + `status:"removed"`)
|
||||
when the workspace has been deleted. The MCP wheel's `get_workspace_info`
|
||||
tool surfaces this as a tailored error message:
|
||||
|
||||
@@ -12,7 +12,7 @@ This page documents security fixes shipped in the Molecule AI platform. Each ent
|
||||
## 2026-04-20 — CWE-22: Path Traversal in `copyFilesToContainer`
|
||||
|
||||
**Severity:** High (CWE-22)
|
||||
**PRs:** [#1271](https://github.com/Molecule-AI/molecule-core/pull/1271), [#1270](https://github.com/Molecule-AI/molecule-core/pull/1270), [#1267](https://github.com/Molecule-AI/molecule-core/pull/1267)
|
||||
**PRs:** [#1271](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1271), [#1270](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1270), [#1267](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1267)
|
||||
**Affected:** `workspace-server/internal/handlers/container_files.go` — `TemplatesHandler.copyFilesToContainer`
|
||||
|
||||
### Vulnerability
|
||||
@@ -37,7 +37,7 @@ File writes to workspace containers now validate all paths before writing to the
|
||||
## 2026-04-20 — CWE-78: Shell Injection in `deleteViaEphemeral`
|
||||
|
||||
**Severity:** High (CWE-78)
|
||||
**PR:** [#1310](https://github.com/Molecule-AI/molecule-core/pull/1310)
|
||||
**PR:** [#1310](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1310)
|
||||
**Affected:** `workspace-server/internal/handlers/container_files.go` — `TemplatesHandler.deleteViaEphemeral`
|
||||
|
||||
### Vulnerability
|
||||
@@ -69,9 +69,9 @@ Workspace file deletion operations now use safe argument-passing and validate al
|
||||
## 2026-04-21 — CWE-918: SSRF in MCP / A2A Proxy Endpoints (Updated: Regression Fix)
|
||||
|
||||
**Severity:** High (CWE-918)
|
||||
**Original PRs:** [#1274](https://github.com/Molecule-AI/molecule-core/pull/1274), [#1302](https://github.com/Molecule-AI/molecule-core/pull/1302)
|
||||
**Regression Fix PR:** [#1430](https://github.com/Molecule-AI/molecule-core/pull/1430)
|
||||
**Regression introduced by:** [#1363](https://github.com/Molecule-AI/molecule-core/pull/1363)
|
||||
**Original PRs:** [#1274](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1274), [#1302](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1302)
|
||||
**Regression Fix PR:** [#1430](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1430)
|
||||
**Regression introduced by:** [#1363](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1363)
|
||||
**Affected:** `workspace-server/internal/handlers/mcp.go` — `isSafeURL`, `isPrivateOrMetadataIP`; `workspace-server/internal/handlers/a2a_proxy.go`; `workspace-server/internal/handlers/a2a_proxy_helpers.go`
|
||||
|
||||
### Vulnerability
|
||||
@@ -105,9 +105,9 @@ In **SaaS mode** (`saasMode()` returns true), cross-EC2 traffic to RFC-1918 addr
|
||||
|
||||
### Regression (2026-04-21)
|
||||
|
||||
PR [#1363](https://github.com/Molecule-AI/molecule-core/pull/1363) (handler refactor) moved `isPrivateOrMetadataIP` into `a2a_proxy_helpers.go` but kept a **pre-SaaS version** that unconditionally blocked RFC-1918 addresses, breaking cross-EC2 communication in SaaS. The old version also **returned `false` for all IPv6 inputs**, fully bypassing SSRF protection for IPv6 targets.
|
||||
PR [#1363](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1363) (handler refactor) moved `isPrivateOrMetadataIP` into `a2a_proxy_helpers.go` but kept a **pre-SaaS version** that unconditionally blocked RFC-1918 addresses, breaking cross-EC2 communication in SaaS. The old version also **returned `false` for all IPv6 inputs**, fully bypassing SSRF protection for IPv6 targets.
|
||||
|
||||
PR [#1430](https://github.com/Molecule-AI/molecule-core/pull/1430) restores the correct SaaS-gated logic and adds proper IPv6 coverage to the A2A proxy path.
|
||||
PR [#1430](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1430) restores the correct SaaS-gated logic and adds proper IPv6 coverage to the A2A proxy path.
|
||||
|
||||
### User-facing summary
|
||||
|
||||
@@ -118,7 +118,7 @@ Platform outbound requests from workspaces (MCP tool calls, A2A proxy routing) v
|
||||
## 2026-04-21 — Audit Ledger HMAC Chain Guard
|
||||
|
||||
**Severity:** Low (denial-of-service / data integrity)
|
||||
**PRs:** [#1339](https://github.com/Molecule-AI/molecule-core/pull/1339), [#1352](https://github.com/Molecule-AI/molecule-core/pull/1352), [#1354](https://github.com/Molecule-AI/molecule-core/pull/1354) (backport to `main`)
|
||||
**PRs:** [#1339](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1339), [#1352](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1352), [#1354](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1354) (backport to `main`)
|
||||
**Affected:** `workspace-server/internal/handlers/audit.go`
|
||||
|
||||
### Vulnerability
|
||||
@@ -144,7 +144,7 @@ Audit chain verification now handles short or malformed HMAC values gracefully,
|
||||
## 2026-04-21 — Credential Scrub: `err.Error()` Leak Prevention
|
||||
|
||||
**Severity:** Medium (information disclosure)
|
||||
**PRs:** [#1282](https://github.com/Molecule-AI/molecule-core/pull/1282), [#1355](https://github.com/Molecule-AI/molecule-core/pull/1355), [#1359](https://github.com/Molecule-AI/molecule-core/pull/1359)
|
||||
**PRs:** [#1282](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1282), [#1355](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1355), [#1359](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1359)
|
||||
**Affected:** `workspace-server/internal/handlers/plugins_install_pipeline.go`, `workspace-server/internal/handlers/workspace_provision.go`, `content/docs/incidents/INCIDENT_LOG.md`
|
||||
|
||||
### Vulnerability
|
||||
|
||||
@@ -0,0 +1,284 @@
|
||||
---
|
||||
title: "Provisioning Workspaces on AWS EC2 (production SaaS provisioner)"
|
||||
description: "How the molecule-controlplane EC2 provisioner turns POST /cp/orgs and POST /workspaces calls into running tenant + workspace EC2 instances — env vars, lifecycle, tier sizing, and the migration off Fly Machines."
|
||||
---
|
||||
|
||||
# Provisioning Workspaces on AWS EC2 (production SaaS provisioner)
|
||||
|
||||
As of April 2026, Molecule AI's SaaS control plane provisions both **tenants**
|
||||
(per-org platform VMs) and **workspaces** (per-agent inference VMs) on
|
||||
AWS EC2 instances. The provisioner lives at
|
||||
[`molecule-controlplane/internal/provisioner/ec2.go`](https://git.moleculesai.app/molecule-ai/molecule-controlplane/blob/main/internal/provisioner/ec2.go)
|
||||
and is auto-wired by [`cmd/server/main.go`](https://git.moleculesai.app/molecule-ai/molecule-controlplane/blob/main/cmd/server/main.go)
|
||||
whenever AWS credentials are present in the control-plane environment. The
|
||||
platform manages workspace lifecycle, auth, and routing; AWS manages the
|
||||
underlying EC2, security groups, and network plumbing.
|
||||
|
||||
This tutorial documents what env vars the provisioner reads, what AWS
|
||||
actions it performs on a `POST /workspaces`, and how to operate it. It is
|
||||
the replacement for the deprecated [Fly Machines provisioner](./fly-machines-provisioner.md)
|
||||
tutorial.
|
||||
|
||||
> **Audience:** operators running a self-hosted Molecule AI control plane
|
||||
> against their own AWS account, and contributors debugging the
|
||||
> production CP. End-users of `*.moleculesai.app` do not need any of
|
||||
> this — provisioning happens transparently when you create an org or
|
||||
> workspace in the canvas.
|
||||
|
||||
## When EC2 is the active provisioner
|
||||
|
||||
`cmd/server/main.go` switches on whether `AWS_ACCESS_KEY_ID` is set in the
|
||||
process environment. If yes, it constructs an `*provisioner.EC2` from the
|
||||
config below and registers it as the tenant provisioner. There is **no**
|
||||
`CONTAINER_BACKEND=ec2` switch — the dispatcher key is presence of AWS
|
||||
credentials. (The legacy `flyio` backend still has dead code in the tree
|
||||
but is no longer wired in `main.go`.)
|
||||
|
||||
A typical Railway-hosted control plane log line on boot:
|
||||
|
||||
```
|
||||
provisioner: EC2 (region=us-east-2, ami=ami-0ea3c35c5c3284d82)
|
||||
tenant provisioner: EC2 ✓
|
||||
```
|
||||
|
||||
If `AWS_ACCESS_KEY_ID` is unset, you'll see `provisioner: disabled`
|
||||
instead — useful for local dev where you want orgs CRUD to work without
|
||||
AWS access.
|
||||
|
||||
## Environment variables
|
||||
|
||||
The full list of env vars `cmd/server/main.go` passes into
|
||||
`provisioner.EC2Config`. Anything not listed here is unused by the
|
||||
provisioner.
|
||||
|
||||
### Required for any EC2 provisioning
|
||||
|
||||
| Var | Default | Purpose |
|
||||
|-----|---------|---------|
|
||||
| `AWS_ACCESS_KEY_ID` | — | Toggle: presence enables EC2 wiring at all |
|
||||
| `AWS_SECRET_ACCESS_KEY` | — | Standard AWS SDK credential pair |
|
||||
| `AWS_REGION` | `us-east-1` | Region for tenant + workspace launches |
|
||||
| `EC2_AMI` | `ami-0ea3c35c5c3284d82` (Ubuntu 22.04 us-east-2) | Default AMI when no `thin_ami_pins` row matches |
|
||||
| `EC2_VPC_ID` | — | VPC for per-tenant SG creation; falls back to `EC2_SECURITY_GROUP` if unset |
|
||||
| `EC2_SUBNET_ID` | — | Subnet for `RunInstances` |
|
||||
| `SECRETS_ENCRYPTION_KEY` | — | KMS-envelope DEK for tenant secret-at-rest; provisioner stays disabled until set |
|
||||
|
||||
### Required for production (#44 secure bootstrap)
|
||||
|
||||
| Var | Purpose |
|
||||
|-----|---------|
|
||||
| `EC2_TENANT_IAM_PROFILE` | Instance profile attached to every tenant EC2 so it can fetch its bootstrap bundle from Secrets Manager at boot. Without this set, `Provision` returns the error `"Secrets Manager + IAM instance profile are required (#113 — plaintext user-data path removed)"`. |
|
||||
| `PROVISION_SHARED_SECRET` | Shared HMAC-secret stored alongside the tenant bootstrap bundle so workspace-server can authenticate inbound `/cp/...` callbacks |
|
||||
| `CP_ADMIN_API_TOKEN` | Token the tenant uses to call admin endpoints back on the control plane |
|
||||
| `CP_BASE_URL` | URL the tenant boot script uses to reach the control plane (typically `https://api.moleculesai.app`) |
|
||||
|
||||
### Required for the canvas Terminal tab
|
||||
|
||||
| Var | Purpose |
|
||||
|-----|---------|
|
||||
| `EIC_ENDPOINT_SG_ID` | Security-group ID of the region's [EC2 Instance Connect endpoint](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-connect-endpoint.html). The provisioner adds a `tcp/22` ingress rule to every per-tenant + per-workspace SG sourced from this SG, so the canvas Terminal can EIC-tunnel into the box for diagnostic ssh. Empty leaves the canvas Terminal broken with `failed to open EIC tunnel`. Discover with `aws ec2 describe-instance-connect-endpoints --region <region>`. |
|
||||
|
||||
### Cloudflare integration (per-tenant subdomains)
|
||||
|
||||
| Var | Purpose |
|
||||
|-----|---------|
|
||||
| `CLOUDFLARE_API_TOKEN` | Enables CF DNS client; provisioner creates the per-tenant `<slug>.<APP_DOMAIN>` CNAME |
|
||||
| `CLOUDFLARE_ACCOUNT_ID` | Enables CF Tunnel client (preferred over Worker + wildcard DNS) |
|
||||
| `CLOUDFLARE_ZONE_ID` | DNS zone the tenant CNAMEs are written under |
|
||||
| `APP_DOMAIN` | Default `moleculesai.app`; tenant FQDN becomes `<slug>.<APP_DOMAIN>` |
|
||||
|
||||
### Optional — runtime images, tier image, backups, canary, multi-env
|
||||
|
||||
| Var | Purpose |
|
||||
|-----|---------|
|
||||
| `MOLECULE_ENV` | `dev` / `staging` / `prod`; stamped on every EC2 tag and scopes the orphan-report's AWS lister so envs don't false-positive each other |
|
||||
| `EC2_INSTANCE_TYPE` | Default `t3.small` for tenant VMs (workspaces use the per-tier table below) |
|
||||
| `EC2_SECURITY_GROUP` | Fallback shared SG when `EC2_VPC_ID` is unset; production should leave this empty |
|
||||
| `EC2_KEY_NAME` | Optional EC2 KeyPair name for emergency console SSH |
|
||||
| `TENANT_IMAGE` | OCI ref for the tenant platform image (e.g. `ghcr.io/molecule-ai/platform-tenant:staging-<sha>`) |
|
||||
| `CANARY_TENANT_IMAGE` | Override `TENANT_IMAGE` for orgs flagged `is_canary=true` |
|
||||
| `CANARY_ROLE_ARN`, `CANARY_REGION`, `CANARY_VPC_ID`, `CANARY_SUBNET_ID` | Second-AWS-account target for canary tenant launches; all four required together |
|
||||
| `TENANT_BACKUP_S3_PREFIX` | Empty disables nightly `pg_dump`; set `s3://bucket/path` to enable |
|
||||
| `TENANT_BACKUP_REPORT_URL` | Defaults to `${CP_BASE_URL}/cp/tenants/backup-report` |
|
||||
| `GHCR_PULL_TOKEN` | GHCR pull token written into the tenant bootstrap bundle (private images only) |
|
||||
|
||||
For the always-current set, grep
|
||||
[`cmd/server/main.go` lines 86–158](https://git.moleculesai.app/molecule-ai/molecule-controlplane/blob/main/cmd/server/main.go#L86-L158)
|
||||
for `os.Getenv` calls inside the `provisioner.NewEC2` block.
|
||||
|
||||
## What happens on `POST /cp/orgs` (tenant provision)
|
||||
|
||||
`OrgsHandler.Create` calls into `(*EC2).Provision(ctx, cfg)`. Roughly:
|
||||
|
||||
1. **Cloudflare cleanup** — `cleanupStaleSlugArtifacts` scrubs any
|
||||
leftover tunnel/DNS rows from a previously-purged org with the same
|
||||
slug, so the slug is reusable.
|
||||
2. **Cloudflare Tunnel + DNS** — `CreateTunnel` → `CreateTunnelDNS`
|
||||
(writes `<slug>.<APP_DOMAIN>` → `<tunnel-id>.cfargotunnel.com`) →
|
||||
`ConfigureTunnelIngress` (registers the hostname on the tunnel's
|
||||
remote config so CF's edge knows to forward). DNS or ingress
|
||||
failures roll back the tunnel and abort the provision — fail-fast
|
||||
behavior added 2026-04-26 after a six-hour outage in which
|
||||
unreachable tenants timed out at 600–900s instead of surfacing the
|
||||
real CF API problem.
|
||||
3. **Bootstrap secrets to AWS Secrets Manager** — the provisioner
|
||||
generates a per-tenant DB password + admin token, packages them with
|
||||
the GHCR pull token, tunnel token, encryption key, and shared
|
||||
secret, and `PutSecret`s them at `awsapi.TenantSecretName(orgID)`.
|
||||
The tenant fetches this bundle at boot via its instance profile —
|
||||
no plaintext secrets in user-data (see #113).
|
||||
4. **Per-tenant SG creation** — `createPerTenantSG` calls
|
||||
`CreateSecurityGroup` with the resolved VPC, the per-org name, and
|
||||
the ingress rules from `tenantIngressRules(vpcCidr, EICEndpointSGID)`.
|
||||
The SG ingress always includes the canvas-terminal EIC `tcp/22`
|
||||
rule sourced from the EIC endpoint's own SG (UserIdGroupPairs, not
|
||||
`0.0.0.0/0` — only AWS EIC's endpoint can use it).
|
||||
5. **`RunInstances`** — `awsClient.RunInstance(ctx, awsapi.LaunchConfig{...})`
|
||||
launches with `InstanceType = TenantInstanceType` (default
|
||||
`t3.small`), the resolved AMI, IAM instance profile, base64-encoded
|
||||
user-data, and tags `OrgID` / `OrgSlug` / `Role=tenant` / `TunnelID`
|
||||
/ `SGID`. Volume size is 30 GB.
|
||||
6. **Audit row** — every CF, SG, Secrets Manager, and EC2 lifecycle
|
||||
event is recorded in the `tenant_resources` audit table (#2343)
|
||||
so the orphan reconciler can diff claims vs live state.
|
||||
|
||||
`Provision` returns a `*Result` whose fields (`FlyMachineID`, `FlyRegion`,
|
||||
`AdminToken`) are still named after Fly. The EC2 provisioner fake-fills
|
||||
them with EC2 equivalents (`InstanceID`, `AWSRegion`); a column-rename
|
||||
migration is on the controlplane backlog.
|
||||
|
||||
## What happens on `POST /workspaces` (workspace provision)
|
||||
|
||||
`workspace-server`'s `POST /workspaces` reaches the control plane via
|
||||
`/cp/workspaces/provision`, which calls
|
||||
`(*EC2).ProvisionWorkspace(ctx, workspaceID, runtime, orgID, tier, platformURL, env)`:
|
||||
|
||||
1. **Resolve tier resources** — `workspaceTierResources(tier)` returns
|
||||
`(instanceType, volumeSize)` per the table below. Hermes runtime
|
||||
floors `volumeSize` to 50 GB regardless of tier (uv + Python venv +
|
||||
Node.js gateway pegs disk at 18–25 GB during install).
|
||||
2. **Resolve AMI** — `resolveWorkspaceAMI` looks up `thin_ami_pins`
|
||||
for the runtime + region. A pin row means the AMI is pre-baked
|
||||
(per `packer/scripts/install-base.sh`) and user-data can skip
|
||||
apt-update + the Python/Node installs (60–140 s saved per
|
||||
provision, RFC #388). Fallback to the static `WorkspaceAMI`.
|
||||
3. **Resolve runtime image** — `resolveRuntimeImage` looks up
|
||||
`runtime_image_pins` and emits the containerized user-data path
|
||||
(docker pull + run) when present. Independent of the AMI gate
|
||||
above; the new path also installs Docker if missing on a thin/stock
|
||||
AMI.
|
||||
4. **Per-workspace SG creation** — same `createPerTenantSG` call with
|
||||
`namePrefix="workspace"`. Workspace SGs get
|
||||
`workspaceIngressRules(EICEndpointSGID)` — currently the EIC
|
||||
`tcp/22` rule and nothing else (workspaces sit behind the
|
||||
Cloudflare Tunnel for HTTP).
|
||||
5. **`RunInstance`** — launches with `wsShort = workspaceID[:12]`
|
||||
prefixed name, the resolved instance type + volume + AMI +
|
||||
user-data, and tags `WorkspaceID` / `Runtime` / `Role=workspace`
|
||||
/ `SGID` / `OrgID`. The `OrgID` tag is what lets
|
||||
`DeprovisionInstance` cascade-terminate workspace EC2s when their
|
||||
tenant is deleted (incident 2026-04-23: ~27 orphaned workspace
|
||||
EC2s pinned staging at the 64 vCPU limit before the tag was
|
||||
added).
|
||||
6. **Audit row** — `tenant_resources` `KindEC2Instance` `StateCreated`
|
||||
with role / runtime / tier / workspace metadata.
|
||||
|
||||
The boot script registers the workspace agent with the platform via
|
||||
`/workspaces/:id/register`, the platform issues an A2A auth token, and
|
||||
the agent comes up ready for `message/send` calls.
|
||||
|
||||
## Tier-based resource sizing
|
||||
|
||||
`workspaceTierResources` is the single source of truth. As of writing,
|
||||
all tiers below T4 are clamped up to T4 (the SaaS floor) and tiers
|
||||
above T4 are also clamped down to T4 (today's max):
|
||||
|
||||
| Tier | Instance type | Volume | Effective use |
|
||||
|------|---------------|--------|---------------|
|
||||
| T1 / T2 | clamped to T4 | clamped to T4 | not in production |
|
||||
| T3 | `t3.medium` | 40 GB | reserved (clamped today) |
|
||||
| T4 | `t3.large` | 80 GB | all production workspaces |
|
||||
|
||||
If you set a tier outside `[3, 4]` the clamp lifts it to T4 — a cheap
|
||||
mis-provision rather than a fall-through to the unset `t3.small`
|
||||
default. The clamp was added in PR #434 follow-up after `tier=5`
|
||||
silently yielded `t3.small`.
|
||||
|
||||
Hermes overrides volume to 50 GB minimum regardless of tier.
|
||||
|
||||
## Lifecycle — stop, restart, redeploy, teardown
|
||||
|
||||
| Operation | Mechanism |
|
||||
|-----------|-----------|
|
||||
| **Stop / start a tenant** | `POST /cp/admin/tenants/:slug/{stop,start}` → `(*EC2).Stop` / `Start` via the EC2 API (no termination) |
|
||||
| **Redeploy a tenant** (in-place new image) | `POST /cp/admin/tenants/:slug/redeploy` → SSM Run Command pulls the latest `TENANT_IMAGE` and recreates the platform container; never reboots EC2 |
|
||||
| **Refresh workspace template images** | `POST /cp/admin/tenants/:slug/workspaces/redeploy` (single-tenant) or `POST /cp/admin/tenants/workspaces/redeploy-fleet` (canary-batched fleet); HTTP-only, no SSM |
|
||||
| **Delete a workspace** | platform `DELETE /workspaces/:id` → CP `DeprovisionInstance(workspaceInstanceID, ...)` terminates the EC2 + cleans DNS + SG |
|
||||
| **Delete a tenant (Art. 17 cascade)** | `DELETE /cp/orgs/:slug` → cascade-terminates all workspace EC2s tagged with this `OrgID`, then terminates the tenant EC2, then deletes the SG, Secrets Manager bundle, CF tunnel + CNAME |
|
||||
| **Orphan recovery** | `tenant_resources` audit table + 30-min reconciler that diffs claims vs live AWS state and exposes orphan counts via `/cp/admin/stats` |
|
||||
|
||||
`DeprovisionInstance` polls termination under its own deadline so a
|
||||
stuck shutdown surfaces as a deprovision failure (and the caller's
|
||||
retry replays the cascade) instead of becoming a silent leak (#263).
|
||||
|
||||
## Why EC2 (vs Fly Machines)
|
||||
|
||||
The control plane has migrated infrastructure twice in April 2026 — both
|
||||
documented in the
|
||||
[molecule-controlplane README "Migration history"](https://git.moleculesai.app/molecule-ai/molecule-controlplane#migration-history):
|
||||
|
||||
- **Apr 2026 — CP host:** Fly (`molecule-cp.fly.dev`) → Railway
|
||||
(`api.moleculesai.app`).
|
||||
- **Apr 2026 — tenant + workspace compute:** Fly Machines → AWS EC2
|
||||
with SSM Run Command for redeploy.
|
||||
|
||||
The drivers were production needs Fly couldn't easily meet:
|
||||
|
||||
- **Region + data-residency control.** EU customers required
|
||||
EU-resident tenant data; AWS regional pinning per tenant is
|
||||
straightforward, Fly's region routing is per-app and harder to
|
||||
guarantee per-tenant.
|
||||
- **AWS-native auth chain for the canvas Terminal.** EC2 Instance
|
||||
Connect lets the platform open SSH tunnels to a tenant box via
|
||||
short-lived (60 s) IAM-signed public keys — no shared SSH keys,
|
||||
no inbound `0.0.0.0/0` rules. The same path powers the Files API
|
||||
EIC writes (see [SaaS file writes via EC2 Instance Connect](./saas-file-writes-eic.md)).
|
||||
- **Secrets Manager + IAM instance profiles** for tenant bootstrap
|
||||
secrets (#113 removed the plaintext user-data path).
|
||||
- **Cloudflare Tunnels** instead of public IPs — no inbound exposure
|
||||
on tenant EC2s; CF edge is the only ingress.
|
||||
- **`tenant_resources` audit table + reconciler** for cascade-cleanup
|
||||
guarantees that Fly's flat machine list couldn't enforce.
|
||||
|
||||
Old `internal/flyapi/` and `internal/provisioner/fly.go` files remain
|
||||
in the controlplane tree as legacy code awaiting cleanup; they are not
|
||||
wired in `cmd/server/main.go`.
|
||||
|
||||
## Operating notes
|
||||
|
||||
- **Schema names still say "fly".** The `org_instances` columns
|
||||
`fly_app` / `fly_machine_id` / `fly_region` are fake-filled with EC2
|
||||
equivalents; a rename migration is on the controlplane backlog
|
||||
(`PLAN.md`).
|
||||
- **`SECRETS_ENCRYPTION_KEY` gates the whole provisioner.** The crypto
|
||||
envelope is required even when only AWS creds are present; without
|
||||
it, `tenant provisioner: DISABLED` is logged and `POST /cp/orgs`
|
||||
accepts the row but never spins a tenant.
|
||||
- **Per-tenant SG creation needs `EC2_VPC_ID`.** If you only set
|
||||
`EC2_SECURITY_GROUP` (the legacy shared-SG fallback), every tenant
|
||||
shares one SG — caught the bug in PR #434 review. Production must
|
||||
set `EC2_VPC_ID`.
|
||||
- **`EIC_ENDPOINT_SG_ID` is silently load-bearing.** If unset, the
|
||||
canvas Terminal hangs with `failed to open EIC tunnel` and the
|
||||
Files API EIC write path returns 500 — the EC2 boots fine, the
|
||||
symptom only shows when an operator opens the canvas Terminal tab.
|
||||
|
||||
## References
|
||||
|
||||
- [`molecule-controlplane/internal/provisioner/ec2.go`](https://git.moleculesai.app/molecule-ai/molecule-controlplane/blob/main/internal/provisioner/ec2.go) — provisioner source
|
||||
- [`molecule-controlplane/cmd/server/main.go`](https://git.moleculesai.app/molecule-ai/molecule-controlplane/blob/main/cmd/server/main.go) — env-var wiring
|
||||
- [`molecule-controlplane` README "Migration history"](https://git.moleculesai.app/molecule-ai/molecule-controlplane#migration-history) — canonical record
|
||||
- [AWS EC2 Instance Connect endpoints](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-connect-endpoint.html)
|
||||
- [AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html)
|
||||
- [SaaS file writes via EC2 Instance Connect](./saas-file-writes-eic.md) — EIC is also the Files API write channel
|
||||
- [Fly Machines provisioner (DEPRECATED)](./fly-machines-provisioner.md) — previous backend, retained for migration history
|
||||
@@ -88,8 +88,8 @@ Fly Machines start in milliseconds and run in 35+ regions. Provisioning agent wo
|
||||
|
||||
## Related
|
||||
|
||||
- PR #501: [feat(platform): Fly Machines provisioner](https://github.com/Molecule-AI/molecule-core/pull/501)
|
||||
- PR #481: [feat(ci): deploy to Fly after image push](https://github.com/Molecule-AI/molecule-core/pull/481)
|
||||
- PR #501: [feat(platform): Fly Machines provisioner](https://git.moleculesai.app/molecule-ai/molecule-core/pull/501)
|
||||
- PR #481: [feat(ci): deploy to Fly after image push](https://git.moleculesai.app/molecule-ai/molecule-core/pull/481)
|
||||
- [Fly Machines API docs](https://fly.io/docs/machines/api/)
|
||||
- [Platform API reference](../api-reference.md)
|
||||
- Issue [#525](https://github.com/Molecule-AI/molecule-core/issues/525)
|
||||
- Issue [#525](https://git.moleculesai.app/molecule-ai/molecule-core/issues/525)
|
||||
|
||||
@@ -64,6 +64,6 @@ The real power surfaces when you mix runtimes on the same Molecule AI tenant. Yo
|
||||
|
||||
## Related
|
||||
|
||||
- PR #379: [feat(adapters): add gemini-cli runtime adapter](https://github.com/Molecule-AI/molecule-core/pull/379)
|
||||
- PR #379: [feat(adapters): add gemini-cli runtime adapter](https://git.moleculesai.app/molecule-ai/molecule-core/pull/379)
|
||||
- [Multi-provider Hermes docs](../architecture/hermes.md)
|
||||
- [Workspace runtimes reference](../reference/runtimes.md)
|
||||
|
||||
@@ -71,7 +71,7 @@ ADK workspaces participate in the same A2A network as Claude Code, Gemini CLI, H
|
||||
|
||||
## Related
|
||||
|
||||
- PR #550: [feat(adapters): add google-adk runtime adapter](https://github.com/Molecule-AI/molecule-core/pull/550)
|
||||
- PR #550: [feat(adapters): add google-adk runtime adapter](https://git.moleculesai.app/molecule-ai/molecule-core/pull/550)
|
||||
- [Google ADK (adk-python)](https://github.com/google/adk-python)
|
||||
- [Gemini CLI runtime tutorial](./gemini-cli-runtime.md)
|
||||
- [Platform API reference](../api-reference.md)
|
||||
|
||||
@@ -179,9 +179,9 @@ What is on the roadmap for Phase 2d (not yet shipped):
|
||||
|
||||
## Related
|
||||
|
||||
- PR #240: [Phase 2a — native Anthropic dispatch](https://github.com/Molecule-AI/molecule-core/pull/240)
|
||||
- PR #255: [Phase 2b — native Gemini dispatch](https://github.com/Molecule-AI/molecule-core/pull/255)
|
||||
- PR #267: [Phase 2c — multi-turn history on all paths](https://github.com/Molecule-AI/molecule-core/pull/267)
|
||||
- PR #240: [Phase 2a — native Anthropic dispatch](https://git.moleculesai.app/molecule-ai/molecule-core/pull/240)
|
||||
- PR #255: [Phase 2b — native Gemini dispatch](https://git.moleculesai.app/molecule-ai/molecule-core/pull/255)
|
||||
- PR #267: [Phase 2c — multi-turn history on all paths](https://git.moleculesai.app/molecule-ai/molecule-core/pull/267)
|
||||
- [Hermes adapter design](../adapters/hermes-adapter-design.md)
|
||||
- [Platform API reference](../api-reference.md)
|
||||
- Issue [#513](https://github.com/Molecule-AI/molecule-core/issues/513)
|
||||
- Issue [#513](https://git.moleculesai.app/molecule-ai/molecule-core/issues/513)
|
||||
|
||||
@@ -93,6 +93,6 @@ Molecule AI canvas without code changes.
|
||||
|
||||
## Related
|
||||
|
||||
- PR #480: [feat(channels): Lark / Feishu channel adapter](https://github.com/Molecule-AI/molecule-core/pull/480)
|
||||
- PR #480: [feat(channels): Lark / Feishu channel adapter](https://git.moleculesai.app/molecule-ai/molecule-core/pull/480)
|
||||
- [Social channels architecture](../agent-runtime/social-channels.md)
|
||||
- [Channel adapter reference](../api-reference.md#channels)
|
||||
@@ -246,4 +246,4 @@ For the API reference, see [`docs/api-reference`](/docs/api-reference) — the `
|
||||
|
||||
*SaaS federation is available for all Molecule AI platform operators. Contact the Molecule AI team to enable federation on your control plane.*
|
||||
|
||||
(`molecule-core` [#1700](https://github.com/Molecule-AI/molecule-core/pull/1700))
|
||||
(`molecule-core` [#1700](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1700))
|
||||
@@ -145,7 +145,7 @@ Key push + tunnel + write took longer than 30 s. Common causes: slow AWS EIC in
|
||||
|
||||
## Source PR
|
||||
|
||||
PR [#1702](https://github.com/Molecule-AI/molecule-core/pull/1702) — `feat(files-api): SSH-backed write for SaaS workspaces (fixes 500 docker not available)`
|
||||
PR [#1702](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1702) — `feat(files-api): SSH-backed write for SaaS workspaces (fixes 500 docker not available)`
|
||||
|
||||
Key files in `molecule-core`:
|
||||
- `workspace-server/internal/handlers/template_files_eic.go` — EIC write logic
|
||||
|
||||
Reference in New Issue
Block a user