Compare commits
6 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 1baa6fb44e | |||
| 5c6f068bcf | |||
| 5c974e037f | |||
| de089e005b | |||
| 6d0ac94e64 | |||
| 51d98ba794 |
@@ -75,6 +75,112 @@ Entries are published daily at 23:50 UTC.
|
||||
---
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
## 2026-05-10
|
||||
|
||||
### ✨ New features
|
||||
|
||||
- **A2A priority queue — Phase 1**: task dispatch now supports a `priority` field (`low` / `normal` / `high` / `urgent`). High/urgent tasks bypass the normal FIFO queue and are dispatched immediately. (`molecule-core` [#225](https://git.moleculesai.app/molecule-ai/molecule-core/pull/225))
|
||||
- **Plugin drift detector + queue + admin apply endpoint**: a new plugin drift detection system monitors loaded plugins against their pinned SHAs and surfaces drift via a queue; admins can review and apply corrections via a new `/admin/plugin-apply` endpoint. (`molecule-core` [#204](https://git.moleculesai.app/molecule-ai/molecule-core/pull/204))
|
||||
- **workspace-server pre-restart A2A drain signal**: the workspace-server now sends a pre-restart A2A drain signal before restarting, allowing peer workspaces to gracefully drain pending tasks instead of timing out. (`molecule-core` [#207](https://git.moleculesai.app/molecule-ai/molecule-core/pull/207))
|
||||
- **Admin auth runbook**: new `admin-auth.md` runbook documents the test-token route lockdown and `AdminAuth` middleware behaviour for operators. (`molecule-core` [#220](https://git.moleculesai.app/molecule-ai/molecule-core/pull/220))
|
||||
- **Static `.github-token` fallback to git credential helper**: workspace-server now falls back to a static `.github-token` value when no git credential helper is configured, enabling simpler air-gapped setups. (`molecule-core` [#219](https://git.moleculesai.app/molecule-ai/molecule-core/pull/219))
|
||||
- **Keyboard shortcuts in Toolbar help dialog**: all keyboard shortcuts are now documented in a Toolbar help dialog accessible from the canvas top bar. (`molecule-core` [#244](https://git.moleculesai.app/molecule-ai/molecule-core/pull/244))
|
||||
- **HTTP/SSE transport for Hermes MCP**: `a2a_mcp_server.py` now exposes `--transport=http --port=<N>` for Hermes workspaces that prefer HTTP + SSE over stdio. Endpoints: `POST /mcp` (JSON-RPC), `GET /mcp/stream` (SSE), `GET /health`. (`molecule-ai-workspace-runtime` [#5](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime/pull/5))
|
||||
|
||||
### 🔧 Fixes
|
||||
|
||||
- **SSRF validation before writing external workspace URL**: the workspace handler now validates URLs against SSRF allowlists before writing external workspace configurations. (`molecule-core` [#221](https://git.moleculesai.app/molecule-ai/molecule-core/pull/221))
|
||||
- **Dockerfile tenant chown /org-templates**: `/org-templates` directory now correctly chowned to the canvas user to fix `EACCES` on `mkdir` for external resolvers. (`molecule-core` [#223](https://git.moleculesai.app/molecule-ai/molecule-core/pull/223))
|
||||
- **CI `ghcr` → `ECR` migration + POST route smoke tests**: canary-verify workflow migrated from GHCR to ECR; new POST route smoke tests added for deployment verification. (`molecule-core` [#217](https://git.moleculesai.app/molecule-ai/molecule-core/pull/217))
|
||||
- **CI `dorny/paths-filter` → shell-based git diff**: replaced `dorny/paths-filter` with shell-based git diff for Gitea Actions compatibility. (`molecule-core` [#208](https://git.moleculesai.app/molecule-ai/molecule-core/pull/208))
|
||||
- **SOP tier-check clause splitter strips newlines**: the SOP tier-check script's clause splitter now correctly preserves newlines, fixing every `tier:low` PR CI failure. (`molecule-core` [#243](https://git.moleculesai.app/molecule-ai/molecule-core/pull/243))
|
||||
- **SOP tier-check APPROVER_TEAMS pattern matching**: outer quotes removed from case patterns in `APPROVER_TEAMS` matching logic, fixing approval team resolution. (`molecule-core` [#231](https://git.moleculesai.app/molecule-ai/molecule-core/pull/231))
|
||||
- **CI port `publish-workspace-server-image.yml` to `.gitea/workflows/`**: `publish-workspace-server-image.yml` migrated from `.github/workflows/` to `.gitea/workflows/` for Gitea Actions parity. (`molecule-core` [#237](https://git.moleculesai.app/molecule-ai/molecule-core/pull/237))
|
||||
- **CI port `publish-runtime.yml` to `.gitea/workflows/`**: `publish-runtime.yml` migrated from `.github/workflows/` to `.gitea/workflows/` for Gitea Actions parity. (`molecule-core` [#211](https://git.moleculesai.app/molecule-ai/molecule-core/pull/211))
|
||||
- **Docker base image digests pinned**: base image digests pinned in all Dockerfiles to ensure reproducible builds and prevent unexpected base image updates. (`molecule-core` [#199](https://git.moleculesai.app/molecule-ai/molecule-core/pull/199))
|
||||
- **KeyboardShortcutsDialog corrected**: keyboard shortcuts dialog text corrected and min-clamp test expectations fixed. (`molecule-core` [#200](https://git.moleculesai.app/molecule-ai/molecule-core/pull/200))
|
||||
- **`MODEL_PROVIDER` env var deprecated**: the `MODEL_PROVIDER` env var was misnamed: despite its name it carried the model ID (e.g. `claude-opus-4-7`) and was being misread as a runtime selector. The runtime now accepts `MOLECULE_MODEL` as the canonical env var for model selection, with `MODEL` as an alias. `MODEL_PROVIDER` still works but emits a deprecation warning. (`molecule-core` [#280](https://git.moleculesai.app/molecule-ai/molecule-core/pull/280))
|
||||
- **`delegate_task` self-delegation guard**: calling `delegate_task` with your own workspace ID now returns an early, actionable error instead of deadlocking the task lock. Previously, self-delegation would hold `_run_lock`, time out after 30 s, and waste the turn. (`molecule-core` [#291](https://git.moleculesai.app/molecule-ai/molecule-core/pull/291))
|
||||
|
||||
### 📚 Docs
|
||||
|
||||
- **Canvas known issues section cleaned up**: duplicate entries removed from known issues; pre-commit action link fixed. (`molecule-core` [#202](https://git.moleculesai.app/molecule-ai/molecule-core/pull/202))
|
||||
- **Canvas controls section corrected**: Canvas Controls section corrected to reflect current keyboard navigation and MiniMap state. (`molecule-core` [#201](https://git.moleculesai.app/molecule-ai/molecule-core/pull/201))
|
||||
|
||||
### 🧹 Internal
|
||||
|
||||
- **SOP tier-check AND-composition of required team approvals per tier**: tier-check now enforces AND-composition of required team approvals per tier (`tier:high`). (`molecule-core` [#225](https://git.moleculesai.app/molecule-ai/molecule-core/pull/225))
|
||||
- **Canvas structural tests for TIER_CONFIG and COMM_TYPE_LABELS**: structural tests added for canvas TIER_CONFIG and COMM_TYPE_LABELS constants. (`molecule-core` [#245](https://git.moleculesai.app/molecule-ai/molecule-core/pull/245))
|
||||
|
||||
|
||||
## 2026-05-09
|
||||
|
||||
### ✨ New features
|
||||
|
||||
- **Keyboard-accessible canvas node resize**: Cmd/Ctrl+Arrow keys now resize canvas nodes in the topology view, satisfying WCAG AA keyboard navigation requirements. (`molecule-core` [#192](https://git.moleculesai.app/molecule-ai/molecule-core/pull/192))
|
||||
- **Keyboard-accessible edge anchors**: Enter/Space on an edge now selects the anchor for keyboard-based topology editing. (`molecule-core` [#190](https://git.moleculesai.app/molecule-ai/molecule-core/pull/190))
|
||||
|
||||
### 🔧 Fixes
|
||||
|
||||
- **Handlers auto-restart workspace after file write/delete/replace**: file mutations via the Canvas editor now correctly trigger workspace restart, ensuring the agent picks up the new file state without manual intervention. (`molecule-core` [#188](https://git.moleculesai.app/molecule-ai/molecule-core/pull/188))
|
||||
- **CI `gh api` → Gitea API migration**: all GitHub Actions `gh api` calls replaced with Gitea-compatible alternatives — CI now runs cleanly in Gitea Actions without GitHub dependency. (`molecule-core` [#191](https://git.moleculesai.app/molecule-ai/molecule-core/pull/191))
|
||||
- **WCAG AA contrast fix + KeyboardShortcutsDialog improvements**: toolbar contrast ratios corrected for WCAG AA compliance; keyboard shortcuts dialog now scrolls properly on small viewports. (`molecule-core` [#198](https://git.moleculesai.app/molecule-ai/molecule-core/pull/198))
|
||||
|
||||
### 📚 Docs
|
||||
|
||||
- **Canvas accessibility audit — all gaps now closed**: the accessibility audit doc updated to reflect fully closed status. (`molecule-core` [#197](https://git.moleculesai.app/molecule-ai/molecule-core/pull/197))
|
||||
- **Canvas controls section corrected**: keyboard accessibility and MiniMap presence now correctly documented. (`molecule-core` [#201](https://git.moleculesai.app/molecule-ai/molecule-core/pull/201))
|
||||
- **Stale audit doc text fixed**: stale text from PR #182 corrected in canvas audit documentation. (`molecule-core` [#187](https://git.moleculesai.app/molecule-ai/molecule-core/pull/187))
|
||||
|
||||
### 🧹 Internal
|
||||
|
||||
- **gh-identity module path migration**: `github.com/Molecule-AI/gh-identity` imports migrated to `git.moleculesai.app/molecule-ai/gh-identity` across all workspace templates. (`molecule-core` [#189](https://git.moleculesai.app/molecule-ai/molecule-core/pull/189))
|
||||
- **Pending uploads test isolation fix**: sweeper test isolation corrected — eliminates cross-test pollution in CI. (`molecule-core` [#185](https://git.moleculesai.app/molecule-ai/molecule-core/pull/185))
|
||||
- **Poll error counter to 0 before assert**: RecordsMetricsOnSuccess now polls error counter to 0 before asserting, eliminating flaky E2E test failures. (`molecule-core` [#194](https://git.moleculesai.app/molecule-ai/molecule-core/pull/194))
|
||||
|
||||
---
|
||||
|
||||
## 2026-05-08
|
||||
|
||||
### 🔧 Fixes
|
||||
|
||||
- **molecule-app CI testTimeout bumped to 20s**: vitest `testTimeout` increased to 20 s to handle shared act_runner load on the molecule-app repo. (`molecule-app` [#4](https://git.moleculesai.app/molecule-ai/molecule-app/pull/4))
|
||||
- **molecule-app drops staging branch — trunk-based migration**: first repo of the trunk-based development migration; staging branch removed. (`molecule-app` [#3](https://git.moleculesai.app/molecule-ai/molecule-app/pull/3))
|
||||
- **docs CI switches to ubuntu-latest**: docs repo CI uses `ubuntu-latest` now that the repo is public. (`docs` [#4](https://git.moleculesai.app/molecule-ai/docs/pull/4))
|
||||
|
||||
---
|
||||
|
||||
## 2026-05-07
|
||||
|
||||
### 📚 Docs
|
||||
|
||||
- **Install guide — GitHub.com refs → Gitea**: all active `github.com/Molecule-AI` references migrated to `git.moleculesai.app/molecule-ai` in the installation docs. (`docs` [#1](https://git.moleculesai.app/molecule-ai/docs/pull/1))
|
||||
- **Website github.com → Gitea link migration**: `molecules-market` website links updated to point at Gitea. (`landingpage` [#3](https://git.moleculesai.app/molecule-ai/landingpage/pull/3))
|
||||
- **molecule-monorepo → molecule-core rename (Phase 4)**: landingpage follow-up renaming of `molecule-monorepo` to `molecule-core` in all cross-repo references. (`landingpage` [#4](https://git.moleculesai.app/molecule-ai/landingpage/pull/4))
|
||||
- **CI lowercase 'molecule-ai/' in cross-repo workflow refs**: cross-repo workflow references now consistently lowercase for Gitea Actions compatibility. (`landingpage` [#2](https://git.moleculesai.app/molecule-ai/landingpage/pull/2))
|
||||
- **Market Purchase button on tier cards**: demo Mock #1 — Purchase button now appears on tier cards in the molecules-market. (`landingpage` [#5](https://git.moleculesai.app/molecule-ai/landingpage/pull/5))
|
||||
|
||||
### 🔧 Fixes
|
||||
|
||||
- **molecule-app runs-on ubuntu-latest**: runner labels updated after the Hetzner act-runner suspension; CI now uses `ubuntu-latest`. (`molecule-app` [#1](https://git.moleculesai.app/molecule-ai/molecule-app/pull/1))
|
||||
- **molecule-app GitHub → Gitea URL migration**: all `github.com/Molecule-AI` references migrated to `git.moleculesai.app/molecule-ai` in molecule-app. (`molecule-app` [#2](https://git.moleculesai.app/molecule-ai/molecule-app/pull/2))
|
||||
- **docs GitHub → Gitea URL migration**: `github.com/Molecule-AI` references migrated to Gitea across docs repo. (`docs` [#3](https://git.moleculesai.app/molecule-ai/docs/pull/3))
|
||||
|
||||
---
|
||||
|
||||
## 2026-05-06
|
||||
|
||||
### 🧹 Internal
|
||||
|
||||
- **molecule-core org-wide Gitea URL migration**: all `github.com/Molecule-AI` references migrated to `git.moleculesai.app/molecule-ai` across all repos in the org. (`molecule-core`)
|
||||
- **Hetzner act-runner suspension**: CI runners updated to use `ubuntu-latest` labels following Hetzner act-runner suspension. (`molecule-app` [#1](https://git.moleculesai.app/molecule-ai/molecule-app/pull/1))
|
||||
|
||||
---
|
||||
|
||||
## 2026-04-22
|
||||
|
||||
### ✨ New features
|
||||
|
||||
@@ -0,0 +1,214 @@
|
||||
---
|
||||
title: "a2a-sdk v0 → v1 migration"
|
||||
description: "Cheat sheet for migrating workspace runtime code (and forks) from a2a-sdk 0.3.x to 1.x — renamed/removed symbols, common error shapes, before/after diffs."
|
||||
---
|
||||
|
||||
import { Callout } from 'fumadocs-ui/components/callout';
|
||||
|
||||
The `a2a-sdk` Python package released v1.0 in late April 2026. The
|
||||
Molecule workspace runtime migrated under tracking ID **KI-009** and
|
||||
shipped in `molecule-ai-workspace-runtime` **v0.1.11** (commit
|
||||
`d5cf872`, PR #39). The platform now runs exclusively on v1.
|
||||
|
||||
If you're consuming the platform's published wheel, bumping
|
||||
`molecule-ai-workspace-runtime>=0.1.11` handles the migration for
|
||||
you. If you maintain a fork of the runtime, an external agent talking
|
||||
A2A directly, or your own adapter that imports from `a2a.*`, this page
|
||||
is your checklist.
|
||||
|
||||
## Why migrate
|
||||
|
||||
- **Upstream**: `a2a-sdk` 1.0 reorganised the import surface, flattened
|
||||
`Part`, removed deprecated capability flags, and replaced the
|
||||
`A2AStarletteApplication` wrapper with explicit Starlette route
|
||||
factories.
|
||||
- **Platform**: as of 2026-04-24 the platform sends/receives via v1
|
||||
shapes natively. The SDK ships a v0_3 compat layer (enabled in the
|
||||
runtime via `enable_v0_3_compat=True` on `create_jsonrpc_routes`) so
|
||||
in-flight 0.x callers don't break, but new code should target v1.
|
||||
- **Forks/external runtimes**: v0 code throws on `import a2a.utils`
|
||||
and `from a2a.server.apps import A2AStarletteApplication` once you
|
||||
install v1, so the migration is a hard cutover at install time, not
|
||||
a soft deprecation.
|
||||
|
||||
## Cheat sheet — renamed and removed symbols
|
||||
|
||||
The four breaking changes that hit the Molecule runtime during KI-009.
|
||||
All four are confirmed against
|
||||
`molecule-core/workspace/` source.
|
||||
|
||||
### 1. `new_agent_text_message` renamed to `new_text_message`
|
||||
|
||||
- **v0 location**: `a2a.utils.new_agent_text_message`
|
||||
- **v1 location**: `a2a.helpers.new_text_message`
|
||||
|
||||
Both the module path and the symbol name changed.
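
For a fork that has to import cleanly under both SDK lines during a rollout, one option is a small fallback import. The sketch below is an illustration under stated assumptions, not the runtime's own shim: `import_first` and `load_text_message_helper` are hypothetical helpers, and aliasing the v0 function to the v1 name assumes the two are call-compatible for your usage, which you should verify against the SDK.

```python
import importlib


def import_first(candidates):
    """Return the first attribute that imports successfully.

    `candidates` is a list of (module_path, attribute_name) pairs,
    tried in order. Hypothetical helper, not part of a2a-sdk.
    """
    failures = []
    for module_path, attr_name in candidates:
        try:
            module = importlib.import_module(module_path)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            failures.append(f"{module_path}.{attr_name}: {exc}")
    raise ImportError("no candidate could be imported: " + "; ".join(failures))


def load_text_message_helper():
    """Prefer the v1 name, fall back to the v0 name (assumed call-compatible)."""
    return import_first([
        ("a2a.helpers", "new_text_message"),      # v1 location
        ("a2a.utils", "new_agent_text_message"),  # v0 location
    ])
```

Calling `load_text_message_helper()` once at module import time keeps the rest of the code on a single name, so the fallback can be deleted in one place when v0 support is dropped.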
|
||||
|
||||
### 2. `Part` API flattened — `TextPart` removed
|
||||
|
||||
- **v0**: `Part(root=TextPart(text="..."))` — `Part` wrapped a `root`
|
||||
union of `TextPart` / `FilePart` / `DataPart`.
|
||||
- **v1**: `Part(text="...")` — `Part` accepts the text payload
|
||||
directly. `TextPart` no longer exists as a public symbol.
|
||||
|
||||
`FilePart` / `DataPart` are similarly flattened (`Part(file=...)`,
|
||||
`Part(data=...)`); the Molecule runtime only emits text parts so the
|
||||
file/data shapes weren't exercised in KI-009 and aren't covered by
|
||||
this guide.
|
||||
|
||||
### 3. `A2AStarletteApplication` removed — use route factories
|
||||
|
||||
- **v0**: `from a2a.server.apps import A2AStarletteApplication` then
|
||||
`A2AStarletteApplication(agent_card, request_handler).build()`.
|
||||
- **v1**: `from a2a.server.routes import create_agent_card_routes,
|
||||
create_jsonrpc_routes` then build a Starlette app from the returned
|
||||
route lists.
|
||||
|
||||
The factories also let you mount the JSON-RPC endpoint at any path
|
||||
(the runtime mounts at `/` because the platform POSTs to root, see
|
||||
`workspace/main.py:279`).
|
||||
|
||||
### 4. `state_transition_history` capability flag removed
|
||||
|
||||
- **v0**: `AgentCapabilities(streaming=..., push_notifications=...,
|
||||
state_transition_history=True)` was a per-agent opt-in.
|
||||
- **v1**: the field is gone from `AgentCapabilities`. Per the SDK's own
|
||||
`a2a/compat/v0_3/conversions.py`: *"No longer supported in v1.0"*.
|
||||
The capability is now universal — `Task.history` is always available
|
||||
and `tasks/get` accepts `historyLength` via `apply_history_length()`.
|
||||
|
||||
If you pass `state_transition_history=...` as a kwarg to
`AgentCapabilities` under v1, the constructor rejects it with the `ValueError` listed under Common error shapes. Drop the kwarg.
|
||||
See [`workspace/main.py:215`](https://git.moleculesai.app/Molecule-AI/molecule-core/blob/main/workspace/main.py#L215)
|
||||
for the explanatory comment that prevents future accidental re-adds.
|
||||
|
||||
## Common error shapes
|
||||
|
||||
When v0 code runs against the v1 SDK, the failure modes look like this:
|
||||
|
||||
| Error | Cause |
|---|---|
| `ModuleNotFoundError: No module named 'a2a.utils'` | v0 import path; module renamed to `a2a.helpers`. |
| `ImportError: cannot import name 'A2AStarletteApplication' from 'a2a.server.apps'` | The whole `a2a.server.apps` module is gone in v1. Switch to `a2a.server.routes` factories. |
| `ImportError: cannot import name 'TextPart' from 'a2a.types'` | Flattened `Part` API; use `Part(text=...)`. |
| `ValueError: Protocol message AgentCapabilities has no "state_transition_history" field` | Removed capability flag passed as kwarg; drop it. |
| `ValueError: Protocol message Part has no "root" field` | v0 `Part(root=TextPart(...))` shape against v1 schema; flatten to `Part(text=...)`. |
|
||||
|
||||
The protobuf-style `ValueError` messages always follow the pattern
|
||||
`Protocol message <Type> has no "<field>" field` — that's the
|
||||
fingerprint of "v0 shape against v1 schema." Treat it as a v0→v1 hint
|
||||
even if the field name isn't on the cheat sheet above.
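
The fingerprint described above can be checked mechanically, for example in an error handler or a log filter. This is a minimal sketch: `v0_shape_hint` is a hypothetical helper name, and the regular expression assumes the message format is exactly the one quoted in the table.

```python
import re

# Message pattern quoted in the error table:
#   Protocol message <Type> has no "<field>" field
_V0_SHAPE = re.compile(r'Protocol message (\w+) has no "(\w+)" field')


def v0_shape_hint(exc):
    """Return (type_name, field_name) if exc looks like a v0-shape error, else None."""
    if not isinstance(exc, ValueError):
        return None
    match = _V0_SHAPE.search(str(exc))
    return (match.group(1), match.group(2)) if match else None
```

A caller can log the returned pair next to a pointer to this page, which helps when the offending field is one the cheat sheet does not list.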
|
||||
|
||||
## Migration checklist
|
||||
|
||||
1. **Bump the dep** — `a2a-sdk[http-server]>=0.3.25` is the floor; remove
|
||||
any `<1.0` upper bound. The Molecule wheel uses
|
||||
`a2a-sdk[http-server]>=0.3.25` with no upper bound (see
|
||||
[`molecule-ai-workspace-runtime/pyproject.toml`](https://git.moleculesai.app/Molecule-AI/molecule-ai-workspace-runtime/blob/main/pyproject.toml)).
|
||||
2. **Fix imports** — sweep the four renamed/removed symbols above. A
|
||||
safe grep is `grep -rn "from a2a\\|import a2a"` across your tree.
|
||||
3. **Fix removed-field reads/writes** — search for
|
||||
`state_transition_history` usage and delete the kwarg/field access.
|
||||
4. **Flatten `Part` constructors** — search for `Part(root=` and
|
||||
convert to `Part(text=...)` / `Part(file=...)` / `Part(data=...)`.
|
||||
5. **Replace the app factory** — search for `A2AStarletteApplication`
|
||||
and rewrite the bootstrap using `create_agent_card_routes` +
|
||||
`create_jsonrpc_routes`. Pass `enable_v0_3_compat=True` to
|
||||
`create_jsonrpc_routes` if your peers may still be on v0.
|
||||
6. **Re-run tests** — fixture-level mocks of `a2a.helpers` /
|
||||
`a2a.utils` need to mock both names so tests still pass during the
|
||||
rename rollout (see
|
||||
[`workspace/tests/conftest.py:105-111`](https://git.moleculesai.app/Molecule-AI/molecule-core/blob/main/workspace/tests/conftest.py#L105-L111)
|
||||
for the dual-name pattern).
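
Steps 2 to 5 of the checklist are all text searches, so they can be combined into one pass. The sketch below is one possible way to do that; `scan_source` and `scan_tree` are hypothetical helpers, the patterns are taken from the cheat sheet above, and plain regex matching will also flag mentions inside comments or strings.

```python
import re
from pathlib import Path

# v0-only patterns from the cheat sheet; the hint text is illustrative.
V0_PATTERNS = {
    r"\ba2a\.utils\b": "module moved to a2a.helpers",
    r"\bnew_agent_text_message\b": "renamed to new_text_message",
    r"\bTextPart\b": "removed; use Part(text=...)",
    r"\bPart\(\s*root\s*=": "flatten to Part(text=...) / Part(file=...) / Part(data=...)",
    r"\bA2AStarletteApplication\b": "use create_agent_card_routes + create_jsonrpc_routes",
    r"\bstate_transition_history\b": "capability flag removed; drop it",
}


def scan_source(text):
    """Return (line_number, hint) pairs for every v0 pattern found in text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, hint in V0_PATTERNS.items():
            if re.search(pattern, line):
                findings.append((lineno, hint))
    return findings


def scan_tree(root):
    """Scan every .py file under root; returns {path: findings} for files with hits."""
    report = {}
    for path in Path(root).rglob("*.py"):
        findings = scan_source(path.read_text(encoding="utf-8", errors="replace"))
        if findings:
            report[str(path)] = findings
    return report
```

Running `scan_tree(".")` from the repository root before and after the migration gives a quick check that no v0 references remain.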
|
||||
|
||||
## Before / after diffs
|
||||
|
||||
### `new_agent_text_message` → `new_text_message`
|
||||
|
||||
```diff
-from a2a.utils import new_agent_text_message
+from a2a.helpers import new_text_message

 async def execute(self, context, event_queue):
-    await event_queue.enqueue_event(new_agent_text_message("hello"))
+    await event_queue.enqueue_event(new_text_message("hello"))
```
|
||||
|
||||
### Flat `Part` API
|
||||
|
||||
```diff
-from a2a.types import Part, TextPart
+from a2a.types import Part

-msg_parts = [Part(root=TextPart(text=final_text))]
+msg_parts = [Part(text=final_text)]
```
|
||||
|
||||
### `AgentCapabilities` — drop `state_transition_history`
|
||||
|
||||
```diff
 capabilities=AgentCapabilities(
     streaming=config.a2a.streaming,
     push_notifications=config.a2a.push_notifications,
-    state_transition_history=True,
 ),
```
|
||||
|
||||
### `A2AStarletteApplication` → route factories
|
||||
|
||||
```diff
-from a2a.server.apps import A2AStarletteApplication
+from a2a.server.routes import create_agent_card_routes, create_jsonrpc_routes
+from starlette.applications import Starlette

-app = A2AStarletteApplication(
-    agent_card=agent_card,
-    http_handler=request_handler,
-).build()
+routes = []
+routes.extend(create_agent_card_routes(agent_card))
+routes.extend(create_jsonrpc_routes(
+    request_handler=request_handler,
+    rpc_url="/",
+    enable_v0_3_compat=True,
+))
+app = Starlette(routes=routes)
```
|
||||
|
||||
The `enable_v0_3_compat=True` flag on `create_jsonrpc_routes` is what
|
||||
keeps in-flight v0 callers (peers that haven't migrated yet) from
|
||||
breaking — it accepts the old method names and translates them. The
|
||||
Molecule runtime ships with this flag on (see
|
||||
[`workspace/main.py:279`](https://git.moleculesai.app/Molecule-AI/molecule-core/blob/main/workspace/main.py#L279));
|
||||
strip it once your entire fleet is on v1.
|
||||
|
||||
## For downstream consumers
|
||||
|
||||
- **Using the published wheel** (`pip install
|
||||
molecule-ai-workspace-runtime>=0.1.11`): the migration is in the
|
||||
wheel — no code changes needed in your adapter or workspace template
|
||||
beyond bumping the pin.
|
||||
- **Running a fork of the runtime**: cherry-pick or rebase against
|
||||
commit `d5cf872` ("feat: migrate a2a-sdk 1.x (KI-009) (#39)") in
|
||||
`molecule-ai-workspace-runtime`. The diff is the canonical reference
|
||||
for what KI-009 actually changed.
|
||||
- **Standalone external agent** (talking A2A without the wheel): apply
|
||||
the [Migration checklist](#migration-checklist) directly to your
|
||||
source. The four cheat-sheet items are the entire surface that
|
||||
changed for the typical agent role; only `Part` flattening and the
|
||||
`state_transition_history` removal affect on-the-wire shapes — the
|
||||
other two are import-only.
|
||||
|
||||
<Callout type="info">
|
||||
The wheel keeps `enable_v0_3_compat=True` on `create_jsonrpc_routes`,
|
||||
so a v0 peer can still hit a v1 wheel and vice versa during the
|
||||
migration window. You don't need to coordinate a fleet-wide cutover —
|
||||
migrate at your own pace.
|
||||
</Callout>
|
||||
|
||||
## See also
|
||||
|
||||
- [`molecule-ai-workspace-runtime` v0.1.11 release](https://git.moleculesai.app/Molecule-AI/molecule-ai-workspace-runtime/releases/tag/v0.1.11) — first wheel containing KI-009
|
||||
- [PR #39 — feat: migrate a2a-sdk 1.x (KI-009)](https://git.moleculesai.app/Molecule-AI/molecule-ai-workspace-runtime/pulls/39)
|
||||
- [PR #48 — feat(a2a): dual-compat for a2a-sdk 0.3.x and 1.x](https://git.moleculesai.app/Molecule-AI/molecule-ai-workspace-runtime/pulls/48) — runtime-side compat shim that keeps v0 peers working against the v1 wheel
|
||||
- [Bring Your Own Runtime (MCP)](/docs/runtime-mcp) — universal wheel install path
|
||||
- [External Agents](/docs/external-agents) — manual A2A path for non-MCP runtimes
|
||||
@@ -102,6 +102,22 @@ example above. Drop it into your client's MCP settings file
|
||||
(typically `~/.cursor/mcp.json` for Cursor, the MCP Servers panel for
|
||||
Cline) and restart the client.
|
||||
|
||||
## Environment variables
|
||||
|
||||
The following env vars are supported by the `molecule-mcp` wheel in addition to the
|
||||
required trio (`WORKSPACE_ID`, `PLATFORM_URL`, `MOLECULE_WORKSPACE_TOKEN`):
|
||||
|
||||
| Env var | What it controls | Default |
|---|---|---|
| `MOLECULE_MODEL` | **Canonical.** The model ID the workspace runtime uses — e.g. `claude-opus-4-7`, `minimax/MiniMax-M2.7-highspeed` | _(unset — template default)_ |
| `MODEL` | **Alias for `MOLECULE_MODEL`.** Accepted for backwards compatibility. | _(unset)_ |
| `MODEL_PROVIDER` | **Deprecated.** This var was previously misread as "runtime selector" (`claude-code`, `minimax`, etc.) but carried the model ID, causing the wrong model to be used. Prefer `MOLECULE_MODEL`. | _(unset — emits deprecation warning)_ |
| `MOLECULE_AGENT_SKILLS` | Comma-separated skill names — e.g. `research,code-review,memory-curation` | `[]` |
|
||||
|
||||
<Callout type="warn">
|
||||
`MODEL_PROVIDER` is deprecated. It was misnamed — despite its name it carried the **model ID** (e.g. `claude-opus-4-7`), not the runtime/provider name. Setting it caused production incidents where the Claude CLI received `--model MODEL_PROVIDER_VALUE` and returned 404s. Use `MOLECULE_MODEL` instead.
|
||||
</Callout>
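
A launcher script that needs to honour the same variables could resolve the model ID as sketched below. This is an assumption-based illustration: `resolve_model_id` is a hypothetical helper, and the precedence order (`MOLECULE_MODEL`, then `MODEL`, then the deprecated `MODEL_PROVIDER`) is inferred from the table above rather than confirmed against the runtime source.

```python
import os
import warnings


def resolve_model_id(env=None):
    """Pick the model ID from the environment.

    Assumed precedence: MOLECULE_MODEL, then MODEL, then the deprecated
    MODEL_PROVIDER. Returns None when none is set (template default applies).
    """
    env = os.environ if env is None else env
    for name in ("MOLECULE_MODEL", "MODEL"):
        value = env.get(name)
        if value:
            return value
    legacy = env.get("MODEL_PROVIDER")
    if legacy:
        # Still honoured, but flagged so callers can migrate.
        warnings.warn(
            "MODEL_PROVIDER is deprecated; set MOLECULE_MODEL instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return legacy
    return None
```

Passing a plain dict as `env` keeps the function easy to test without touching the real process environment.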
|
||||
|
||||
## Optional — declare your identity & capabilities
|
||||
|
||||
Three additional env vars control how your workspace appears on the
|
||||
@@ -206,6 +222,38 @@ Claude Code, Cursor, Cline, OpenCode, hermes-agent, or anything else
|
||||
that opens an MCP stdio connection. If your client speaks MCP, it
|
||||
speaks the wheel.
|
||||
|
||||
## HTTP/SSE transport for Hermes workspaces
|
||||
|
||||
Hermes workspaces (which are MCP-native) can connect to the platform MCP
|
||||
server over **HTTP + Server-Sent Events** instead of stdio. This is the
|
||||
recommended path when Hermes runs as a standalone service rather than
|
||||
inside a shell.
|
||||
|
||||
The `a2a_mcp_server.py` in the runtime exposes three endpoints:
|
||||
|
||||
| Endpoint | Method | Purpose |
|---|---|---|
| `/mcp` | `POST` | Receive JSON-RPC requests |
| `/mcp/stream` | `GET` | SSE stream for push-based responses |
| `/health` | `GET` | Health check |
|
||||
|
||||
Start the server with the `--transport=http --port=<N>` flags:
|
||||
|
||||
```bash
python a2a_mcp_server.py \
  --transport=http \
  --port=8080 \
  --workspace-id=<uuid> \
  --platform-url=https://<tenant>.moleculesai.app \
  --workspace-token=<token>
```
|
||||
|
||||
<Callout type="info">
|
||||
The stdio transport (described in [Step 2](#step-2--add-it-to-your-runtime))
|
||||
remains the default. HTTP/SSE is an alternative for Hermes deployments
|
||||
where a long-running daemon process is preferred over a stdio subprocess.
|
||||
</Callout>
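
Once the server is running over HTTP, a client can exercise `/health` and `/mcp` with the Python standard library alone. The sketch below makes several assumptions: the helper names are hypothetical, the server is assumed to accept a standard JSON-RPC 2.0 envelope with a JSON content type, and any authentication headers your deployment requires are not shown.

```python
import json
import urllib.request


def build_jsonrpc_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request envelope as a dict."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        payload["params"] = params
    return payload


def post_mcp(base_url, payload, timeout=10.0):
    """POST a JSON-RPC payload to <base_url>/mcp and return the decoded reply."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/mcp",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def is_healthy(base_url, timeout=5.0):
    """Return True when GET <base_url>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/health", timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError, and connection failures
        return False
```

With the server from the command above listening on port 8080, `post_mcp("http://localhost:8080", build_jsonrpc_request("tools/list"))` would be a reasonable first request, assuming the server implements the standard MCP `tools/list` method.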
|
||||
|
||||
## Heartbeat & lifecycle
|
||||
|
||||
The wheel spawns a daemon thread that POSTs `/registry/heartbeat` every
|
||||
|
||||
@@ -0,0 +1,284 @@
|
||||
---
|
||||
title: "Provisioning Workspaces on AWS EC2 (production SaaS provisioner)"
|
||||
description: "How the molecule-controlplane EC2 provisioner turns POST /cp/orgs and POST /workspaces calls into running tenant + workspace EC2 instances — env vars, lifecycle, tier sizing, and the migration off Fly Machines."
|
||||
---
|
||||
|
||||
# Provisioning Workspaces on AWS EC2 (production SaaS provisioner)
|
||||
|
||||
As of April 2026, Molecule AI's SaaS control plane provisions both **tenants**
|
||||
(per-org platform VMs) and **workspaces** (per-agent inference VMs) on
|
||||
AWS EC2 instances. The provisioner lives at
|
||||
[`molecule-controlplane/internal/provisioner/ec2.go`](https://git.moleculesai.app/molecule-ai/molecule-controlplane/blob/main/internal/provisioner/ec2.go)
|
||||
and is auto-wired by [`cmd/server/main.go`](https://git.moleculesai.app/molecule-ai/molecule-controlplane/blob/main/cmd/server/main.go)
|
||||
whenever AWS credentials are present in the control-plane environment. The
|
||||
platform manages workspace lifecycle, auth, and routing; AWS manages the
|
||||
underlying EC2, security groups, and network plumbing.
|
||||
|
||||
This tutorial documents what env vars the provisioner reads, what AWS
|
||||
actions it performs on a `POST /workspaces`, and how to operate it. It is
|
||||
the replacement for the deprecated [Fly Machines provisioner](./fly-machines-provisioner.md)
|
||||
tutorial.
|
||||
|
||||
> **Audience:** operators running a self-hosted Molecule AI control plane
|
||||
> against their own AWS account, and contributors debugging the
|
||||
> production CP. End-users of `*.moleculesai.app` do not need any of
|
||||
> this — provisioning happens transparently when you create an org or
|
||||
> workspace in the canvas.
|
||||
|
||||
## When EC2 is the active provisioner
|
||||
|
||||
`cmd/server/main.go` switches on whether `AWS_ACCESS_KEY_ID` is set in the
|
||||
process environment. If yes, it constructs an `*provisioner.EC2` from the
|
||||
config below and registers it as the tenant provisioner. There is **no**
|
||||
`CONTAINER_BACKEND=ec2` switch — the dispatcher key is presence of AWS
|
||||
credentials. (The legacy `flyio` backend still has dead code in the tree
|
||||
but is no longer wired in `main.go`.)
|
||||
|
||||
A typical Railway-hosted control plane log line on boot:
|
||||
|
||||
```
provisioner: EC2 (region=us-east-2, ami=ami-0ea3c35c5c3284d82)
tenant provisioner: EC2 ✓
```
|
||||
|
||||
If `AWS_ACCESS_KEY_ID` is unset, you'll see `provisioner: disabled`
|
||||
instead — useful for local dev where you want orgs CRUD to work without
|
||||
AWS access.
|
||||
|
||||
## Environment variables
|
||||
|
||||
The full list of env vars `cmd/server/main.go` passes into
|
||||
`provisioner.EC2Config`. Anything not listed here is unused by the
|
||||
provisioner.
|
||||
|
||||
### Required for any EC2 provisioning
|
||||
|
||||
| Var | Default | Purpose |
|-----|---------|---------|
| `AWS_ACCESS_KEY_ID` | — | Toggle: presence enables EC2 wiring at all |
| `AWS_SECRET_ACCESS_KEY` | — | Standard AWS SDK credential pair |
| `AWS_REGION` | `us-east-1` | Region for tenant + workspace launches |
| `EC2_AMI` | `ami-0ea3c35c5c3284d82` (Ubuntu 22.04 us-east-2) | Default AMI when no `thin_ami_pins` row matches |
| `EC2_VPC_ID` | — | VPC for per-tenant SG creation; falls back to `EC2_SECURITY_GROUP` if unset |
| `EC2_SUBNET_ID` | — | Subnet for `RunInstances` |
| `SECRETS_ENCRYPTION_KEY` | — | KMS-envelope DEK for tenant secret-at-rest; provisioner stays disabled until set |
|
||||
|
||||
### Required for production (#44 secure bootstrap)
|
||||
|
||||
| Var | Purpose |
|-----|---------|
| `EC2_TENANT_IAM_PROFILE` | Instance profile attached to every tenant EC2 so it can fetch its bootstrap bundle from Secrets Manager at boot. Without this set, `Provision` returns the error `"Secrets Manager + IAM instance profile are required (#113 — plaintext user-data path removed)"`. |
| `PROVISION_SHARED_SECRET` | Shared HMAC-secret stored alongside the tenant bootstrap bundle so workspace-server can authenticate inbound `/cp/...` callbacks |
| `CP_ADMIN_API_TOKEN` | Token the tenant uses to call admin endpoints back on the control plane |
| `CP_BASE_URL` | URL the tenant boot script uses to reach the control plane (typically `https://api.moleculesai.app`) |
|
||||
|
||||
### Required for the canvas Terminal tab
|
||||
|
||||
| Var | Purpose |
|-----|---------|
| `EIC_ENDPOINT_SG_ID` | Security-group ID of the region's [EC2 Instance Connect endpoint](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-connect-endpoint.html). The provisioner adds a `tcp/22` ingress rule to every per-tenant + per-workspace SG sourced from this SG, so the canvas Terminal can EIC-tunnel into the box for diagnostic ssh. Empty leaves the canvas Terminal broken with `failed to open EIC tunnel`. Discover with `aws ec2 describe-instance-connect-endpoints --region <region>`. |
|
||||
|
||||
### Cloudflare integration (per-tenant subdomains)

| Var | Purpose |
|-----|---------|
| `CLOUDFLARE_API_TOKEN` | Enables CF DNS client; provisioner creates the per-tenant `<slug>.<APP_DOMAIN>` CNAME |
| `CLOUDFLARE_ACCOUNT_ID` | Enables CF Tunnel client (preferred over Worker + wildcard DNS) |
| `CLOUDFLARE_ZONE_ID` | DNS zone the tenant CNAMEs are written under |
| `APP_DOMAIN` | Default `moleculesai.app`; tenant FQDN becomes `<slug>.<APP_DOMAIN>` |

### Optional — runtime images, tier image, backups, canary, multi-env

| Var | Purpose |
|-----|---------|
| `MOLECULE_ENV` | `dev` / `staging` / `prod`; stamped on every EC2 tag and scopes the orphan-report's AWS lister so envs don't false-positive each other |
| `EC2_INSTANCE_TYPE` | Default `t3.small` for tenant VMs (workspaces use the per-tier table below) |
| `EC2_SECURITY_GROUP` | Fallback shared SG when `EC2_VPC_ID` is unset; production should leave this empty |
| `EC2_KEY_NAME` | Optional EC2 KeyPair name for emergency console SSH |
| `TENANT_IMAGE` | OCI ref for the tenant platform image (e.g. `ghcr.io/molecule-ai/platform-tenant:staging-<sha>`) |
| `CANARY_TENANT_IMAGE` | Override `TENANT_IMAGE` for orgs flagged `is_canary=true` |
| `CANARY_ROLE_ARN`, `CANARY_REGION`, `CANARY_VPC_ID`, `CANARY_SUBNET_ID` | Second-AWS-account target for canary tenant launches; all four required together |
| `TENANT_BACKUP_S3_PREFIX` | Empty disables nightly `pg_dump`; set `s3://bucket/path` to enable |
| `TENANT_BACKUP_REPORT_URL` | Defaults to `${CP_BASE_URL}/cp/tenants/backup-report` |
| `GHCR_PULL_TOKEN` | GHCR pull token written into the tenant bootstrap bundle (private images only) |

For the always-current set, grep
[`cmd/server/main.go` lines 86–158](https://git.moleculesai.app/molecule-ai/molecule-controlplane/blob/main/cmd/server/main.go#L86-L158)
for `os.Getenv` calls inside the `provisioner.NewEC2` block.

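As a rough illustration of how the tables above translate into a startup check, the sketch below reports which required variables are unset. It is a minimal sketch, not the wiring in `cmd/server/main.go`: the `requiredVars` list and the `missingVars` helper are hypothetical names, and the list only covers the "Required for production" table plus `SECRETS_ENCRYPTION_KEY`.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// requiredVars is an illustrative subset taken from the tables above.
// The authoritative list is whatever cmd/server/main.go actually reads.
var requiredVars = []string{
	"SECRETS_ENCRYPTION_KEY",
	"EC2_TENANT_IAM_PROFILE",
	"PROVISION_SHARED_SECRET",
	"CP_ADMIN_API_TOKEN",
	"CP_BASE_URL",
}

// missingVars returns every name in required whose value is empty or
// whitespace-only. getenv is injected so the check can run without
// touching the real process environment.
func missingVars(required []string, getenv func(string) string) []string {
	var missing []string
	for _, name := range required {
		if strings.TrimSpace(getenv(name)) == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if missing := missingVars(requiredVars, os.Getenv); len(missing) > 0 {
		fmt.Println("missing required variables:", strings.Join(missing, ", "))
		return
	}
	fmt.Println("all required variables are set")
}
```

Injecting `getenv` rather than calling `os.Getenv` directly keeps the helper easy to exercise with a plain map.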
## What happens on `POST /cp/orgs` (tenant provision)

`OrgsHandler.Create` calls into `(*EC2).Provision(ctx, cfg)`. Roughly:

1. **Cloudflare cleanup** — `cleanupStaleSlugArtifacts` scrubs any
   leftover tunnel/DNS rows from a previously-purged org with the same
   slug, so the slug is reusable.
2. **Cloudflare Tunnel + DNS** — `CreateTunnel` → `CreateTunnelDNS`
   (writes `<slug>.<APP_DOMAIN>` → `<tunnel-id>.cfargotunnel.com`) →
   `ConfigureTunnelIngress` (registers the hostname on the tunnel's
   remote config so CF's edge knows to forward). DNS or ingress
   failures roll back the tunnel and abort the provision — fail-fast
   behavior added 2026-04-26 after a six-hour outage in which
   unreachable tenants timed out at 600–900s instead of surfacing the
   real CF API problem.
3. **Bootstrap secrets to AWS Secrets Manager** — the provisioner
   generates a per-tenant DB password + admin token, packages them with
   the GHCR pull token, tunnel token, encryption key, and shared
   secret, and `PutSecret`s them at `awsapi.TenantSecretName(orgID)`.
   The tenant fetches this bundle at boot via its instance profile —
   no plaintext secrets in user-data (see #113).
4. **Per-tenant SG creation** — `createPerTenantSG` calls
   `CreateSecurityGroup` with the resolved VPC, the per-org name, and
   the ingress rules from `tenantIngressRules(vpcCidr, EICEndpointSGID)`.
   The SG ingress always includes the canvas-terminal EIC `tcp/22`
   rule sourced from the EIC endpoint's own SG (UserIdGroupPairs, not
   `0.0.0.0/0` — only AWS EIC's endpoint can use it).
5. **`RunInstances`** — `awsClient.RunInstance(ctx, awsapi.LaunchConfig{...})`
   launches with `InstanceType = TenantInstanceType` (default
   `t3.small`), the resolved AMI, IAM instance profile, base64-encoded
   user-data, and tags `OrgID` / `OrgSlug` / `Role=tenant` / `TunnelID`
   / `SGID`. Volume size is 30 GB.
6. **Audit row** — every CF, SG, Secrets Manager, and EC2 lifecycle
   event is recorded in the `tenant_resources` audit table (#2343)
   so the orphan reconciler can diff claims vs live state.

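The fail-fast rollback described in step 2 can be sketched as follows. This is an illustration under stated assumptions, not the controlplane's code: the `tunnelAPI` interface, its method signatures, `setupTunnel`, and the `fakeCF` test double are all hypothetical. Only the three step names (`CreateTunnel`, `CreateTunnelDNS`, `ConfigureTunnelIngress`) come from the text above.

```go
package main

import (
	"errors"
	"fmt"
)

// tunnelAPI is a hypothetical stand-in for the Cloudflare client.
// The real signatures in the controlplane may differ.
type tunnelAPI interface {
	CreateTunnel(slug string) (tunnelID string, err error)
	CreateTunnelDNS(slug, tunnelID string) error
	ConfigureTunnelIngress(slug, tunnelID string) error
	DeleteTunnel(tunnelID string) error
}

// setupTunnel creates the tunnel, then DNS, then ingress. If DNS or
// ingress fails, it deletes the tunnel and returns the original error
// right away instead of letting the tenant boot unreachable.
func setupTunnel(cf tunnelAPI, slug string) (string, error) {
	id, err := cf.CreateTunnel(slug)
	if err != nil {
		return "", fmt.Errorf("create tunnel: %w", err)
	}
	steps := []struct {
		name string
		run  func() error
	}{
		{"create tunnel DNS", func() error { return cf.CreateTunnelDNS(slug, id) }},
		{"configure tunnel ingress", func() error { return cf.ConfigureTunnelIngress(slug, id) }},
	}
	for _, s := range steps {
		if err := s.run(); err != nil {
			// Roll back; keep the rollback error alongside the cause.
			if delErr := cf.DeleteTunnel(id); delErr != nil {
				err = errors.Join(err, fmt.Errorf("rollback: %w", delErr))
			}
			return "", fmt.Errorf("%s: %w", s.name, err)
		}
	}
	return id, nil
}

// fakeCF is a test double that can be told to fail the DNS step.
type fakeCF struct {
	failDNS bool
	deleted []string
}

func (f *fakeCF) CreateTunnel(slug string) (string, error) { return "tun-" + slug, nil }
func (f *fakeCF) CreateTunnelDNS(slug, id string) error {
	if f.failDNS {
		return errors.New("cloudflare API rejected the record")
	}
	return nil
}
func (f *fakeCF) ConfigureTunnelIngress(slug, id string) error { return nil }
func (f *fakeCF) DeleteTunnel(id string) error {
	f.deleted = append(f.deleted, id)
	return nil
}

func main() {
	cf := &fakeCF{failDNS: true}
	_, err := setupTunnel(cf, "acme")
	fmt.Println("error:", err)
	fmt.Println("rolled back tunnels:", cf.deleted)
}
```

Returning the wrapped Cloudflare error immediately is what lets the caller see the real API problem rather than a later timeout.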
`Provision` returns a `*Result` whose fields (`FlyMachineID`, `FlyRegion`,
`AdminToken`) are still named after Fly. The EC2 provisioner fake-fills
them with EC2 equivalents (`InstanceID`, `AWSRegion`); a column-rename
migration is on the controlplane backlog.

## What happens on `POST /workspaces` (workspace provision)

`workspace-server`'s `POST /workspaces` reaches the control plane via
`/cp/workspaces/provision`, which calls
`(*EC2).ProvisionWorkspace(ctx, workspaceID, runtime, orgID, tier, platformURL, env)`:

1. **Resolve tier resources** — `workspaceTierResources(tier)` returns
   `(instanceType, volumeSize)` per the table below. The Hermes runtime
   floors `volumeSize` to 50 GB regardless of tier (uv + Python venv +
   Node.js gateway peg disk at 18–25 GB during install).
2. **Resolve AMI** — `resolveWorkspaceAMI` looks up `thin_ami_pins`
   for the runtime + region. A pin row means the AMI is pre-baked
   (per `packer/scripts/install-base.sh`) and user-data can skip
   apt-update + the Python/Node installs (60–140 s saved per
   provision, RFC #388). Without a pin row, it falls back to the
   static `WorkspaceAMI`.
3. **Resolve runtime image** — `resolveRuntimeImage` looks up
   `runtime_image_pins` and emits the containerized user-data path
   (docker pull + run) when present. Independent of the AMI gate
   above; the new path also installs Docker if missing on a thin/stock
   AMI.
4. **Per-workspace SG creation** — same `createPerTenantSG` call with
   `namePrefix="workspace"`. Workspace SGs get
   `workspaceIngressRules(EICEndpointSGID)` — currently the EIC
   `tcp/22` rule and nothing else (workspaces sit behind the
   Cloudflare Tunnel for HTTP).
5. **`RunInstance`** — launches with `wsShort = workspaceID[:12]`
   prefixed name, the resolved instance type + volume + AMI +
   user-data, and tags `WorkspaceID` / `Runtime` / `Role=workspace`
   / `SGID` / `OrgID`. The `OrgID` tag is what lets
   `DeprovisionInstance` cascade-terminate workspace EC2s when their
   tenant is deleted (incident 2026-04-23: ~27 orphaned workspace
   EC2s pinned staging at the 64 vCPU limit before the tag was
   added).
6. **Audit row** — `tenant_resources` `KindEC2Instance` `StateCreated`
   with role / runtime / tier / workspace metadata.

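The pin-then-fallback lookup in step 2 might look roughly like the sketch below. It assumes the pins are available as an in-memory map keyed by runtime and region, whereas the real `resolveWorkspaceAMI` reads the `thin_ami_pins` table. `pinKey`, `resolveAMI`, and the AMI IDs are hypothetical placeholders.

```go
package main

import "fmt"

// pinKey identifies a pin row by runtime and region (hypothetical shape).
type pinKey struct {
	Runtime string
	Region  string
}

// resolveAMI returns the pinned AMI for runtime+region when one exists,
// otherwise the static fallback. prebaked tells the caller whether
// user-data may skip the base installs.
func resolveAMI(pins map[pinKey]string, runtime, region, fallback string) (ami string, prebaked bool) {
	if id, ok := pins[pinKey{Runtime: runtime, Region: region}]; ok && id != "" {
		return id, true
	}
	return fallback, false
}

func main() {
	// Placeholder IDs for illustration only.
	pins := map[pinKey]string{
		{Runtime: "hermes", Region: "us-east-1"}: "ami-pinned-example",
	}
	ami, prebaked := resolveAMI(pins, "hermes", "us-east-1", "ami-static-example")
	fmt.Println(ami, prebaked)

	ami, prebaked = resolveAMI(pins, "hermes", "eu-west-1", "ami-static-example")
	fmt.Println(ami, prebaked)
}
```

Returning the `prebaked` flag alongside the AMI keeps the "skip installs" decision tied to the same lookup that chose the image.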
The boot script registers the workspace agent with the platform via
`/workspaces/:id/register`, the platform issues an A2A auth token, and
the agent comes up ready for `message/send` calls.

## Tier-based resource sizing

`workspaceTierResources` is the single source of truth. As of this
writing, all tiers below T4 are clamped up to T4 (the SaaS floor) and
tiers above T4 are clamped down to T4 (today's max):

| Tier | Instance type | Volume | Effective use |
|------|---------------|--------|---------------|
| T1 / T2 | clamped to T4 | clamped to T4 | not in production |
| T3 | `t3.medium` | 40 GB | reserved (clamped today) |
| T4 | `t3.large` | 80 GB | all production workspaces |

If you set a tier outside `[3, 4]`, the clamp resolves it to T4, a
cheap mis-provision rather than a fall-through to the unset `t3.small`
default. The clamp was added in a PR #434 follow-up after `tier=5`
silently yielded `t3.small`.

Hermes overrides volume to 50 GB minimum regardless of tier.

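The clamp and the Hermes floor can be pictured with the sketch below. It follows the opening sentence of this section (every tier resolves to T4 today) and is not the actual `workspaceTierResources`: `tierResources`, `clampTier`, `resourcesFor`, the `minTier`/`maxTier` constants, and the `"hermes"` runtime string are assumptions made for illustration.

```go
package main

import "fmt"

// tierResources pairs an instance type with a root volume size.
type tierResources struct {
	InstanceType string
	VolumeGB     int
}

const (
	minTier           = 4  // SaaS floor today (assumed constant name)
	maxTier           = 4  // highest tier defined today
	hermesMinVolumeGB = 50 // Hermes volume floor from the text
)

// tierTable mirrors the table above. T3 is kept for reference but is
// unreachable while minTier is 4.
var tierTable = map[int]tierResources{
	3: {InstanceType: "t3.medium", VolumeGB: 40},
	4: {InstanceType: "t3.large", VolumeGB: 80},
}

// clampTier forces tier into [minTier, maxTier].
func clampTier(tier int) int {
	if tier < minTier {
		return minTier
	}
	if tier > maxTier {
		return maxTier
	}
	return tier
}

// resourcesFor resolves the clamped tier, then applies the Hermes floor.
func resourcesFor(tier int, runtime string) tierResources {
	res := tierTable[clampTier(tier)]
	if runtime == "hermes" && res.VolumeGB < hermesMinVolumeGB {
		res.VolumeGB = hermesMinVolumeGB
	}
	return res
}

func main() {
	for _, tier := range []int{1, 3, 4, 5} {
		fmt.Printf("tier %d -> %+v\n", tier, resourcesFor(tier, "hermes"))
	}
}
```

With the clamp at T4 the 50 GB floor never changes the result, since T4 already has 80 GB. It would only take effect if T3 (40 GB) became reachable.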
## Lifecycle — stop, restart, redeploy, teardown

| Operation | Mechanism |
|-----------|-----------|
| **Stop / start a tenant** | `POST /cp/admin/tenants/:slug/{stop,start}` → `(*EC2).Stop` / `Start` via the EC2 API (no termination) |
| **Redeploy a tenant** (in-place new image) | `POST /cp/admin/tenants/:slug/redeploy` → SSM Run Command pulls the latest `TENANT_IMAGE` and recreates the platform container; never reboots EC2 |
| **Refresh workspace template images** | `POST /cp/admin/tenants/:slug/workspaces/redeploy` (single-tenant) or `POST /cp/admin/tenants/workspaces/redeploy-fleet` (canary-batched fleet); HTTP-only, no SSM |
| **Delete a workspace** | platform `DELETE /workspaces/:id` → CP `DeprovisionInstance(workspaceInstanceID, ...)` terminates the EC2 + cleans DNS + SG |
| **Delete a tenant (Art. 17 cascade)** | `DELETE /cp/orgs/:slug` → cascade-terminates all workspace EC2s tagged with this `OrgID`, then terminates the tenant EC2, then deletes the SG, Secrets Manager bundle, CF tunnel + CNAME |
| **Orphan recovery** | `tenant_resources` audit table + 30-min reconciler that diffs claims vs live AWS state and exposes orphan counts via `/cp/admin/stats` |

`DeprovisionInstance` polls termination under its own deadline so a
stuck shutdown surfaces as a deprovision failure (and the caller's
retry replays the cascade) instead of becoming a silent leak (#263).

## Why EC2 (vs Fly Machines)

The control plane has migrated infrastructure twice in April 2026 — both
documented in the
[molecule-controlplane README "Migration history"](https://git.moleculesai.app/molecule-ai/molecule-controlplane#migration-history):

- **Apr 2026 — CP host:** Fly (`molecule-cp.fly.dev`) → Railway
  (`api.moleculesai.app`).
- **Apr 2026 — tenant + workspace compute:** Fly Machines → AWS EC2
  with SSM Run Command for redeploy.

The drivers were production needs Fly couldn't easily meet:

- **Region + data-residency control.** EU customers required
  EU-resident tenant data; AWS regional pinning per tenant is
  straightforward, Fly's region routing is per-app and harder to
  guarantee per-tenant.
- **AWS-native auth chain for the canvas Terminal.** EC2 Instance
  Connect lets the platform open SSH tunnels to a tenant box via
  short-lived (60 s) IAM-signed public keys — no shared SSH keys,
  no inbound `0.0.0.0/0` rules. The same path powers the Files API
  EIC writes (see [SaaS file writes via EC2 Instance Connect](./saas-file-writes-eic.md)).
- **Secrets Manager + IAM instance profiles** for tenant bootstrap
  secrets (#113 removed the plaintext user-data path).
- **Cloudflare Tunnels** instead of public IPs — no inbound exposure
  on tenant EC2s; CF edge is the only ingress.
- **`tenant_resources` audit table + reconciler** for cascade-cleanup
  guarantees that Fly's flat machine list couldn't enforce.

Old `internal/flyapi/` and `internal/provisioner/fly.go` files remain
in the controlplane tree as legacy code awaiting cleanup; they are not
wired in `cmd/server/main.go`.

## Operating notes

- **Schema names still say "fly".** The `org_instances` columns
  `fly_app` / `fly_machine_id` / `fly_region` are fake-filled with EC2
  equivalents; a rename migration is on the controlplane backlog
  (`PLAN.md`).
- **`SECRETS_ENCRYPTION_KEY` gates the whole provisioner.** The crypto
  envelope is required even when only AWS creds are present; without
  it, `tenant provisioner: DISABLED` is logged and `POST /cp/orgs`
  accepts the row but never spins a tenant.
- **Per-tenant SG creation needs `EC2_VPC_ID`.** If you only set
  `EC2_SECURITY_GROUP` (the legacy shared-SG fallback), every tenant
  shares one SG (a bug caught in PR #434 review). Production must
  set `EC2_VPC_ID`.
- **`EIC_ENDPOINT_SG_ID` is silently load-bearing.** If unset, the
  canvas Terminal hangs with `failed to open EIC tunnel` and the
  Files API EIC write path returns 500 — the EC2 boots fine, the
  symptom only shows when an operator opens the canvas Terminal tab.

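To make the last note concrete, the sketch below shows the shape of an SG-sourced `tcp/22` rule and why an empty `EIC_ENDPOINT_SG_ID` leaves nothing for the tunnel to use. It is a simplified illustration: `ingressRule` and `eicOnlyIngress` are hypothetical, the real provisioner presumably builds AWS SDK permission values, and returning no rule for an empty ID is an assumption about behaviour this document does not spell out.

```go
package main

import "fmt"

// ingressRule is a simplified, hypothetical rule shape. Exactly one of
// SourceSGID or SourceCIDR is expected to be set.
type ingressRule struct {
	Protocol   string
	FromPort   int
	ToPort     int
	SourceSGID string // SG-sourced rule (the UserIdGroupPairs style)
	SourceCIDR string // CIDR-sourced rule, e.g. a VPC range
}

// eicOnlyIngress returns the single tcp/22 rule sourced from the EIC
// endpoint's security group. With an empty ID it returns no rules,
// which is assumed here to model the broken canvas Terminal case.
func eicOnlyIngress(eicEndpointSGID string) []ingressRule {
	if eicEndpointSGID == "" {
		return nil
	}
	return []ingressRule{{
		Protocol:   "tcp",
		FromPort:   22,
		ToPort:     22,
		SourceSGID: eicEndpointSGID,
	}}
}

func main() {
	// "sg-0123456789abcdef0" is a placeholder ID for illustration.
	fmt.Printf("%+v\n", eicOnlyIngress("sg-0123456789abcdef0"))
	fmt.Println("rules when unset:", len(eicOnlyIngress("")))
}
```

Because the rule names a source security group instead of a CIDR, SSH is reachable only from the EIC endpoint and never from `0.0.0.0/0`.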
## References

- [`molecule-controlplane/internal/provisioner/ec2.go`](https://git.moleculesai.app/molecule-ai/molecule-controlplane/blob/main/internal/provisioner/ec2.go) — provisioner source
- [`molecule-controlplane/cmd/server/main.go`](https://git.moleculesai.app/molecule-ai/molecule-controlplane/blob/main/cmd/server/main.go) — env-var wiring
- [`molecule-controlplane` README "Migration history"](https://git.moleculesai.app/molecule-ai/molecule-controlplane#migration-history) — canonical record
- [AWS EC2 Instance Connect endpoints](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-connect-endpoint.html)
- [AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html)
- [SaaS file writes via EC2 Instance Connect](./saas-file-writes-eic.md) — EIC is also the Files API write channel
- [Fly Machines provisioner (DEPRECATED)](./fly-machines-provisioner.md) — previous backend, retained for migration history