fix(tutorials): update GitHub.com → Gitea URLs in EC2 provisioner tutorial
Post-suspension URL migration — all molecule-controlplane references
now point to git.moleculesai.app/molecule-ai/molecule-controlplane.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
state_transition_history=True)` was a per-agent opt-in.
- **v1**: the field is gone from `AgentCapabilities`. Per the SDK's own
`a2a/compat/v0_3/conversions.py`: *"No longer supported in v1.0"*.
The capability is now universal — `Task.history` is always available
and `tasks/get` accepts `historyLength` via `apply_history_length()`.
If you pass `state_transition_history=...` as a kwarg to
`AgentCapabilities` under v1, Pydantic will reject it. Drop the kwarg.
See [`workspace/main.py:215`](https://git.moleculesai.app/Molecule-AI/molecule-core/blob/main/workspace/main.py#L215)
for the explanatory comment that prevents future accidental re-adds.
## Common error shapes
When v0 code runs against the v1 SDK, the failure modes look like this:
| Error | Cause |
|---|---|
| `ModuleNotFoundError: No module named 'a2a.utils'` | v0 import path; module renamed to `a2a.helpers`. |
| `ImportError: cannot import name 'A2AStarletteApplication' from 'a2a.server.apps'` | The whole `a2a.server.apps` module is gone in v1. Switch to `a2a.server.routes` factories. |
| `ImportError: cannot import name 'TextPart' from 'a2a.types'` | Flattened `Part` API; use `Part(text=...)`. |
| `ValueError: Protocol message AgentCapabilities has no "state_transition_history" field` | Removed capability flag passed as kwarg; drop it. |
| `ValueError: Protocol message Part has no "root" field` | v0 `Part(root=TextPart(...))` shape against v1 schema; flatten to `Part(text=...)`. |
The protobuf-style `ValueError` messages always follow the pattern
`Protocol message <Type> has no "<field>" field` — that's the
fingerprint of "v0 shape against v1 schema." Treat it as a v0→v1 hint
even if the field name isn't on the cheat sheet above.
## Migration checklist
1. **Bump the dep** — `a2a-sdk[http-server]>=0.3.25` is the floor; remove
any `<1.0` upper bound. The Molecule wheel uses
`a2a-sdk[http-server]>=0.3.25` with no upper bound (see
- [PR #48 — feat(a2a): dual-compat for a2a-sdk 0.3.x and 1.x](https://git.moleculesai.app/Molecule-AI/molecule-ai-workspace-runtime/pulls/48) — runtime-side compat shim that keeps v0 peers working against the v1 wheel
- [Bring Your Own Runtime (MCP)](/docs/runtime-mcp) — universal wheel install path
- [External Agents](/docs/external-agents) — manual A2A path for non-MCP runtimes
title: "Provisioning Workspaces on AWS EC2 (production SaaS provisioner)"
description: "How the molecule-controlplane EC2 provisioner turns POST /cp/orgs and POST /workspaces calls into running tenant + workspace EC2 instances — env vars, lifecycle, tier sizing, and the migration off Fly Machines."
---
# Provisioning Workspaces on AWS EC2 (production SaaS provisioner)
As of April 2026, Molecule AI's SaaS control plane provisions both **tenants**
(per-org platform VMs) and **workspaces** (per-agent inference VMs) on
| `AWS_REGION` | `us-east-1` | Region for tenant + workspace launches |
| `EC2_AMI` | `ami-0ea3c35c5c3284d82` (Ubuntu 22.04 us-east-2) | Default AMI when no `thin_ami_pins` row matches |
| `EC2_VPC_ID` | — | VPC for per-tenant SG creation; falls back to `EC2_SECURITY_GROUP` if unset |
| `EC2_SUBNET_ID` | — | Subnet for `RunInstances` |
| `SECRETS_ENCRYPTION_KEY` | — | KMS-envelope DEK for tenant secret-at-rest; provisioner stays disabled until set |
### Required for production (#44 secure bootstrap)
| Var | Purpose |
|-----|---------|
| `EC2_TENANT_IAM_PROFILE` | Instance profile attached to every tenant EC2 so it can fetch its bootstrap bundle from Secrets Manager at boot. Without this set, `Provision` returns the error `"Secrets Manager + IAM instance profile are required (#113 — plaintext user-data path removed)"`. |
| `PROVISION_SHARED_SECRET` | Shared HMAC-secret stored alongside the tenant bootstrap bundle so workspace-server can authenticate inbound `/cp/...` callbacks |
| `CP_ADMIN_API_TOKEN` | Token the tenant uses to call admin endpoints back on the control plane |
| `CP_BASE_URL` | URL the tenant boot script uses to reach the control plane (typically `https://api.moleculesai.app`) |
### Required for the canvas Terminal tab
| Var | Purpose |
|-----|---------|
| `EIC_ENDPOINT_SG_ID` | Security-group ID of the region's [EC2 Instance Connect endpoint](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-connect-endpoint.html). The provisioner adds a `tcp/22` ingress rule to every per-tenant + per-workspace SG sourced from this SG, so the canvas Terminal can EIC-tunnel into the box for diagnostic ssh. Empty leaves the canvas Terminal broken with `failed to open EIC tunnel`. Discover with `aws ec2 describe-instance-connect-endpoints --region <region>`. |
| `MOLECULE_ENV` | `dev` / `staging` / `prod`; stamped on every EC2 tag and scopes the orphan-report's AWS lister so envs don't false-positive each other |
| `EC2_INSTANCE_TYPE` | Default `t3.small` for tenant VMs (workspaces use the per-tier table below) |
| `EC2_SECURITY_GROUP` | Fallback shared SG when `EC2_VPC_ID` is unset; production should leave this empty |
| `EC2_KEY_NAME` | Optional EC2 KeyPair name for emergency console SSH |
| `TENANT_IMAGE` | OCI ref for the tenant platform image (e.g. `ghcr.io/molecule-ai/platform-tenant:staging-<sha>`) |
| `CANARY_TENANT_IMAGE` | Override `TENANT_IMAGE` for orgs flagged `is_canary=true` |
| `CANARY_ROLE_ARN`, `CANARY_REGION`, `CANARY_VPC_ID`, `CANARY_SUBNET_ID` | Second-AWS-account target for canary tenant launches; all four required together |
| `TENANT_BACKUP_S3_PREFIX` | Empty disables nightly `pg_dump`; set `s3://bucket/path` to enable |
| `TENANT_BACKUP_REPORT_URL` | Defaults to `${CP_BASE_URL}/cp/tenants/backup-report` |
| `GHCR_PULL_TOKEN` | GHCR pull token written into the tenant bootstrap bundle (private images only) |
| T4 | `t3.large` | 80 GB | all production workspaces |
If you set a tier outside `[3, 4]` the clamp lifts it to T4 — a cheap
mis-provision rather than a fall-through to the unset `t3.small`
default. The clamp was added in PR #434 follow-up after `tier=5`
silently yielded `t3.small`.
Hermes overrides volume to 50 GB minimum regardless of tier.
## Lifecycle — stop, restart, redeploy, teardown
| Operation | Mechanism |
|-----------|-----------|
| **Stop / start a tenant** | `POST /cp/admin/tenants/:slug/{stop,start}` → `(*EC2).Stop` / `Start` via the EC2 API (no termination) |
| **Redeploy a tenant** (in-place new image) | `POST /cp/admin/tenants/:slug/redeploy` → SSM Run Command pulls the latest `TENANT_IMAGE` and recreates the platform container; never reboots EC2 |
| **Refresh workspace template images** | `POST /cp/admin/tenants/:slug/workspaces/redeploy` (single-tenant) or `POST /cp/admin/tenants/workspaces/redeploy-fleet` (canary-batched fleet); HTTP-only, no SSM |
| **Delete a workspace** | platform `DELETE /workspaces/:id` → CP `DeprovisionInstance(workspaceInstanceID, ...)` terminates the EC2 + cleans DNS + SG |
| **Delete a tenant (Art. 17 cascade)** | `DELETE /cp/orgs/:slug` → cascade-terminates all workspace EC2s tagged with this `OrgID`, then terminates the tenant EC2, then deletes the SG, Secrets Manager bundle, CF tunnel + CNAME |
| **Orphan recovery** | `tenant_resources` audit table + 30-min reconciler that diffs claims vs live AWS state and exposes orphan counts via `/cp/admin/stats` |
`DeprovisionInstance` polls termination under its own deadline so a
stuck shutdown surfaces as a deprovision failure (and the caller's
retry replays the cascade) instead of becoming a silent leak (#263).
## Why EC2 (vs Fly Machines)
The control plane has migrated infrastructure twice in April 2026 — both
- [SaaS file writes via EC2 Instance Connect](./saas-file-writes-eic.md) — EIC is also the Files API write channel
- [Fly Machines provisioner (DEPRECATED)](./fly-machines-provisioner.md) — previous backend, retained for migration history
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.