[handover] Codify manual ECR promote operation as scripts/promote-tenant-image.sh #660

Closed
opened 2026-05-12 04:20:52 +00:00 by hongming · 0 comments
Owner

Context

Tonight (2026-05-12 03:30Z) we manually validated the prod-tenant image promotion end-to-end with Hongming's GO:

  1. Saved current :latest manifest as :latest-prev-20260512 (rollback tag)
  2. Re-tagged :staging-latest manifest as :latest via aws ecr put-image
  3. Triggered CP redeploy on chloe-dong + hongming via /cp/admin/tenants/<slug>/redeploy
  4. Verified /buildinfo shows new git_sha (210da3b1 vs prior 0276b29) + /health=ok

All done as ad-hoc bash. Not codified.
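
Steps 1 and 2 above could be captured roughly as follows. This is a minimal sketch, not the script that was run that night: the function names (`ecr_manifest`, `ecr_digest`, `rollback_tag`, `promote`) and the `AWS` override variable are hypothetical, and the repository name is taken from the rollback tag quoted later in this issue. It assumes mutable tags and single-manifest images; a multi-arch image may need extra handling of the manifest media type.

```shell
# Sketch only. REPO default comes from this issue; all function names are hypothetical.
REPO="${REPO:-molecule-ai/platform-tenant}"
AWS="${AWS:-aws}"   # overridable so the logic can be exercised without AWS access

# Print the raw manifest JSON behind a tag.
ecr_manifest() {
  "$AWS" ecr batch-get-image --repository-name "$REPO" \
    --image-ids imageTag="$1" \
    --query 'images[0].imageManifest' --output text
}

# Print the digest behind a tag; fails if the tag does not exist.
ecr_digest() {
  "$AWS" ecr describe-images --repository-name "$REPO" \
    --image-ids imageTag="$1" \
    --query 'imageDetails[0].imageDigest' --output text
}

# Build the rollback tag name <dst>-prev-<YYYYMMDD> (UTC date unless one is given).
rollback_tag() {
  printf '%s-prev-%s\n' "$1" "${2:-$(date -u +%Y%m%d)}"
}

# promote <src-tag> <dst-tag>
# Returns 1 on lookup/push failure, 2 when src and dst already match (no-op).
promote() {
  local src="$1" dst="$2"
  local src_digest dst_digest src_manifest dst_manifest

  # Preflight: both tags must resolve, and they must differ.
  src_digest="$(ecr_digest "$src")" || return 1
  dst_digest="$(ecr_digest "$dst")" || return 1
  if [ "$src_digest" = "$dst_digest" ]; then
    echo "no-op: $src and $dst both point at $src_digest" >&2
    return 2
  fi

  dst_manifest="$(ecr_manifest "$dst")" || return 1
  src_manifest="$(ecr_manifest "$src")" || return 1

  # Save the current dst manifest under the rollback tag, then move dst to src.
  "$AWS" ecr put-image --repository-name "$REPO" \
    --image-tag "$(rollback_tag "$dst")" \
    --image-manifest "$dst_manifest" >/dev/null || return 1
  "$AWS" ecr put-image --repository-name "$REPO" \
    --image-tag "$dst" \
    --image-manifest "$src_manifest" >/dev/null || return 1

  # Audit trail: digests before and after.
  echo "before: $dst -> $dst_digest"
  echo "after:  $dst -> $(ecr_digest "$dst")"
}
```

Tonight's promotion would then correspond to `promote staging-latest latest`. Note that a second run on the same UTC day would try to reuse the same rollback tag name; how that case should behave is left open here.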

Asks

  1. Write scripts/promote-tenant-image.sh <src-tag> <dst-tag> in operator-config repo with:
    • Preflight: confirm both src+dst exist in ECR; refuse if src digest == dst digest (no-op)
    • Save current dst manifest as <dst>-prev-<YYYYMMDD> for rollback
    • put-image with src manifest tagged as dst
    • Print before/after digests for audit
  2. Write companion scripts/redeploy-tenant.sh <slug> [target_tag] with:
    • Optional pre-flight docker login refresh via SSM (avoids 12h-stale-auth failure — see sibling issue on amazon-ecr-credential-helper)
    • Call CP /cp/admin/tenants/<slug>/redeploy
    • Poll /buildinfo until git_sha changes (or timeout 60s)
    • Verify /health=ok post-deploy
  3. Document retention policy for :<tag>-prev-<date> rollback tags (e.g., keep last 3, purge older)
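
For ask 2, the trigger, poll, and verify steps might look like the sketch below. Several details are assumptions, since this issue does not specify them: that the redeploy endpoint accepts a `POST` with a bearer token, that `/buildinfo` returns JSON containing a `"git_sha"` string field, and that `/health` returns a body containing `ok`. `CP_URL`, `CP_TOKEN`, and all function names are hypothetical. The optional docker login refresh via SSM is left out.

```shell
# Sketch only. CP_URL, CP_TOKEN, and every function name here are hypothetical.

# Thin wrapper so the HTTP client can be swapped or stubbed.
http_get() {
  curl -fsS --max-time 5 "$1"
}

# ASSUMPTION: the redeploy endpoint is a POST authenticated with a bearer token.
trigger_redeploy() {
  curl -fsS -X POST -H "Authorization: Bearer ${CP_TOKEN}" \
    "${CP_URL}/cp/admin/tenants/$1/redeploy"
}

# Extract git_sha from a /buildinfo body read on stdin.
# ASSUMPTION: the body is JSON such as {"git_sha":"210da3b1"}.
git_sha_of() {
  sed -n 's/.*"git_sha"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# wait_for_new_sha <tenant-base-url> <old-sha> [timeout-seconds] [interval-seconds]
# Prints the new sha and returns 0 once it differs from <old-sha>;
# returns 1 if it has not changed within the timeout (default 60s).
wait_for_new_sha() {
  local url="$1" old="$2" timeout="${3:-60}" interval="${4:-2}"
  local waited=0 sha
  while [ "$waited" -lt "$timeout" ]; do
    sha="$(http_get "$url/buildinfo" 2>/dev/null | git_sha_of)"
    if [ -n "$sha" ] && [ "$sha" != "$old" ]; then
      echo "$sha"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timeout: git_sha still $old after ${timeout}s" >&2
  return 1
}

# ASSUMPTION: a healthy tenant answers /health with a body containing "ok".
health_ok() {
  http_get "$1/health" | grep -q 'ok'
}
```

A caller would record the old `git_sha` first, call `trigger_redeploy <slug>`, then run `wait_for_new_sha` followed by `health_ok` against the tenant's base URL. Comparing against the recorded old sha, rather than an expected new one, matches the "until git_sha changes" wording above.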

Rollback tag still in ECR

Tonight's specific rollback tag: molecule-ai/platform-tenant:latest-prev-20260512 (sha256:e08c68090b9a...) — preserved for rollback if needed. NOT YET on retention schedule.
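
One way the "keep last 3, purge older" example from ask 3 could be expressed is sketched below. The function name `tags_to_purge` is hypothetical, and the keep count of 3 is only the example figure from this issue, not a decided policy. It relies on the `YYYYMMDD` suffix sorting chronologically as plain text, and it only selects candidates; nothing is deleted.

```shell
# Sketch only: tags_to_purge is a hypothetical helper, not an existing script.
# Reads image tags on stdin (one per line) and prints the rollback tags
# for <dst> that fall outside the newest <keep> (default 3).
# Note: <dst> is used inside a regex, so tags containing regex
# metacharacters (for example ".") would need escaping first.
tags_to_purge() {
  local dst="$1" keep="${2:-3}"
  grep -E "^${dst}-prev-[0-9]{8}\$" | sort -r | tail -n +"$((keep + 1))"
}

# Possible wiring against ECR, left commented out because it deletes tags:
#   aws ecr list-images --repository-name molecule-ai/platform-tenant \
#       --filter tagStatus=TAGGED --query 'imageIds[].imageTag' --output text \
#     | tr '\t' '\n' \
#     | tags_to_purge latest 3 \
#     | while read -r tag; do
#         aws ecr batch-delete-image --repository-name molecule-ai/platform-tenant \
#           --image-ids imageTag="$tag"
#       done
```

With only `latest-prev-20260512` present today, the selection is empty, so tonight's rollback tag would be kept under this example policy.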

Related

  • CP#135 RFC#279 (auto-update chain) — when that lands fully, manual promote becomes the rollback path only
triage-operator added the tier:medium label 2026-05-12 04:21:35 +00:00
core-devops was assigned by hongming 2026-05-12 04:25:23 +00:00
Reference: molecule-ai/molecule-core#660