fix(uploads): bump poll-mode size_bytes CHECK to 100MB to match push-mode (mc#1588) #1589

Open: core-be wants to merge 1 commit from fix/poll-mode-pending-uploads-100mb-mc1588 into main

Summary

CTO directive 2026-05-19 task #295: poll-mode (laptop-runtime workspaces) had a 25 MB CHECK constraint on pending_uploads.size_bytes that diverged from the 100 MB push-mode cap that mc#1588 is landing for SaaS EC2 tenants (reno-stars forensic a99ab0a1). This PR closes the poll-mode gap so external runtime workspaces pulling chat-attached files via the queue accept the same per-file ceiling as push-mode tenants.

  • New migration pair 20260519200000_pending_uploads_bump_size_cap.{up,down}.sql: up raises the cap to 104857600 bytes (100 MiB), down restores 26214400 (25 MiB); both are idempotent (DROP IF EXISTS, then re-add)
  • pendinguploads.MaxFileBytes 25→100 MB (mirrors the DB CHECK as a pre-DB guard)
  • workspace/inbox_uploads.py MAX_FILE_BYTES 25→100 MB and DEFAULT_FETCH_TIMEOUT 60→240 s on the laptop pull side, so a legitimate transfer near the new ceiling doesn't fail on a timeout (the wrong-reason failure mc#1588 fixed on the canvas side for a99ab0a1)
  • New integration tests pinning both the pre-DB guard AND the raw DB CHECK; the constraint-name test catches a future rename that would turn the next bump migration's DROP IF EXISTS into a silent no-op
  • Two existing handler-level tests gain a structural-precondition t.Skip: post-bump, per_file_cap == body_cap (chatUploadMaxBytes), so in a single-file upload the body MaxBytesReader returns 400 before the per-file 413 branch is reachable. Storage-level atomicity remains pinned by the integration test against real Postgres.

Coupling with mc#1588

mc#1588 is open, not yet merged. Apart from the uploadPollMode header comment in chat_files.go (a file mc#1588 also edits), this PR touches a file set disjoint from mc#1588, so any merge conflict should be confined to that comment. Suggested merge order: mc#1588 first (it owns chatUploadMaxBytes, canvas, and nginx), then this PR. If this PR lands first, the poll-mode per-file cap is 100 MB while the body cap is still 50 MB: uploads over 50 MB still hit the body cap (400) before the per-file cap (413), and the only visible change is that poll-mode files between 25 and 50 MB, previously rejected with 413, are now accepted. Landing both in either order is safe.

SSOT note

After this PR + mc#1588 the 100 MB constant lives in six mirror sites: canvas TS, workspace-server Go (chat_files.go body cap via mc#1588, storage.go per-file cap via this PR), workspace Python ingest (mc#1588), workspace Python pull (this PR), the nginx harness (mc#1588), and this DB CHECK. The proper fix is GET /uploads/limits so each surface reads a single source; the RFC was filed by a0d62036 and is referenced in mc#1588's description. Per feedback_no_single_source_of_truth, every cap change until that RFC lands must touch all six surfaces in lockstep.

Test plan

  • [x] go test ./workspace-server/... -count=1 -short: all 30+ packages green
  • [x] python3 -m pytest workspace/tests/test_inbox_uploads.py --no-cov: 81/81 passing
  • [x] go build ./... compiles cleanly
  • [ ] CI: handlers-postgres-integration.yml runs the new integration tests against real Postgres (gated on workspace-server/migrations/** changes, so it triggers automatically)
  • [ ] Post-merge prod read-back: SELECT pg_get_constraintdef(c.oid) FROM pg_constraint c JOIN pg_class t ON t.oid=c.conrelid WHERE t.relname='pending_uploads' AND c.conname='pending_uploads_size_bytes_check' returns CHECK ((size_bytes > 0) AND (size_bytes <= 104857600))
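The post-merge read-back compares a constraint definition string against an expected clause. A minimal sketch of that comparison, assuming the exact pg_get_constraintdef format quoted in the last item; expectedCheckDef and parseUpperBound are hypothetical helpers, and fetching the string from Postgres is left out:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// expectedCheckDef renders the clause the read-back query should return
// for the size_bytes CHECK at a given cap.
func expectedCheckDef(capBytes int64) string {
	return fmt.Sprintf("CHECK ((size_bytes > 0) AND (size_bytes <= %d))", capBytes)
}

var upperBound = regexp.MustCompile(`size_bytes <= (\d+)`)

// parseUpperBound extracts the cap from a constraint definition so a
// mismatch can report the stale value rather than just "not equal".
func parseUpperBound(def string) (int64, error) {
	m := upperBound.FindStringSubmatch(def)
	if m == nil {
		return 0, fmt.Errorf("no size_bytes upper bound in %q", def)
	}
	return strconv.ParseInt(m[1], 10, 64)
}

func main() {
	const want = 100 << 20 // 104857600; the old cap 26214400 is 25 << 20
	got := "CHECK ((size_bytes > 0) AND (size_bytes <= 104857600))"

	fmt.Println(got == expectedCheckDef(want))
	n, err := parseUpperBound(got)
	fmt.Println(n, err)
}
```

Parsing the bound, not only comparing strings, means a regression shows up as "found 26214400" instead of an opaque mismatch.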

Reviewers

Per feedback_route_approvals_to_team_personas_not_orchestrator_sub_agents:

  • core-devops — migration / CI lens (forward-only ALTER, ACCESS EXCLUSIVE lock posture)
  • core-qa — test discipline lens (the two new skips have a re-enable condition; the new integration tests pin DB-level invariants beyond what unit tests catch)

References

  • mc#1588 — push-mode bump (canvas + workspace-server body cap + workspace Python ingest + nginx)
  • task #295 — internal tracker; CTO-authorized this work
  • forensic a99ab0a1 — reno-stars 2026-05-19, root failure motivating mc#1588 + this follow-up
  • feedback_no_single_source_of_truth — the SSOT discipline this bump duplicates against, pending the /uploads/limits fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

core-be added 1 commit 2026-05-20 03:43:44 +00:00
fix(uploads): bump poll-mode size_bytes CHECK to 100MB to match push-mode (mc#1588)
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 14s
Check migration collisions / Migration version collision check (pull_request) Successful in 14s
CI / Detect changes (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 16s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
CI / Platform (Go) (pull_request) Successful in 4m41s
E2E Chat / detect-changes (pull_request) Successful in 18s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 4s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m10s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 13s
Lint no tenant GITEA/GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
publish-runtime-autobump / bump-and-tag (pull_request) Has been skipped
publish-runtime-autobump / pr-validate (pull_request) Successful in 39s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 9s
qa-review / approved (pull_request) Failing after 5s
gate-check-v3 / gate-check (pull_request) Successful in 6s
security-review / approved (pull_request) Failing after 4s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 3s
sop-checklist / review-refire (pull_request) Has been skipped
sop-tier-check / tier-check (pull_request) Successful in 4s
CI / Canvas (Next.js) (pull_request) Successful in 5m40s
CI / Python Lint & Test (pull_request) Successful in 6m44s
CI / all-required (pull_request) Successful in 5m23s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 1m30s
Harness Replays / Harness Replays (pull_request) Successful in 10s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m46s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1m33s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m0s
E2E Chat / E2E Chat (pull_request) Failing after 5m30s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7m24s
eeaffa275b
CTO directive 2026-05-19 task #295: poll-mode (laptop-runtime workspaces)
had a 25 MB CHECK constraint on `pending_uploads.size_bytes` that
diverged from the push-mode 100 MB cap mc#1588 is landing for the
SaaS EC2 tenants (reno-stars forensic a99ab0a1). This PR closes the
poll-mode gap so an external runtime workspace pulling chat-attached
files via the queue accepts the same per-file ceiling as a push-mode
tenant.

Surfaces touched (mirror sites for the per-file cap):

  1. workspace-server/migrations/20260519200000_pending_uploads_bump_size_cap.{up,down}.sql
     — NEW pair. DROP IF EXISTS the auto-named
     `pending_uploads_size_bytes_check` constraint and re-add at
     104857600. DOWN restores 26214400 (operator must drain any
     25–100 MB rows first; documented in down-migration header).

  2. workspace-server/internal/pendinguploads/storage.go
     — MaxFileBytes 25→100 MB. Pre-DB guard so an oversize Put
     short-circuits before round-tripping Postgres.

  3. workspace-server/internal/handlers/chat_files.go (comment only)
     — header comment on uploadPollMode updated from "Per-file cap:
     25 MB" to "100 MB" with the cross-reference to mc#1588.

  4. workspace/inbox_uploads.py (workspace-side puller for poll-mode)
     — MAX_FILE_BYTES 25→100 MB. DEFAULT_FETCH_TIMEOUT 60→240 s so a
     legitimate 100 MB (104857600-byte) transfer over a ~5 Mbps
     consumer link (~168 s wire time) doesn't hit the wrong-reason timeout failure
     the canvas side of mc#1588 fixed for forensic a99ab0a1.

  5. workspace-server/internal/handlers/pending_uploads_integration_test.go
     — TestIntegration_PendingUploads_SizeCap_100MB pins both the
     pre-DB guard (Put returns ErrTooLarge at MaxFileBytes+1) AND
     the raw DB CHECK (direct INSERT at 100MB+1 returns SQLSTATE
     23514). Skips in -short mode (allocates ~100 MB).
     TestIntegration_PendingUploads_SizeCap_DBConstraintName pins
     the constraint name + clause via pg_get_constraintdef so a
     future schema-rename can't silently regress to 25 MB through
     a no-op DROP IF EXISTS.

  6. workspace-server/internal/handlers/chat_files_poll_test.go
     — TestPollUpload_PerFileCapPreStorage_413 and
     TestPollUpload_AtomicRollbackOnSecondFileTooLarge gain a
     structural-precondition skip: when MaxFileBytes >=
     chatUploadMaxBytes (which is the post-mc#1588 + this PR state
     for poll-mode: per-file == body == 100 MB), the body
     MaxBytesReader 400s before the per-file 413 branch is
     reachable. Atomicity is still pinned by the integration test
     against real Postgres. Re-enable when chatUploadMaxBytes is
     raised above the per-file cap (RFC for GET /uploads/limits
     follow-up will reshape this layering).

SSOT note (per feedback_no_single_source_of_truth):

  After this PR the 100 MB constant lives in six mirror sites:
  canvas TS (mc#1588) + workspace-server Go (chat_files.go via
  mc#1588 + this PR's storage.go) + workspace Python ingest
  (internal_chat_uploads.py via mc#1588) + workspace Python pull
  (inbox_uploads.py via this PR) + nginx harness mirror (mc#1588)
  + this DB CHECK. The proper fix is the GET /uploads/limits
  endpoint per CTO follow-up (RFC filed by a0d62036, referenced
  in mc#1588 description). Until that lands, every cap change
  needs a coupled bump across all six surfaces.

Out of scope:

  - mc#1588's push-mode bumps (chatUploadMaxBytes, the Python
    ingest CHAT_UPLOAD_MAX_FILE_BYTES, canvas MAX_UPLOAD_BYTES,
    nginx client_max_body_size). Those land via mc#1588; apart from
    the uploadPollMode header comment in chat_files.go, this PR
    touches a disjoint file set.

  - The GET /uploads/limits SSOT endpoint. Separate RFC.

References:
  - mc#1588 (push-mode 50→100 MB; canvas + workspace-server +
    workspace Python + nginx harness)
  - task #295 (internal tracker; CTO-authorized)
  - forensic a99ab0a1 (reno-stars 2026-05-19; the root failure
    motivating mc#1588 and this follow-up)
  - feedback_no_single_source_of_truth (the SSOT discipline this
    bump duplicates against, pending the /uploads/limits fix)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Some optional checks failed (qa-review, security-review, E2E API Smoke Test, E2E Chat); the required check, CI / all-required, passed.
This pull request has changes conflicting with the target branch.
  • workspace-server/internal/handlers/chat_files.go
No reviewers; 1 participant.

Reference: molecule-ai/molecule-core#1589