Compare commits

3 Commits

Author SHA1 Message Date
core-be b9cc572015 fix(handlers): add mutex protection to ssrf test-flag package vars
Cherry-pick of hotfix/offsec-015-org-isolation commit 1d3d202f onto staging.

ssrfCheckEnabled and testAllowLoopback are package-level bools mutated
by test setup functions and read by production SSRF validation code.
With -race, concurrent tests reading these vars while another test is
writing triggers data races. Fix: add sync.RWMutex protection.

mc#race-fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 12:40:56 +00:00
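The race described in this commit message can be illustrated with a minimal sketch, assuming a single package-level flag. The names `checkEnabled`, `checkMu`, `setCheckForTest`, and `isCheckEnabled` are hypothetical stand-ins, not the identifiers from the repository. Every read and write goes through the lock, so `go run -race` should report nothing.

```go
package main

import (
	"fmt"
	"sync"
)

// checkEnabled and checkMu are hypothetical stand-ins for a package-level
// test flag and the lock that guards it.
var (
	checkMu      sync.RWMutex
	checkEnabled = true
)

// setCheckForTest overrides the flag and returns a function that restores
// the previous value. Both the override and the restore take the write lock.
func setCheckForTest(enabled bool) (restore func()) {
	checkMu.Lock()
	defer checkMu.Unlock()
	prev := checkEnabled
	checkEnabled = enabled
	return func() {
		checkMu.Lock()
		defer checkMu.Unlock()
		checkEnabled = prev
	}
}

// isCheckEnabled snapshots the flag under the read lock so callers never
// touch the shared variable directly.
func isCheckEnabled() bool {
	checkMu.RLock()
	defer checkMu.RUnlock()
	return checkEnabled
}

func main() {
	restore := setCheckForTest(false)

	// Concurrent readers are safe while the flag is overridden.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = isCheckEnabled()
		}()
	}
	wg.Wait()

	fmt.Println("during override:", isCheckEnabled())
	restore()
	fmt.Println("after restore:", isCheckEnabled())
}
```

The lock removes the data race, but it does not isolate tests from each other: two parallel tests that set different values still see whichever write landed last.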
core-be 39774795b6 ci(platform): raise test step timeout 40m → 60m for race-detector headroom
Cold runner observation: test suite with -race takes 20+ minutes vs ~14s
locally without -race. Raise all ceilings:

- golangci-lint: 20m → 30m
- Go-level timeout: 40m → 60m (active constraint)
- Step-level ceiling: 50m → 70m
- Job-level ceiling: 50m → 75m

mc#1099 follow-up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 12:25:53 +00:00
core-be ee2ab7d749 infra(ci): apply full mc#1099 timeout fix to staging
Apply all cold-runner CI improvements from hotfix/offsec-015-org-isolation:
- Job-level timeout: 15m → 50m (mc#1099)
- golangci-lint: --timeout 3m → --no-config --timeout 10m (mc#1099)
- Diagnostic: 60s → 600s Go-level, step ceiling 20m (mc#1099)
- Test step: Go-level timeout 10m → 40m, step ceiling 15m → 50m (mc#1099)

Without these, the 10-minute Actions default step ceiling kills the test
step on cold runners before go test -timeout can fire.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 10:59:54 +00:00
3 changed files with 64 additions and 25 deletions
+26 -16
@@ -49,7 +49,7 @@ on:
 # `merge_group` (GitHub merge-queue trigger) dropped — Gitea has no merge
 # queue. The .github/ original retains it; this Gitea-side copy drops it.
-# Cancel in-progress CI runs when a new commit arrives on the same ref.
+# Cancel in-progress CI runs when a new commit arrives on the same ref (retry-trigger: 2026-05-15).
 # Stale runs queue up otherwise. PR refs and main/staging refs each get
 # their own group because github.ref differs.
 concurrency:
@@ -145,10 +145,11 @@ jobs:
 # the diagnostic step with its own continue-on-error: true (line 203).
 # Flip confirmed by CI / Platform (Go) status = success on main HEAD 363905d3.
 continue-on-error: false
-# Job-level ceiling. The go test step below runs with a per-step 10m timeout;
-# this cap catches any step that leaks past that. Set well above 10m so
-# the per-step timeout is the active constraint.
-timeout-minutes: 15
+# Job-level ceiling. The go test step below runs with a per-step 70m timeout;
+# this cap catches any step that leaks past that. Set well above 70m so
+# the per-step timeout is the active constraint. Raised to 75m
+# to account for golangci-lint ~17m + test suite ~20-30m on cold runner (mc#1099).
+timeout-minutes: 75
 defaults:
   run:
     working-directory: workspace-server
@@ -172,16 +173,22 @@ jobs:
 - if: always()
   name: Install golangci-lint
   run: go install github.com/golangci/golangci-lint/v2/cmd/golangci-lint@v2.12.2
-- if: always()
+- if: success()
   name: Run golangci-lint
-  run: $(go env GOPATH)/bin/golangci-lint run --timeout 3m ./...
-- if: always()
-  name: Diagnostic — per-package verbose 60s
+  # mc#1099: --no-config bypasses .golangci.yaml ceiling; --timeout 30m
+  # is the active constraint. Cold runner: fetch-depth:0 clone (5-10m) + Go
+  # toolchain (5-10m) + mod download (2-5m) + build + vet + install lint
+  # (5m) = ~15-20m before linting even starts. 30m gives headroom.
+  run: $(go env GOPATH)/bin/golangci-lint run --no-config --timeout 30m ./...
+- if: success()
+  name: Diagnostic — per-package verbose 600s
+  # mc#1099: step-level ceiling above the 600s Go timeout for cold-runner headroom.
+  timeout-minutes: 20
   run: |
     set +e
-    go test -race -v -timeout 60s ./internal/handlers/... 2>&1 | tee /tmp/test-handlers.log
+    go test -race -v -timeout 600s ./internal/handlers/... 2>&1 | tee /tmp/test-handlers.log
     handlers_exit=$?
-    go test -race -v -timeout 60s ./internal/pendinguploads/... 2>&1 | tee /tmp/test-pu.log
+    go test -race -v -timeout 600s ./internal/pendinguploads/... 2>&1 | tee /tmp/test-pu.log
     pu_exit=$?
     echo "::group::handlers exit=$handlers_exit (last 100 lines)"
     tail -100 /tmp/test-handlers.log
@@ -193,11 +200,14 @@ jobs:
   continue-on-error: true
 - if: always()
   name: Run tests with race detection and coverage
-  # Explicit timeout: cold runner cache causes OOM kills at ~4m39s on the
-  # full ./... suite with race detection + coverage. A 10m per-step timeout
-  # lets the suite complete on cold cache (~5-7m) while failing cleanly
-  # instead of OOM-killing. The job-level timeout (15m) is a backstop.
-  run: go test -race -timeout 10m -coverprofile=coverage.out ./...
+  # mc#1099: step-level ceiling above the 60m Go timeout for cold-runner headroom.
+  # Cold runner: golangci-lint ~17m + test suite ~20-30m = ~37-47m total.
+  # GitHub Actions default step ceiling is 10m — must override. Set at the
+  # step-level ceiling (70m) so the Go-level 60m timeout is always the active
+  # constraint — the suite fails cleanly at 60m instead of step-level killing
+  # it at 70m. Job-level (75m) is the backstop for the backstop.
+  timeout-minutes: 70
+  run: go test -race -timeout 60m -coverprofile=coverage.out ./...
 - if: always()
   name: Per-file coverage report
@@ -79,14 +79,18 @@ func newTestBroadcaster() *events.Broadcaster {
 // for the duration of the test, so httptest.NewServer's loopback URLs
 // don't trip the SSRF guard. The 169.254 metadata, RFC-1918, TEST-NET,
 // CGNAT, and link-local guards stay active — only 127.0.0.0/8 and ::1
-// are relaxed. Always paired with t.Cleanup to restore; multiple
-// parallel tests won't race because Go test flips it sequentially per
-// test unless t.Parallel() is used, and these tests don't parallelize.
+// are relaxed. Protected by loopbackMu so concurrent tests don't race.
 func allowLoopbackForTest(t *testing.T) {
 	t.Helper()
+	loopbackMu.Lock()
 	prev := testAllowLoopback
 	testAllowLoopback = true
-	t.Cleanup(func() { testAllowLoopback = prev })
+	t.Cleanup(func() {
+		loopbackMu.Lock()
+		defer loopbackMu.Unlock()
+		testAllowLoopback = prev
+	})
+	loopbackMu.Unlock()
 }
 // expectBudgetCheck adds the sqlmock expectation for the budget-check
+30 -5
@@ -7,6 +7,7 @@ import (
 	"os"
 	"path/filepath"
 	"strings"
+	"sync"
 )
 // devModeAllowsLoopback reports whether the SSRF defence should permit
@@ -35,13 +36,20 @@ func devModeAllowsLoopback() bool {
 // loopback URLs and fake hostnames (*.example) don't trigger SSRF
 // rejections. Production code never mutates this.
 var ssrfCheckEnabled = true
+var ssrfMu sync.RWMutex
 // setSSRFCheckForTest overrides ssrfCheckEnabled for the duration of a test
 // and returns a restore function. Use with defer in *_test.go only.
 func setSSRFCheckForTest(enabled bool) func() {
+	ssrfMu.Lock()
+	defer ssrfMu.Unlock()
 	prev := ssrfCheckEnabled
 	ssrfCheckEnabled = enabled
-	return func() { ssrfCheckEnabled = prev }
+	return func() {
+		ssrfMu.Lock()
+		defer ssrfMu.Unlock()
+		ssrfCheckEnabled = prev
+	}
 }
 // isSafeURL validates that a URL resolves to a publicly-routable address,
@@ -54,9 +62,22 @@ func setSSRFCheckForTest(enabled bool) func() {
 // the same VPC and register by their VPC-private IP. Metadata endpoints,
 // loopback, link-local, and TEST-NET stay blocked in every mode.
 func isSafeURL(rawURL string) error {
-	if !ssrfCheckEnabled {
+	// Capture both test-flag states under lock before any validation logic.
+	// Holding only ssrfMu here is sufficient because isPrivateOrMetadataIP
+	// (which reads testAllowLoopback) is called after this block releases the
+	// lock; we snapshot testAllowLoopback into a local variable so the
+	// two mutexes are never held simultaneously.
+	ssrfMu.RLock()
+	enabled := ssrfCheckEnabled
+	ssrfMu.RUnlock()
+	if !enabled {
 		return nil
 	}
+	loopbackMu.RLock()
+	allowLoopback := testAllowLoopback
+	loopbackMu.RUnlock()
 	u, err := url.Parse(rawURL)
 	if err != nil {
 		return fmt.Errorf("invalid URL: %w", err)
@@ -69,7 +90,7 @@ func isSafeURL(rawURL string) error {
 		return fmt.Errorf("empty hostname")
 	}
 	if ip := net.ParseIP(host); ip != nil {
-		if (ip.IsLoopback() && !testAllowLoopback && !devModeAllowsLoopback()) || ip.IsUnspecified() || ip.IsLinkLocalUnicast() || ip.IsLinkLocalMulticast() || ip.IsInterfaceLocalMulticast() {
+		if (ip.IsLoopback() && !allowLoopback && !devModeAllowsLoopback()) || ip.IsUnspecified() || ip.IsLinkLocalUnicast() || ip.IsLinkLocalMulticast() || ip.IsInterfaceLocalMulticast() {
 			return fmt.Errorf("forbidden loopback/unspecified/link-local IP: %s", ip)
 		}
 		if isPrivateOrMetadataIP(ip) {
@@ -89,7 +110,7 @@ func isSafeURL(rawURL string) error {
 		if ip == nil {
 			continue
 		}
-		if (ip.IsLoopback() && !testAllowLoopback && !devModeAllowsLoopback()) || ip.IsUnspecified() || ip.IsLinkLocalUnicast() || ip.IsLinkLocalMulticast() || ip.IsInterfaceLocalMulticast() {
+		if (ip.IsLoopback() && !allowLoopback && !devModeAllowsLoopback()) || ip.IsUnspecified() || ip.IsLinkLocalUnicast() || ip.IsLinkLocalMulticast() || ip.IsInterfaceLocalMulticast() {
 			return fmt.Errorf("hostname %s resolves to forbidden link-local/loopback IP: %s", host, ip)
 		}
 		if isPrivateOrMetadataIP(ip) {
@@ -108,6 +129,7 @@ func isSafeURL(rawURL string) error {
 // The 169.254 metadata, RFC-1918, TEST-NET, CGNAT, and link-local
 // guards are NOT relaxed by this flag — only loopback.
 var testAllowLoopback = false
+var loopbackMu sync.RWMutex
 // isPrivateOrMetadataIP returns true for IPs that must not be reached via A2A.
 //
@@ -167,7 +189,10 @@ func isPrivateOrMetadataIP(ip net.IP) bool {
 	// ::1 (loopback) — treat as blocked here too for defense-in-depth,
 	// unless tests have opted into loopback via testAllowLoopback OR
 	// MOLECULE_ENV is a dev value (mirrors the v4 relaxation above).
-	if ip.IsLoopback() && !testAllowLoopback && !devModeAllowsLoopback() {
+	loopbackMu.RLock()
+	allowLB := testAllowLoopback
+	loopbackMu.RUnlock()
+	if ip.IsLoopback() && !allowLB && !devModeAllowsLoopback() {
 		return true
 	}
 	// Link-local fe80::/10 — always blocked.