ci(platform): add step-level timeout-minutes to diagnostic and test steps

mc#1099: GitHub Actions applies a DEFAULT 10-minute step ceiling
regardless of the job-level timeout. Without an explicit step-level
timeout, the "Run tests with race detection" step gets killed at 10m
even though go test -timeout 30m has not expired.

Fix: add timeout-minutes: 35 to the test step and timeout-minutes: 20
to the diagnostic step (which runs go test -timeout 600s / 10m).

Cold-runner observed timeline (before fix):
  golangci-lint --no-config --timeout 10m: ~10m (succeeds)
  go test -race -timeout 30m: starts at ~10m, killed at 10m step ceiling → FAIL

After fix:
  golangci-lint --no-config --timeout 10m: ~10m (succeeds)
  go test -race -timeout 30m: ~19m (completes within 35m step ceiling) 

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-05-15 10:03:25 +00:00
parent e5050154f0
commit e5a39c6d94
+7 -6
View File
@@ -180,6 +180,8 @@ jobs:
run: $(go env GOPATH)/bin/golangci-lint run --no-config --timeout 10m ./...
- if: success()
name: Diagnostic — per-package verbose 600s
# mc#1099: step-level ceiling above the 600s Go timeout for cold-runner headroom.
timeout-minutes: 20
run: |
set +e
go test -race -v -timeout 600s ./internal/handlers/... 2>&1 | tee /tmp/test-handlers.log
@@ -196,12 +198,11 @@ jobs:
continue-on-error: true
- if: always()
name: Run tests with race detection and coverage
# Explicit timeout: cold runner cache causes OOM kills at ~4m39s on the
# full ./... suite with race detection + coverage. A 30m per-step timeout
# lets the suite complete on cold cache (~19m observed) while failing
# cleanly instead of OOM-killing. The job-level timeout (50m) is a
# backstop. mc#1099: raised 10m → 20m → 30m. Cold runner:
# golangci-lint ~10m + test suite ~19m = ~29m total.
# mc#1099: step-level ceiling above the 30m Go timeout for cold-runner headroom.
# Cold runner: golangci-lint ~10m + test suite ~19m = ~29m total.
# GitHub Actions default step ceiling is 10m — must override or the step
# gets killed before the Go timeout fires. Job-level (50m) is the backstop.
timeout-minutes: 35
run: go test -race -timeout 30m -coverprofile=coverage.out ./...
- if: always()