v0.11.0
minor
Highlights
- ▸Deploys won't fail because of us. New pre-flight gate validates every project, environment, source, region and GitHub repo before the pipeline kicks off. Transient AWS errors auto-retry. Every terminal failure is classified into Platform or User and the dashboard shows actionable copy, never a generic red banner
- ▸Production approval gate. Per-environment toggle blocks deploys until N teammates approve. Self-approval blocked. The approval pane lives directly on the deployment detail page with audit trail
- ▸Image-build LLM autofix. When CodeBuild's BUILD phase fails on the user's Dockerfile, Eigon analyzes the log tail with Claude and surfaces a structured fix proposal (file, line, suggestion, reason, confidence) instead of a wall of stack trace
- ▸PR previews are real. The webhook used to just record rows in the database; now a 30-second reconciler picks them up and provisions a 'pr-{N}' env with the PR branch and commit. Profile=dev, Spot=on, smallest footprint
- ▸Multi-region traffic routing. Flag a project multi-region and Eigon keeps a Route53 latency-routed alias in sync across every region where you've deployed. New region's env goes live → record set picks it up automatically. Region's env goes away → record drops
- ▸Copilot is grounded. Ask 'why did my deploy fail?' and it answers from the actual classified root cause stored against the deployment, not from a guess. Live operational signals (active incidents, spend-cap pause, pending approvals) feed into every assistant prompt
- ▸Plans are strictly per-user. Owning N organisations means ONE subscription that covers all of them. Schema collapses three legacy rows per user to one and removes the duplicate org-scoped sub the entitlements engine had to pick a primary from
Added
- +Pre-flight gate (internal/pipelinegate) — sub-second checks for project linkage, BYOC verified, region resolution, GitHub repo + branch reachability with the saved token. Failures are typed (Code, Category PLATFORM_CONFIG | USER_INPUT, Hint) so the UI shows the right remediation
- +Retry harness (pkg/awsretry) — generic exponential backoff for transient AWS errors. Smithy throttling codes plus message fragments (i/o timeout, connection reset, rate exceeded) bounce; permission errors pass through unchanged. Wraps CodeBuild StartBuild and SSM PutParameter today
- +Deploy outcome classifier (internal/deployoutcome) — every pipeline failure tagged PLATFORM_TRANSIENT | PLATFORM_CONFIG | USER_CODE | USER_INPUT | UNKNOWN with a stable code and ready-to-show hint. Unknown errors land on the platform side so users never get blamed for our gaps. EmitMetric writes a structured log line that CloudWatch scrapes into a platform-failure-ratio counter
- +Image-build retry + structured failure surface — CodeBuild PRE_BUILD failures retry once on transient codes; terminal failures get a classified deployment event with category, code, hint
- +Production approval policies — migration 0084 with deployment_approvals(deployment, approver, decision, comment) and UNIQUE(deployment, approver) so users can flip their vote by re-submitting. Gate enforced in /run returns 412 with 'deploy gated by approval policy: 0 / 2 approvals' until the threshold is met. Self-approval blocked
- +Image-build LLM autofix (internal/imagebuildautofix) — analyzes BuildFailedError.LogTail with the project context and produces a structured Proposal (root_cause, category, file, line, suggestion, reason, confidence) persisted as a deployment event
- +PR-preview reconciler (internal/prpreviews) — 30-second loop claims pending pr_previews rows via an atomic MarkBuilding, creates the ephemeral env with profile=dev and use_spot=true, attaches it to the preview row
- +Multi-region routing (internal/multiregion) — migration 0085 with projects.multi_region_enabled + multi_region_hostname + multi_region_zone_id and multi_region_routes(project, region, target_dns, target_zone). 5-minute reconciler computes desired (region → ALB DNS) from the latest deployed env per region and writes Route53 ChangeResourceRecordSets UPSERTs in latency-routing mode
- +Copilot grounding — gatherDeployments now appends the classified root cause from failure_analyses to every FAILED row; EnvConfig appends 'Operational signals' with the top 3 active incidents (last 24h), spend-cap pause state, and pending approval counts
- +ApprovalPanel + ApprovalPolicyCard frontend components, ApprovalPanel renders on every deployment detail page (auto-collapses when not required), ApprovalPolicyCard added to env settings alongside canary + spending cap
- +Env settings page at /dashboard/.../env/[envId]/settings — surfaces the canary config card (was orphaned in the codebase with no page rendering it), spending cap card, approval policy card. Backend canary watcher had been wired end-to-end for months but the toggle was unreachable in the UI
- +getUserIdFromToken() helper decodes the JWT uid claim so client components can compare viewer identity without a /users/me round-trip
- +Per-user plan model (migration 0083) — collapses each user's user_subscriptions rows to one (highest tier wins), drops UNIQUE(org_id), adds UNIQUE(user_id), nulls org_id. Self-heals so every org owner has a sub. Backend GetByOrg now resolves the org's owner and returns the owner's user-level sub
- +Approval API proxies, canary API proxies, env-policy API proxies — Next.js app-router 404'd these previously
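The retry rule from the pkg/awsretry entry above, sketched as an added example (not Eigon's code): throttling codes and the listed message fragments are retried with exponential backoff, while a permission error returns after a single call. `IsTransient`, `Do`, `codedErr` and the throttling code list are assumptions; only the three message fragments come from the changelog.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// apiError matches the ErrorCode() accessor that smithy-go API errors expose.
type apiError interface{ ErrorCode() string }

// Constructed code list; the message fragments are the ones the changelog names.
var throttleCodes = map[string]bool{"ThrottlingException": true, "TooManyRequestsException": true}
var fragments = []string{"i/o timeout", "connection reset", "rate exceeded"}

// IsTransient accepts throttling codes and network fragments only; AccessDenied is never retried.
func IsTransient(err error) bool {
	var ae apiError
	if errors.As(err, &ae) && throttleCodes[ae.ErrorCode()] {
		return true
	}
	msg := strings.ToLower(err.Error())
	for _, f := range fragments {
		if strings.Contains(msg, f) {
			return true
		}
	}
	return false
}

// Do calls op up to tries times, sleeping base, 2*base, 4*base... between attempts.
func Do(tries int, base time.Duration, sleep func(time.Duration), op func() error) (int, error) {
	for n := 1; ; n++ {
		err := op()
		if err == nil || !IsTransient(err) || n >= tries {
			return n, err // the original error is returned unchanged
		}
		sleep(base << (n - 1))
	}
}

type codedErr struct{ code, msg string } // constructed stand-in for an AWS API error

func (e codedErr) Error() string     { return e.code + ": " + e.msg }
func (e codedErr) ErrorCode() string { return e.code }

func main() {
	n, err := Do(4, 100*time.Millisecond, time.Sleep, func() error { return codedErr{"AccessDeniedException", "not authorized"} })
	fmt.Println(n, err)
}
```

Retrying a permission error would only delay the failure and add denied-call noise to the audit trail, which is why the classifier is an allow-list of transient conditions rather than a deny-list.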
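The approval gate from the production approval policies entry above, as an added in-memory example (not Eigon's code): one vote per approver with overwrite on re-submit, self-approval rejected, and a 412 until the threshold is met. `Gate`, `Submit` and `Check` are hypothetical names, and treating "self" as the user who triggered the deployment is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
)

// Gate is a hypothetical in-memory stand-in for one deployment's approval policy.
type Gate struct {
	Required int             // approvals needed (N)
	Deployer string          // assumption: "self" is the user who triggered the deploy
	votes    map[string]bool // approver -> latest decision, like UNIQUE(deployment, approver)
}

// Submit records a decision. Re-submitting overwrites the earlier vote.
func (g *Gate) Submit(approver string, approve bool) error {
	if approver == g.Deployer {
		return fmt.Errorf("self-approval blocked for %s", approver)
	}
	if g.votes == nil {
		g.votes = map[string]bool{}
	}
	g.votes[approver] = approve
	return nil
}

// Check returns the status and message a gated /run call would produce.
func (g *Gate) Check() (int, string) {
	approved := 0
	for _, ok := range g.votes {
		if ok {
			approved++
		}
	}
	if approved < g.Required {
		return http.StatusPreconditionFailed, fmt.Sprintf("deploy gated by approval policy: %d / %d approvals", approved, g.Required)
	}
	return http.StatusOK, "approval threshold met"
}

func main() {
	g := &Gate{Required: 2, Deployer: "alice"} // constructed users
	fmt.Println(g.Submit("alice", true))
	g.Submit("bob", true)
	g.Submit("bob", false) // bob flips his vote
	g.Submit("carol", true)
	fmt.Println(g.Check())
}
```

Because the count is taken over the latest vote per approver, a withdrawn approval closes the gate again, and repeated submissions from one account can never reach the threshold on their own.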
Improved
- ~BYOC enforcement on spending caps — awsops.ScaleEnvironmentToMinimum now decorates context with the project's BYOC target before loading AWS config. Previously cap enforcement on BYOC environments hit Eigon's hosted account and silently failed; the cluster wasn't there so ListServices returned nothing
- ~ECS task role's SSM permission widened from us-east-1-only to all regions so ap-south-1 deploys (Mumbai is the default region for IN users) can write the DB password to SSM
- ~CodeBuild project removed from no-NAT private subnets — was the root cause of every aws ecr get-login-password 'exit status 1' failure (3-minute timeout disguised as a docker-login error)
- ~Nightly E2E smoke uses prebuilt prod Dockerfiles in CI via docker-compose.ci.yml override. The default dev compose runs `air` and compiles Go on a cold mod-cache, which exceeds the /healthz wait every time on a clean runner — the nightly had been red for a week
- ~Invite emails rewritten as plain transactional messages (no styled CTA button, no dark theme, no marketing copy) with Reply-To set to the inviter so Gmail lands them in Primary instead of Promotions. DKIM, SPF and DMARC already pass on eigon.io; this is the content-side fix
- ~/invites/{token} accept page added — emails linked there but the page 404'd. Includes login redirect with ?redirect preservation and accepts or declines with auto-route to /dashboard/org/{orgId}
- ~First-class audit: spend caps and BYOC end-to-end. New tests cover threshold ladder boundaries, de-dup logic, and the BYOC context annotation itself via a fake TargetHook. Previously zero tests on the spendingcaps + awsops packages
