VPC Deployment

Run EvalGuard inside an isolated AWS VPC, GCP custom-mode network, or Azure VNet. Optionally publish via AWS PrivateLink, GCP Private Service Connect, or Azure Private Link Service so customers reach EvalGuard without traversing the public internet.

This guide complements the general self-hosting guide. The Terraform here provisions the network layer only — the compute layer (EKS / GKE / EC2 + Helm chart) sits on top.

Architecture

The VPC isolates EvalGuard's web tier, worker, Postgres, and Redis in private subnets. Outbound LLM provider calls egress through a NAT gateway (AWS and Azure) or Cloud NAT (GCP), so providers see only the NAT's public IP addresses and never the customer's source IP. Inbound traffic terminates at a load balancer in the public subnet.

        ┌──────────────────────────────────────────────────────────┐
        │                    EvalGuard VPC                         │
        │  ┌─────────────────────┐    ┌───────────────────────┐    │
        │  │  Public subnet      │    │  Private subnet       │    │
        │  │                     │    │                       │    │
        │  │  ALB / NLB ─────────┼────► Web (Next.js)         │    │
        │  │  NAT Gateway ◄──────┼────  Worker (BullMQ)       │    │
        │  │                     │    │  Postgres / Redis     │    │
        │  └─────────────────────┘    └───────────────────────┘    │
        │           │                            │                 │
        │           ▼                            ▼                 │
        │  Customer (HTTPS)            OpenAI / Anthropic / etc    │
        │  or PrivateLink                  (outbound only)         │
        └──────────────────────────────────────────────────────────┘
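You can confirm the egress path in the diagram by comparing the public IP that a private-subnet workload presents to the outside world with the NAT gateway's public IPs. This is a sketch under assumptions: it targets AWS, needs a configured AWS CLI, and the curl must run from a pod or host in a private subnet. checkip.amazonaws.com is an AWS-operated IP echo service; egress_ip_is_nat and VPC_ID are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the IP in $1 appears in the
# whitespace-separated list of NAT public IPs in $2.
egress_ip_is_nat() {
  local ip="$1" nat_ips="$2" candidate
  # Unquoted on purpose: split the list on spaces/tabs/newlines.
  for candidate in $nat_ips; do
    [ "$candidate" = "$ip" ] && return 0
  done
  return 1
}

# Example wiring (not executed here; needs network access and AWS credentials,
# and VPC_ID is assumed to hold the module's vpc_id output):
#   egress_ip=$(curl -s https://checkip.amazonaws.com)
#   nat_ips=$(aws ec2 describe-nat-gateways \
#     --filter "Name=vpc-id,Values=$VPC_ID" \
#     --query "NatGateways[].NatGatewayAddresses[].PublicIp" --output text)
#   egress_ip_is_nat "$egress_ip" "$nat_ips" && echo "egress leaves via the NAT"
```

If the check fails, the workload is probably routing through the internet gateway; see "Outbound LLM calls timing out" under Troubleshooting.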

Terraform modules

Three production-ready modules (AWS, GCP, Azure) live under infra/terraform/vpc/ in the public repo. Each pins specific provider versions, so runs are deterministic.

AWS

bash
cd infra/terraform/vpc/aws
terraform init
terraform plan -out=tfplan \
  -var "region=us-east-1" \
  -var "deployment_name=evalguard-prod" \
  -var "nat_gateway_count=2"
terraform apply tfplan

The module outputs vpc_id, private_subnet_ids, and web_security_group_id. Feed these into your compute-layer module.
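List outputs such as private_subnet_ids come back as JSON from `terraform output -json`. If your compute layer expects a comma-separated string (for example, a Helm value or an eksctl flag), a small converter avoids a jq dependency. This is a sketch: tf_list_to_csv is a hypothetical helper, and it assumes list elements contain no spaces, quotes, or brackets, which holds for AWS IDs such as subnet-0abc.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a Terraform JSON list such as ["subnet-a","subnet-b"]
# (compact or pretty-printed) into subnet-a,subnet-b.
# Assumes elements contain no spaces, quotes, or brackets.
tf_list_to_csv() {
  # Join lines, then strip [ ] " and spaces, leaving the commas between items.
  printf '%s' "$1" | tr -d '\n\t' | sed -e 's/[][" ]//g'
}

# Example wiring (run in infra/terraform/vpc/aws after apply; -raw needs Terraform 0.14+):
#   VPC_ID=$(terraform output -raw vpc_id)
#   SUBNETS=$(tf_list_to_csv "$(terraform output -json private_subnet_ids)")
#   WEB_SG=$(terraform output -raw web_security_group_id)
```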

GCP

bash
cd infra/terraform/vpc/gcp
terraform init
terraform plan -out=tfplan \
  -var "project_id=YOUR_PROJECT" \
  -var "region=us-central1"
terraform apply tfplan

GKE-ready secondary ranges (pods_cidr, services_cidr) are pre-provisioned. Pass the named ranges to the ip_allocation_policy block of your google_container_cluster resource.

Azure

bash
cd infra/terraform/vpc/azure
terraform init
terraform plan -out=tfplan \
  -var "subscription_id=YOUR_SUB_ID" \
  -var "location=eastus"
terraform apply tfplan

The module pre-provisions five subnets: web, worker, data, ingress, and pls. The data subnet is delegated to Microsoft.DBforPostgreSQL/flexibleServers, so you can place a PostgreSQL Flexible Server in it directly. The NAT Gateway is zone-redundant by default, so there is no separate HA flag.

To publish through Private Link Service, apply again with the same variables plus the Private Link settings:

bash
terraform apply \
  -var "subscription_id=YOUR_SUB_ID" \
  -var "location=eastus" \
  -var "enable_private_link=true" \
  -var 'private_link_subscription_ids=["CUSTOMER_SUB_ID"]'

The private_link_alias output is the value the customer enters when creating a Private Endpoint in their VNet.

Private endpoint (PrivateLink / PSC)

For enterprise customers who require zero exposure to the public internet, publish EvalGuard as a private endpoint. Customer VPCs or projects then connect through AWS PrivateLink or GCP Private Service Connect, and traffic stays entirely on the cloud provider's backbone.

AWS PrivateLink

bash
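# Run in infra/terraform/vpc/aws, repeating the variables from the initial apply.
terraform apply \
  -var "region=us-east-1" \
  -var "deployment_name=evalguard-prod" \
  -var "nat_gateway_count=2" \
  -var "enable_privatelink=true" \
  -var 'allowed_principals=["arn:aws:iam::CUSTOMER_ACCOUNT_ID:root"]'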

The privatelink_service_name output is the value the customer needs: they create an Interface VPC Endpoint in their VPC that points at it. Connections from any principal listed in allowed_principals are accepted automatically.
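One way for the customer to create that endpoint is the AWS CLI's `ec2 create-vpc-endpoint` command. This is a sketch, not part of the EvalGuard modules: create_evalguard_endpoint is a hypothetical wrapper, it assumes the customer already has private subnets and a security group that allows HTTPS to the endpoint, and the AWS_CLI variable exists only so the command can be dry-run with echo.

```bash
#!/usr/bin/env bash
# Hypothetical helper, run by the *customer* in their own AWS account.
# Set AWS_CLI=echo to print the command instead of running it.
create_evalguard_endpoint() {
  local vpc_id="$1" service_name="$2" subnet_ids="$3" sg_id="$4"
  # subnet_ids is intentionally unquoted so a space-separated list
  # becomes multiple values for --subnet-ids.
  # shellcheck disable=SC2086
  ${AWS_CLI:-aws} ec2 create-vpc-endpoint \
    --vpc-endpoint-type Interface \
    --vpc-id "$vpc_id" \
    --service-name "$service_name" \
    --subnet-ids $subnet_ids \
    --security-group-ids "$sg_id"
}
```

For example, `create_evalguard_endpoint vpc-0123456789abcdef0 "$SERVICE_NAME" "subnet-aaa subnet-bbb" sg-0123456789abcdef0`, where SERVICE_NAME holds the privatelink_service_name value (typically of the form com.amazonaws.vpce.REGION.vpce-svc-...).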

GCP Private Service Connect

bash
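# Run in infra/terraform/vpc/gcp, repeating the variables from the initial apply.
terraform apply \
  -var "project_id=YOUR_PROJECT" \
  -var "region=us-central1" \
  -var "enable_psc=true" \
  -var 'psc_consumer_projects=["customer-gcp-project-id"]'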

The customer creates a Private Service Connect endpoint in their project that targets the published psc_attachment_name. The connection limit defaults to 10 per consumer project; adjust it in the module if needed.
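With the gcloud CLI, a consumer-side PSC endpoint consists of a reserved internal IP address plus a forwarding rule that targets the service attachment. This is a sketch under assumptions: create_evalguard_psc_endpoint, the resource names evalguard-psc-ip and evalguard-psc-endpoint, and the GCLOUD dry-run variable are hypothetical. The attachment argument is assumed to be the full service-attachment URI (projects/PROJECT/regions/REGION/serviceAttachments/NAME) that psc_attachment_name identifies.

```bash
#!/usr/bin/env bash
# Hypothetical helper, run by the *customer* in their own GCP project.
# Set GCLOUD=echo to print the commands instead of running them.
create_evalguard_psc_endpoint() {
  local project="$1" region="$2" network="$3" subnet="$4" attachment="$5"
  local addr_name="evalguard-psc-ip" rule_name="evalguard-psc-endpoint"

  # 1. Reserve an internal IP address in the consumer subnet.
  ${GCLOUD:-gcloud} compute addresses create "$addr_name" \
    --project="$project" --region="$region" --subnet="$subnet" || return 1

  # 2. Create the endpoint: a forwarding rule whose target is the
  #    published service attachment (assumed to be a full URI).
  ${GCLOUD:-gcloud} compute forwarding-rules create "$rule_name" \
    --project="$project" --region="$region" --network="$network" \
    --address="$addr_name" \
    --target-service-attachment="$attachment"
}
```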

Security review checklist

Run through this before you hand the deployment to procurement.

  • CIDR overlap: the chosen vpc_cidr and subnet_cidr ranges must not overlap with any VPC you will later connect through peering or a transit gateway.
  • NAT redundancy: a single NAT gateway is fine in dev; production must set nat_gateway_count = 2 or higher (one per AZ).
  • Flow log retention: defaults to 365 days (AWS) and continuous (GCP). Match it to your compliance requirement and cost ceiling.
  • Allowed principals: allowed_principals, psc_consumer_projects, and private_link_subscription_ids must be explicit allow-lists. Never use ["*"].
  • Remote state: back the Terraform state with S3 plus a DynamoDB lock table (AWS), GCS (GCP), or an Azure Storage account (Azure). Local state is unsafe for any deployment a second operator might touch.
  • Tagging discipline: every resource is tagged managed-by=terraform; add your own cost-center, owner, and environment tags through the tags / labels variable.
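The CIDR-overlap and allow-list items lend themselves to a quick pre-flight check before terraform apply. This is a minimal POSIX-style sketch with hypothetical helper names. It handles IPv4 only, assumes decimal octets without leading zeros, and relies on the fact that two CIDR blocks are either nested or disjoint, so comparing their network bits under the shorter prefix is enough.

```bash
#!/usr/bin/env bash
# Pre-flight checks for two checklist items. Helper names are hypothetical.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local ip="$1" a b c d
  a=${ip%%.*}; ip=${ip#*.}
  b=${ip%%.*}; ip=${ip#*.}
  c=${ip%%.*}; d=${ip#*.}
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeeds if two IPv4 CIDR blocks share at least one address.
cidrs_overlap() {
  local net1="${1%/*}" len1="${1#*/}" net2="${2%/*}" len2="${2#*/}"
  local len mask n1 n2
  # Use the shorter (wider) prefix: the blocks overlap exactly when
  # their network bits agree at that length.
  len=$(( len1 < len2 ? len1 : len2 ))
  mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  n1=$(ip_to_int "$net1")
  n2=$(ip_to_int "$net2")
  [ $(( n1 & mask )) -eq $(( n2 & mask )) ]
}

# Fails if an allow-list in Terraform list syntax contains the "*" wildcard.
allow_list_is_explicit() {
  case "$1" in
    *'"*"'*) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, `cidrs_overlap 10.20.0.0/16 "$PEER_CIDR" && echo "choose a different vpc_cidr"`, where PEER_CIDR is a placeholder for the range of the VPC you plan to peer with.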

Cost estimate

Rough monthly run rate for the network layer only. Compute (EKS/GKE nodes), Postgres, and Redis are extra and depend on your sizing.

| Component | AWS | GCP |
|---|---|---|
| NAT (1×) | ~$45 ($0.045/h + data) | ~$45 (Cloud NAT, egress-priced) |
| NAT (2×, HA) | ~$90 | ~$45 (single regional NAT) |
| Flow logs (1 TB/mo) | ~$5 (CloudWatch ingest) | ~$3 (Logging) |
| PrivateLink / PSC | $0.01/h + $0.01/GB | $0.01/GB egress |
| VPC + subnets + IGW + ALB SG | $0 | $0 |
| Total (single NAT, no PrivateLink) | ~$50/mo | ~$48/mo |
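The AWS NAT rows can be reproduced from the hourly rate: about 730 hours per month × $0.045/h ≈ $32.85 per gateway before data processing, with processed data billed per GB on top. This is a sketch for rough budgeting only; the $0.045/GB processing rate is an assumption based on us-east-1 list pricing, so check current pricing for your region and override the variables accordingly.

```bash
#!/usr/bin/env bash
# Rough monthly AWS NAT Gateway estimate. Rates are assumptions
# (us-east-1 list prices); override via environment variables.
NAT_HOURLY=${NAT_HOURLY:-0.045}   # USD per NAT gateway-hour
NAT_PER_GB=${NAT_PER_GB:-0.045}   # USD per GB processed
HOURS_PER_MONTH=730               # 8760 h / 12 months

# Usage: nat_monthly_cost <gateway_count> <gb_processed_per_month>
nat_monthly_cost() {
  awk -v n="$1" -v gb="$2" -v h="$NAT_HOURLY" -v g="$NAT_PER_GB" \
      -v hrs="$HOURS_PER_MONTH" \
      'BEGIN { printf "%.2f\n", n * h * hrs + gb * g }'
}
```

For example, `nat_monthly_cost 2 500` estimates the HA setup with 500 GB of outbound LLM traffic per month.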

Troubleshooting

Outbound LLM calls timing out

Check the private subnet's route table: the 0.0.0.0/0 route must target the NAT gateway, not the internet gateway. In the AWS console: VPC → Route Tables → {deployment_name}-rt-private-{az}.
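To run the same check from a terminal, query the route table's default route and classify its target. This is a sketch under assumptions: the console name above is assumed to come from the route table's Name tag, us-east-1a stands in for your AZ, and check_private_default_route is a hypothetical helper that inspects the text output of the describe-route-tables query.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify the 0.0.0.0/0 target(s) of a private route table.
# Input is the text output of the describe-route-tables query below.
check_private_default_route() {
  case "$1" in
    *nat-*) echo "ok: default route targets a NAT gateway" ;;
    *igw-*) echo "misrouted: default route targets an internet gateway"; return 1 ;;
    *)      echo "missing: no 0.0.0.0/0 route found"; return 1 ;;
  esac
}

# Example wiring (needs AWS credentials; Name tag value is an assumption):
#   targets=$(aws ec2 describe-route-tables \
#     --filters "Name=tag:Name,Values=evalguard-prod-rt-private-us-east-1a" \
#     --query "RouteTables[].Routes[?DestinationCidrBlock=='0.0.0.0/0'].[NatGatewayId,GatewayId]" \
#     --output text)
#   check_private_default_route "$targets"
```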

PrivateLink endpoint stuck in "Pending"

The customer's account ARN must exactly match a value in allowed_principals. Use arn:aws:iam::ACCOUNT_ID:root to allow the entire account, or a specific role ARN to scope access more tightly. After you update allowed_principals, the next terraform apply auto-accepts the pending request.
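A quick format check on each allowed_principals entry before you apply can catch typos such as a truncated account ID. This is a sketch with a hypothetical helper name; it accepts only account root ARNs and IAM role ARNs in the commercial aws partition, which is an assumption (GovCloud and China regions use the aws-us-gov and aws-cn partitions).

```bash
#!/usr/bin/env bash
# Hypothetical pre-apply check for a single allowed_principals entry.
# Accepts arn:aws:iam::<12-digit account>:root or ...:role/<name or path>,
# commercial "aws" partition only (an assumption).
is_valid_principal_arn() {
  printf '%s\n' "$1" | grep -Eq '^arn:aws:iam::[0-9]{12}:(root|role/.+)$'
}
```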

GKE pods can't reach Postgres

Verify that the secondary range names passed to cluster_secondary_range_name and services_secondary_range_name match the module outputs, {deployment_name}-pods and {deployment_name}-services. Mismatched names produce silent connectivity failures because pods end up on a default range with no firewall rules.
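You can compare the ranges that actually exist on the subnet with the names the cluster expects. This is a sketch: check_secondary_ranges and SUBNET_NAME are hypothetical, and the helper normalizes semicolons, commas, spaces, and newlines to one separator because the exact list separator in gcloud's value() output is an assumption here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm a subnet carries the secondary ranges the
# cluster expects, named {deployment_name}-pods and {deployment_name}-services.
check_secondary_ranges() {
  local deployment="$1" actual want
  # Normalize any separator (space, comma, tab, newline) to ';'.
  actual=$(printf '%s' "$2" | tr ' ,\t\n' ';;;;')
  for want in "${deployment}-pods" "${deployment}-services"; do
    case ";${actual};" in
      *";${want};"*) ;;
      *) echo "missing secondary range: ${want}"; return 1 ;;
    esac
  done
  echo "ok"
}

# Example wiring (subnet name and region are placeholders):
#   actual=$(gcloud compute networks subnets describe SUBNET_NAME \
#     --region=us-central1 \
#     --format="value(secondaryIpRanges[].rangeName)")
#   check_secondary_ranges evalguard-prod "$actual"
```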

Next steps

Source: infra/terraform/vpc/

Audit the modules and their README before applying. Issues are welcome on the public repo.
