Deploys the componentized LiteLLM proxy on AWS:
- S3_BUCKET_NAME / S3_REGION_NAME for cache backend, request log archival, and /v1/files storage
- LITELLM_MASTER_KEY (auto-generated, sk-…) and the Aurora master password (bootstrap-only)
- gateway, backend, ui
- (/v1/chat/*, /v1/embeddings, …) → gateway
- (/, /_next/*, /litellm-asset-prefix/*, …) → ui
- (/key/*, /user/*, …) → backend
- (litellm-migrations) that runs prisma migrate deploy from the dedicated ghcr.io/berriai/litellm-migrations image

The cluster runs with iam_database_authentication_enabled = true. Enabling
that on the cluster doesn’t by itself let any Postgres user log in with an IAM
token — you also need to CREATE USER ... GRANT rds_iam once. bootstrap.tf
does this automatically during terraform apply via a one-shot Fargate task
(postgres:16-alpine running the bootstrap SQL with the master password from
Secrets Manager). The SQL is idempotent, so re-applies are safe.
The same apply also runs the prisma schema migration via the existing
litellm-migrations task definition, and the gateway and backend services
declare depends_on on the migration, so they don’t start until the schema is in place.
At runtime, the proxy assembles DATABASE_URL from DATABASE_HOST/PORT/USER/NAME
plus a short-lived IAM token — see litellm/proxy/auth/rds_iam_token.py. The
task role has rds-db:connect scoped to the IAM-authed user on the cluster.
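A scoped rds-db:connect grant of the kind described above might look like the following. This is a sketch under assumptions, not the contents of iam.tf: the variable names and the policy document name are hypothetical. The resource ARN format (dbuser:<cluster resource ID>/<database user>) is the one AWS documents for IAM database authentication.

```hcl
# Hypothetical sketch; variable and data-source names are assumptions.
variable "region" {
  type = string
}

variable "cluster_resource_id" {
  type        = string
  description = "Aurora cluster resource ID (cluster-XXXX), as exported by aws_rds_cluster's cluster_resource_id attribute"
}

variable "db_user" {
  type        = string
  description = "Postgres role that was granted rds_iam"
}

data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "rds_iam_connect" {
  statement {
    effect  = "Allow"
    actions = ["rds-db:connect"]

    # Scope to one database user on one cluster rather than dbuser:*/*.
    resources = [
      "arn:aws:rds-db:${var.region}:${data.aws_caller_identity.current.account_id}:dbuser:${var.cluster_resource_id}/${var.db_user}",
    ]
  }
}
```

Note that the ARN uses the cluster resource ID, not the cluster identifier, so renaming the cluster does not change the grant.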
Break-glass. If you need to run the bootstrap or migration by hand (e.g.,
to re-apply against an externally provisioned cluster), db_bootstrap_sql and
migration_run_command are still exposed as outputs.
Prerequisite. terraform apply shells out to aws ecs run-task /
aws ecs wait in local-exec provisioners, so the machine running terraform
needs the aws CLI installed and authenticated.
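To show why the aws CLI is required, here is a rough sketch of a local-exec provisioner that runs a one-shot Fargate task and waits for it. The resource name, variable names, and trigger are hypothetical; the module's own provisioners may be structured differently. terraform_data assumes Terraform 1.4 or later.

```hcl
# Hypothetical sketch of a run-task + wait provisioner; all names are assumptions.
variable "cluster_name" {
  type = string
}

variable "task_definition_arn" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

variable "security_group_ids" {
  type = list(string)
}

resource "terraform_data" "run_one_shot_task" {
  # Rerun whenever the task definition revision changes.
  triggers_replace = [var.task_definition_arn]

  provisioner "local-exec" {
    interpreter = ["/bin/bash", "-c"]
    command     = <<-EOT
      set -euo pipefail
      task_arn="$(aws ecs run-task \
        --cluster "${var.cluster_name}" \
        --task-definition "${var.task_definition_arn}" \
        --launch-type FARGATE \
        --network-configuration "awsvpcConfiguration={subnets=[${join(",", var.subnet_ids)}],securityGroups=[${join(",", var.security_group_ids)}]}" \
        --query 'tasks[0].taskArn' --output text)"
      # Blocks until the task reaches STOPPED; it does not check the exit code.
      aws ecs wait tasks-stopped --cluster "${var.cluster_name}" --tasks "$task_arn"
    EOT
  }
}
```

Since aws ecs wait tasks-stopped only waits for the STOPPED state, a production version would also need to inspect the container exit code (for example with aws ecs describe-tasks) before treating the run as successful.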
proxy_config (preferred)

Mirrors the helm chart’s gateway.config.proxy_config. The map is YAML-encoded
and uploaded to S3 (config/litellm-config.yaml in the stack’s bucket); the
gateway and backend container entrypoints download it to
/tmp/litellm-config.yaml at task start via boto3 and set CONFIG_FILE_PATH
to match. The S3 object’s etag is wired into the task definition, so editing
proxy_config produces a new task-def revision and a rolling redeploy of both
services.
proxy_config = {
  model_list = [
    {
      model_name = "gpt-4o"
      litellm_params = {
        model   = "openai/gpt-4o"
        api_key = "os.environ/OPENAI_API_KEY"
      }
    },
  ]
  general_settings = {
    master_key   = "os.environ/LITELLM_MASTER_KEY"
    database_url = "os.environ/DATABASE_URL"
  }
}
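The upload-and-redeploy wiring described above can be sketched as follows. This is an assumption about how such wiring could be written, not a copy of the module: the variable config_bucket, the local names, and the LITELLM_CONFIG_ETAG variable are hypothetical. yamlencode and the etag attribute of aws_s3_object are standard Terraform and AWS provider features.

```hcl
# Hypothetical sketch; bucket variable, local names and the etag env var
# name are assumptions.
variable "proxy_config" {
  type    = any
  default = {}
}

variable "config_bucket" {
  type = string
}

resource "aws_s3_object" "proxy_config" {
  bucket       = var.config_bucket
  key          = "config/litellm-config.yaml"
  content      = yamlencode(var.proxy_config)
  content_type = "application/yaml"
}

locals {
  # Feeding the object's etag into the container environment means any edit
  # to proxy_config changes the container definition, which forces a new
  # task-definition revision and therefore a rolling redeploy.
  config_env = [
    { name = "CONFIG_FILE_PATH", value = "/tmp/litellm-config.yaml" },
    { name = "LITELLM_CONFIG_ETAG", value = aws_s3_object.proxy_config.etag },
  ]
}
```

The design point is that the running container never reads the etag; it exists only to make the task definition differ when the config differs.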
LiteLLM resolves os.environ/<NAME> references in the YAML against the
container’s environment. That means provider API keys belong in
*_extra_secrets (next section), and your YAML just references them by name.
Non-sensitive plaintext (feature flags, observability hosts, etc.):
gateway_extra_env = {
  LANGFUSE_HOST = "https://us.cloud.langfuse.com"
}
backend_extra_env = {
  STORE_MODEL_IN_DB = "True"
}
Sensitive values — provider API keys, third-party tokens — live in existing Secrets Manager secrets. Reference them by ARN:
gateway_extra_secrets = {
  OPENAI_API_KEY    = "arn:aws:secretsmanager:us-west-2:111122223333:secret:openai-api-key-AbCdEf"
  ANTHROPIC_API_KEY = "arn:aws:secretsmanager:us-west-2:111122223333:secret:anthropic-api-key-GhIjKl"
}
What happens under the hood:
- secretsmanager:GetSecretValue on every ARN listed here.
- proxy_config YAML references the resulting env var via os.environ/OPENAI_API_KEY.

To pluck a single field out of a JSON secret, use ECS’s :fieldName:: suffix:
gateway_extra_secrets = {
  OPENAI_API_KEY = "arn:…:secret:provider-keys-AbCdEf:openai_api_key::"
}
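For readers curious how a map like gateway_extra_secrets can turn into ECS container secrets and an IAM resource list, here is a minimal sketch. The local names are hypothetical and the module may do this differently. The name/valueFrom shape is the standard ECS container-definition format.

```hcl
# Hypothetical sketch; local names are assumptions.
variable "gateway_extra_secrets" {
  type    = map(string)
  default = {}
}

locals {
  # ECS container definitions take secrets as a list of name/valueFrom pairs.
  gateway_container_secrets = [
    for name, arn in var.gateway_extra_secrets : { name = name, valueFrom = arn }
  ]

  # A Secrets Manager ARN has seven colon-separated parts. Anything after
  # that is the optional :json-key:version-stage:version-id suffix, which
  # must be dropped before the ARN is used as an IAM policy resource.
  gateway_secret_arns = distinct([
    for arn in values(var.gateway_extra_secrets) :
    join(":", slice(split(":", arn), 0, 7))
  ])
}
```

With the :openai_api_key:: example above, valueFrom keeps the full string so ECS extracts the field, while the IAM resource list contains only the base secret ARN.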
To create the secret beforehand:
aws secretsmanager create-secret \
  --name openai-api-key \
  --secret-string "sk-proj-..."
OTel v2 (https://docs.litellm.ai/docs/observability/opentelemetry_v2) is
opt-in and gated entirely on otel_endpoint. Empty (default) and nothing
OTel-related is added to the container env. Set it and both gateway and
backend gain LITELLM_OTEL_V2=true plus the OTEL_* block, with
OTEL_SERVICE_NAME stamped per component (${tenant}-litellm-${env}-gateway
and -backend) so spans land tagged with the right hop. Any OTEL_* key
set in gateway_extra_env / backend_extra_env overrides the default for
that service.
otel_endpoint = "http://otel-collector.internal:4318"
otel_exporter = "otlp_http" # otlp_grpc, console
otel_environment_name = "prod" # defaults to var.env
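The gating and per-service override behaviour can be pictured with a small sketch. Everything here is an assumption for illustration: the local names are hypothetical, only a few OTEL_* keys are shown, and OTEL_EXPORTER_OTLP_ENDPOINT is the standard OpenTelemetry variable rather than a confirmed detail of the module.

```hcl
# Hypothetical sketch of "gated on otel_endpoint, overridable per service".
variable "tenant" {
  type = string
}

variable "env" {
  type = string
}

variable "otel_endpoint" {
  type    = string
  default = ""
}

variable "gateway_extra_env" {
  type    = map(string)
  default = {}
}

locals {
  otel_defaults = {
    LITELLM_OTEL_V2             = "true"
    OTEL_EXPORTER_OTLP_ENDPOINT = var.otel_endpoint
    OTEL_SERVICE_NAME           = "${var.tenant}-litellm-${var.env}-gateway"
  }

  # With an empty endpoint the filter drops every key, so nothing
  # OTel-related reaches the container environment.
  otel_env = { for k, v in local.otel_defaults : k => v if var.otel_endpoint != "" }

  # merge() gives precedence to later arguments, so any OTEL_* key set in
  # gateway_extra_env replaces the default for this service only.
  gateway_env = merge(local.otel_env, var.gateway_extra_env)
}
```

A backend equivalent would differ only in the OTEL_SERVICE_NAME suffix and in merging backend_extra_env instead.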
For collectors that require an auth header, store the comma-separated
key=value string in Secrets Manager and reference it via
otel_headers_secret_arn. The execution role auto-gains
secretsmanager:GetSecretValue on that ARN.
otel_headers_secret_arn = "arn:aws:secretsmanager:us-west-2:111122223333:secret:honeycomb-otel-headers-AbCdEf"
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT defaults to
no_content; flip otel_capture_message_content = "prompt_and_completion"
only after auditing what lands in the backend, since prompts and
completions are typically sensitive.
Vendor presets (Arize, Phoenix, Langfuse OTel, Weave, Langtrace, Levo,
AgentOps) live under proxy_config.litellm_settings.callbacks and are
orthogonal to the OTLP variables above; their credentials still go in
*_extra_secrets.
Every resource the stack creates is named ${tenant}-litellm-${env} (or
that plus a per-resource suffix), so multiple tenants and multiple
environments coexist in the same account as long as the (tenant, env)
pair differs:
| tenant | env | Example resource name |
|---|---|---|
| acme | stage | acme-litellm-stage-gateway |
| acme | prod | acme-litellm-prod-master-key |
| globex | dev | globex-litellm-dev-license |
For a per-tenant instance via the example root, the only inputs that change are the tenant slug, env, and the two pre-issued secrets:
cd terraform/litellm/aws/examples/default
export TF_VAR_litellm_master_key="sk-..." # the tenant's master key
export TF_VAR_litellm_license="lic-..." # their LITELLM_LICENSE
terraform apply \
  -var "region=us-west-2" \
  -var 'azs=["us-west-2a","us-west-2b"]' \
  -var "tenant=acme" \
  -var "env=stage"
To run many tenants from a single config, call the module with
for_each instead of one root per tenant (see “Using as a module”):
module "litellm" {
  for_each = toset(["acme", "globex"])
  source   = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"
  tenant   = each.key
  env      = "prod"
  region   = "us-west-2"
  azs      = ["us-west-2a", "us-west-2b"]
}
(This for_each form is only possible because the module declares no
provider block — the original root-with-provider layout forbade it.)
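A variation on the for_each form, in case tenants need different environments: key the module on a map instead of a set and collect one URL per tenant. This is a sketch under assumptions: the local tenants map is hypothetical, and it assumes the module itself exposes an alb_url output with the same name the example root prints.

```hcl
# Hypothetical sketch; the tenant map and the alb_url module output name
# are assumptions.
locals {
  tenants = {
    acme   = "prod"
    globex = "dev"
  }
}

module "litellm" {
  for_each = local.tenants
  source   = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"

  tenant = each.key
  env    = each.value
  region = "us-west-2"
  azs    = ["us-west-2a", "us-west-2b"]
}

# One ALB URL per tenant, keyed by tenant slug.
output "alb_urls" {
  value = { for tenant, stack in module.litellm : tenant => stack.alb_url }
}
```

Because resource names derive from the (tenant, env) pair, removing a key from the map destroys only that tenant's stack.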
Both litellm_master_key and litellm_license are optional:
- litellm_master_key → the stack auto-generates a random sk-… value (trial/dev path).
- litellm_license → no license secret is created and gateway/backend run without LITELLM_LICENSE (OSS-only).

Use TF_VAR_* env vars rather than tfvars files for these: values
written to a tfvars file end up in terraform.tfstate and any committed
example files.
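The "optional, otherwise auto-generated" behaviour of litellm_master_key could be expressed roughly like this. It is a simplified sketch, not the module's secrets.tf: the resource and local names are hypothetical, and unlike a real implementation it always creates the random value even when a key is supplied.

```hcl
# Hypothetical sketch; resource and local names are assumptions.
variable "litellm_master_key" {
  type      = string
  default   = null
  sensitive = true # keeps the value out of plan and apply output
}

resource "random_password" "master_key" {
  length  = 32
  special = false
}

locals {
  # coalesce() returns the first argument that is neither null nor an
  # empty string, so a caller-supplied key wins over the generated one.
  master_key = coalesce(
    var.litellm_master_key,
    "sk-${random_password.master_key.result}",
  )
}
```

Setting TF_VAR_litellm_master_key in the shell populates the variable without the value ever appearing in a file on disk.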
cd terraform/litellm/aws/examples/default
cp terraform.tfvars.example terraform.tfvars
# Edit: region, tenant, env, azs, proxy_config, gateway_extra_secrets.
terraform init
terraform apply
examples/default/ is a thin root that configures the aws provider and
calls the module (../../). It exposes a curated variable surface; for
advanced knobs (per-component CPU/memory/workers, autoscaling, RDS/Redis
sizing, per-component image pins) set them on the module "litellm" block
in examples/default/main.tf, or call the module from your own config —
see “Using as a module” below.
That single apply provisions everything, runs the DB user bootstrap, runs the schema migration, and only then starts the gateway/backend services. When it returns, the stack is serving traffic.
terraform output alb_url
# UI login: admin / <master key>
aws secretsmanager get-secret-value \
  --secret-id "$(terraform output -raw master_key_secret_arn)" \
  --query SecretString --output text
The directory itself is a module with no provider block — the caller
owns provider config. That means you can call it directly with for_each
(many tenants from one config), count (conditional stacks), depends_on,
an assume-role / aliased provider, etc.:
provider "aws" {
  region = "us-west-2"
  assume_role { role_arn = "arn:aws:iam::111122223333:role/deployer" }
}
module "litellm" {
  source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"
  region = "us-west-2"
  tenant = "acme"
  env    = "prod"
  azs    = ["us-west-2a", "us-west-2b"]
  # ...any of the inputs in variables.tf...
}
Tags: the module threads its own litellm:stack / managed-by / var.tags
onto every taggable resource. Any default_tags on your provider merge on
top — set org-wide tags there, per-deployment tags via the tags input.
The defaults pull from ghcr.io/berriai/litellm-<component>:v1.86.0-dev,
which is anonymous-readable. There are four images: litellm-gateway,
litellm-backend, litellm-ui, and litellm-migrations (slim image used
only by the one-off migration task — runs prisma migrate deploy against
the writer DB and exits). Bump them together when bumping LiteLLM. To pull
from a private registry:
- AmazonECSTaskExecutionRolePolicy, which grants ECR pull for repos in the same account. No extra config needed.
- ecr:GetAuthorizationToken + ecr:BatchGetImage on the foreign repo ARNs.
- {"auths":{"<registry>":{"auth":"<base64-user:token>"}}} in Secrets Manager and set repositoryCredentials.credentialsParameter on the task def container; extend ecs.tf accordingly.

terraform plan refuses to provision an HTTP-only ALB by default; TLS
is the supported posture. Two paths:
Production / staging — provide an ACM certificate:
- var.region covering the DNS name you plan to point at the ALB.
- acm_certificate_arn = "arn:aws:acm:..." in tfvars and apply.

Result: a 443 listener carries the path-routing rules; the 80 listener serves a permanent 301 redirect to HTTPS, so HTTP clients are automatically upgraded.
Trial / dev — explicitly opt into HTTP-only:
Set allow_plaintext_alb = true in tfvars. Without this flag, plan fails
with a clear error pointing at the precondition. Intended for short-lived
trial / dev stacks only.
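A plan-time guard of this kind can be built with a Terraform precondition. The sketch below is an assumption about the shape of such a check, not the module's actual code: the resource name tls_guard and the error text are hypothetical, and terraform_data assumes Terraform 1.4 or later.

```hcl
# Hypothetical sketch of a "TLS or explicit opt-out" guard.
variable "acm_certificate_arn" {
  type    = string
  default = ""
}

variable "allow_plaintext_alb" {
  type    = bool
  default = false
}

resource "terraform_data" "tls_guard" {
  lifecycle {
    precondition {
      # Passes when a certificate is supplied or HTTP-only was chosen on purpose.
      condition     = var.acm_certificate_arn != "" || var.allow_plaintext_alb
      error_message = "Set acm_certificate_arn, or opt into HTTP-only with allow_plaintext_alb = true."
    }
  }
}
```

Both inputs are known at plan time, so the failure surfaces during terraform plan rather than partway through an apply.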
Three opt-in tripwires guard against accidental data loss on
terraform destroy:
- skip_final_snapshot (Aurora; default false): destroying the cluster takes a <cluster>-final-<short-sha> snapshot first.
- s3_force_destroy (S3 bucket holding request log archives, /v1/files content, and the S3 cache backend; default false): terraform destroy against a non-empty bucket fails.

Flip either to true only for ephemeral / CI stacks where you accept
losing the contents.
| File | What’s in it |
|---|---|
| versions.tf | Terraform + required_providers constraints (module declares no provider config) |
| examples/default/ | Thin root: aws provider (with an optional default_tags slot for org-wide tags) + a call to the module. The one-command deploy path. |
| variables.tf | All input variables |
| locals.tf | Path-prefix lists for ALB routing (mirror of helm/.../ingress.yaml) |
| network.tf | VPC, subnets, IGW, NAT, route tables, security groups |
| secrets.tf | Secrets Manager entries + random passwords |
| rds.tf | Aurora Postgres cluster + writer / reader instances |
| redis.tf | ElastiCache Redis |
| s3.tf | S3 bucket + task-role policy scoped to it |
| iam.tf | Task execution + task roles, including rds-db:connect |
| ecs.tf | ECS cluster, task definitions, services for the three components |
| alb.tf | ALB, listener, target groups, path-routing rules |
| migrations.tf | One-off migration task definition |
| outputs.tf | DNS name, secret ARN, bootstrap SQL, migration run-task command |