Deploy a self-hosted Braintrust data plane on AWS, GCP, or Azure
AWS
GCP
Azure
Deploy the Braintrust data plane in your AWS account using the Braintrust Terraform module. This is the recommended way to self-host Braintrust on AWS. Braintrust recommends deploying in a dedicated AWS account: AWS enforces account-level Lambda concurrency limits, and since Braintrust’s API runs on Lambda, sharing an account with other workloads can lead to throttling and service disruptions. A dedicated account also aligns with AWS best practices for workload isolation and security.
To test infrastructure provisioning before committing to production-sized resources, use the sandbox example. It uses minimal instance sizes and has deletion protection disabled for easy teardown. It is not suitable for performance or load testing.
In provider.tf, configure your AWS account and region. Supported regions: us-east-1, us-east-2, us-west-2, eu-west-1, ca-central-1, and ap-southeast-2. If you require support for a different region, contact Braintrust.
In terraform.tf, set up your remote backend (typically S3 and DynamoDB).
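A minimal remote backend sketch, assuming the S3-and-DynamoDB setup mentioned above (bucket, key, region, and lock table names are illustrative):

```hcl
# terraform.tf — remote state in S3 with DynamoDB state locking
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"      # hypothetical bucket name
    key            = "braintrust/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"         # hypothetical lock table
    encrypt        = true
  }
}
```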
In main.tf, customize the Braintrust deployment settings. The defaults are suitable for a large production-sized deployment. Adjust them based on your needs, but keep in mind the hardware requirements.
Each deployment must have a unique deployment_name within the same AWS account (max 18 characters). The default is "braintrust"; change it if you run multiple deployments. Resource names (IAM roles, RDS instances, S3 buckets) are prefixed with this value and will collide if duplicated.
Brainstore instances require instance types with local NVMe storage for caching (e.g., c8gd, c5d, m5d, i3, i4i families). Generic instance types without local storage (t3, m5, c5) are not supported and will fail at plan time.
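The two settings above might look like this in main.tf. This is a sketch only: the Brainstore instance-type variable name is an assumption, so check the module's variables for the exact name.

```hcl
module "braintrust-data-plane" {
  source = "github.com/braintrustdata/terraform-aws-braintrust-data-plane"

  deployment_name = "braintrust-dev"  # unique per AWS account, max 18 characters

  # Brainstore requires an instance family with local NVMe storage (c8gd, c5d,
  # m5d, i3, i4i). Variable name below is illustrative, not the module's exact name.
  brainstore_instance_type = "c8gd.2xlarge"
}
```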
Pass the key to Terraform. The recommended approach is to store the license key in AWS Secrets Manager and reference it using a Terraform data source:
```hcl
data "aws_secretsmanager_secret_version" "brainstore_license" {
  secret_id = "braintrust/brainstore-license-key"
}
```
Then pass data.aws_secretsmanager_secret_version.brainstore_license.secret_string as the brainstore_license_key value in the module. Alternatively, you can pass the key without storing it in Secrets Manager:
Set TF_VAR_brainstore_license_key=your-key in your environment.
Pass it via command line: terraform apply -var 'brainstore_license_key=your-key'.
Add it to an uncommitted terraform.tfvars or .auto.tfvars file.
Do not commit the license key to your git repository.
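If you use a tfvars file, a minimal example (keep this file out of version control):

```hcl
# terraform.tfvars — do not commit this file
brainstore_license_key = "your-key"
```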
The first terraform apply may fail with transient errors such as ASG health check timeouts (while instances are still booting) or Lambda rate limits. Re-running terraform apply resolves these.
This will create all necessary AWS resources including:
Two isolated VPCs:
Main VPC: Hosts Braintrust services (API, database, Redis, Brainstore)
Quarantine VPC: Runs user-defined functions (scorers, tools) in network isolation. This creates ~30 Lambda functions across multiple runtimes. This is required for most production use cases.
Connect your Braintrust organization to your newly deployed data plane.
Changing your live organization’s API URL can disrupt access for existing users. If you are testing, create a new Braintrust organization for your data plane instead of updating your live environment.
If your deployment is accessed through a VPN or is otherwise on a private network (not accessible from the public internet), enable Data plane is on a private network. This enables Chrome’s Local Network Access permission handling, which is required for browser access to private network resources. When enabled, Chrome will prompt users to grant permission for the Braintrust UI to access your self-hosted data plane. See Grant browser permissions for details.
Select Save.
The UI will automatically test the connection to your new data plane. Verify that the ping to each endpoint is successful.
At least 3 private subnets across different availability zones
At least 1 public subnet
Internet and NAT gateways with properly configured route tables
The module manages its own security groups. To also use an existing quarantine VPC, set existing_quarantine_vpc_id and the corresponding existing_quarantine_private_subnet_*_id variables.
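A sketch of reusing an existing quarantine VPC. The VPC ID is hypothetical, and the subnet variables follow the existing_quarantine_private_subnet_*_id pattern noted above; check the module's variables for the exact names.

```hcl
module "braintrust-data-plane" {
  source = "github.com/braintrustdata/terraform-aws-braintrust-data-plane"

  # Reuse an existing quarantine VPC instead of creating one
  existing_quarantine_vpc_id = "vpc-0123456789abcdef0" # hypothetical ID

  # Also set each existing_quarantine_private_subnet_*_id variable to the
  # corresponding private subnet in that VPC.
}
```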
These tags will be applied to all resources including Brainstore EC2 instances, volumes, and ENIs. The deployment name variable automatically prefixes resource names and applies a BraintrustDeploymentName tag across all resources.
Use the custom_tags parameter instead of the AWS provider’s default_tags configuration. Due to a Terraform limitation, default_tags are not applied to resources that use launch templates, such as Brainstore instances.
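For example, applying tags through the module (tag keys and values here are illustrative):

```hcl
module "braintrust-data-plane" {
  source = "github.com/braintrustdata/terraform-aws-braintrust-data-plane"

  # Applied to all resources, including launch-template-based Brainstore
  # instances that AWS provider default_tags would miss.
  custom_tags = {
    Environment = "production"
    CostCenter  = "ml-platform"
  }
}
```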
Important for AWS: Avoid using burstable Redis instances (t-family instances like cache.t4g.micro) in production. These instances use CPU credits that can be exhausted during high-load periods, leading to performance throttling. Instead, use non-burstable instances like cache.r7g.large, cache.r6g.medium, or cache.r5.large for predictable performance. Even if these instances seem oversized initially, they provide consistent performance without the risk of CPU credit exhaustion.
The API Handler and AI Proxy Lambda functions default to 10240 MB (the Lambda maximum). You can reduce these to lower costs in environments with tighter memory quotas, though Braintrust recommends keeping the defaults for production workloads.
```hcl
module "braintrust-data-plane" {
  source = "github.com/braintrustdata/terraform-aws-braintrust-data-plane"

  api_handler_memory_limit = 10240 # default, valid range 1–10240 MB
  ai_proxy_memory_limit    = 10240 # default, valid range 1–10240 MB

  # ... other configuration ...
}
```
The brainstore_wal_footer_version variable controls the WAL footer format written by Brainstore. It defaults to "" (unset) and should not be changed outside of a planned upgrade sequence.
Do not set brainstore_wal_footer_version without following the upgrade guide. Setting it at the same time as a version bump can cause Brainstore nodes still rolling out to fail to read the new WAL format.
When kms_key_arn is configured, all managed S3 buckets (Brainstore, code-bundle, and Lambda responses) enforce blocked_encryption_types = ["NONE"], preventing unencrypted object uploads. This policy is applied automatically as of v4.5.0 — upgrading from an earlier version will include this change in your terraform plan.
As of v4.5.0, the x-bt-use-gateway header is included in the AI Proxy Lambda function URL CORS allowed headers. Browser clients can send this header to control gateway routing without triggering a CORS preflight rejection. No configuration is required.
Deploy the Braintrust data plane in your GCP project using the Braintrust Terraform module and Helm chart. This is the recommended way to self-host Braintrust on GCP.
The Braintrust Terraform module contains all the necessary resources for a self-hosted Braintrust data plane. A dedicated Google Cloud project for your Braintrust deployment is recommended but not required.
In provider.tf, configure your Google Cloud project and region.
In backend.tf, set up your remote backend (typically a GCS bucket).
In main.tf, customize the Braintrust deployment settings. The defaults are suitable for a large production-sized deployment. Adjust them based on your needs, but keep in mind the hardware requirements.
Create a helm-values.yaml file for your deployment. Refer to the Helm chart documentation for configuration options. Deploy the Braintrust Helm chart to your cluster:
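For example, a typical install command (the chart location, release name, and namespace are assumptions; substitute the values for your environment):

```shell
# Install or upgrade the Braintrust Helm chart using your values file
helm upgrade --install braintrust <path-or-repo/chart> \
  --namespace braintrust --create-namespace \
  -f helm-values.yaml
```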
The data plane requires a publicly reachable HTTPS endpoint with a valid TLS certificate. The Helm chart deploys the API as a ClusterIP service. You are expected to provide your own ingress solution that terminates TLS and routes traffic to the braintrust-api service on port 8000. Common approaches:
GCP Application Load Balancer with a Google-managed certificate (requires a custom domain)
GKE Gateway API with cert-manager and Let’s Encrypt
Cloud Run NGINX proxy with a VPC Connector for SSL termination (no custom domain required)
Istio/ASM Gateway - the Helm chart includes native VirtualService support (see virtualService in values.yaml)
Any reverse proxy or load balancer that terminates TLS and forwards HTTP to the API service
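As one sketch of the last option, a generic Kubernetes Ingress that terminates TLS and forwards to the API service. The hostname, ingress class, and TLS secret name are assumptions; the service name and port come from the Helm chart described above.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: braintrust-api
spec:
  ingressClassName: nginx            # assumes an NGINX ingress controller
  tls:
    - hosts:
        - braintrust.example.com     # hypothetical hostname
      secretName: braintrust-api-tls # hypothetical TLS secret
  rules:
    - host: braintrust.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: braintrust-api # ClusterIP service from the Helm chart
                port:
                  number: 8000
```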
After configuring your ingress, save the resulting HTTPS URL. You’ll need it to configure your Braintrust organization.
Connect your Braintrust organization to your newly deployed data plane.
Changing your live organization’s API URL can disrupt access for existing users. If you are testing, create a new Braintrust organization for your data plane instead of updating your live environment.
If your deployment is accessed through a VPN or is otherwise on a private network (not accessible from the public internet), enable Data plane is on a private network. This enables Chrome’s Local Network Access permission handling, which is required for browser access to private network resources. When enabled, Chrome will prompt users to grant permission for the Braintrust UI to access your self-hosted data plane. See Grant browser permissions for details.
Select Save.
The UI will automatically test the connection to your new data plane. Verify that the ping to each endpoint is successful.
The Terraform module automatically configures Workload Identity for your GKE cluster and creates two service accounts with the following IAM grants. The module creates two GCS buckets:
Brainstore bucket (<deployment_name>-brainstore-*): Brainstore data storage.
API bucket (<deployment_name>-api-*): Contains two storage paths — code-bundle/ (API layer writes) and brainstore-cache/ (ephemeral Brainstore cache, automatically deleted after 1 day by a GCS lifecycle rule).
The brainstore-cache/ objects are ephemeral and managed automatically. Operators do not need to manage or back up this data.
Choose one of the following authentication methods for the API service:
Native GCS auth (recommended)
Native GCS authentication uses the @google-cloud/storage SDK and Workload Identity. This is the recommended approach for enhanced security, as it eliminates the need to manage service account keys. Requirements:
Helm chart version 3.1.0 or later
Workload Identity configured (automatic with Terraform module)
Kubernetes secrets: For native GCS authentication, create secrets without GCS credentials:
Refer to the Terraform outputs for the connection strings. The Brainstore license key can be found at Settings > Data plane. Only organization owners can access this page.
Helm configuration: In your Helm values file, enable native GCS authentication and configure the Google service account:
The Terraform module outputs the service account email as braintrust_service_account. Run terraform output braintrust_service_account to get the full email address. The enableGcsAuth setting defaults to false for backwards compatibility. Contact Braintrust if you want to enable native GCS authentication by default.
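A minimal values sketch: the service account email is whatever braintrust_service_account outputs, and the address shown here is hypothetical.

```yaml
api:
  enableGcsAuth: true
  googleServiceAccount: braintrust@my-project.iam.gserviceaccount.com
```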
S3 compatibility mode (legacy)
S3 compatibility mode uses HMAC keys to access GCS through the S3-compatible API. This is the legacy authentication method. Kubernetes secrets: For S3 compatibility mode, include HMAC credentials:
Refer to the Terraform outputs for the connection strings. The Brainstore license key can be found at Settings > Data plane. Only organization owners can access this page.
Helm configuration: In your Helm values file, ensure enableGcsAuth is not set or set to false. Do not configure googleServiceAccount when using HMAC keys:
```yaml
api:
  enableGcsAuth: false # or omit this line
```
Helm chart v5.0.1+ automatically sets AWS_REQUEST_CHECKSUM_CALCULATION and AWS_RESPONSE_CHECKSUM_VALIDATION to WHEN_REQUIRED when enableGcsAuth is disabled. This ensures AWS SDK compatibility with the GCS S3-compatible endpoint. No additional configuration is required.
The Terraform module outputs the service account email as brainstore_service_account. Run terraform output brainstore_service_account to get the full email address.
If Brainstore needs to access GCS buckets or other GCP resources in another project that are restricted to a specific service account identity, use brainstore_impersonation_targets to grant the Brainstore Kubernetes service account the ability to impersonate one or more Google Cloud service accounts.
This grants roles/iam.serviceAccountTokenCreator on each target service account to the Brainstore Kubernetes service account, enabling Brainstore to generate short-lived tokens and act as those accounts. Values must use the full resource name format projects/{project_id}/serviceAccounts/{service_account_email}, not bare email addresses. The default is [] (no impersonation).
Target service accounts must exist before running terraform apply. You must also have iam.serviceAccounts.setIamPolicy permission on each target service account, or a project-level IAM admin role on the target project.
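For example, granting impersonation of one target account (the target project and service account email are hypothetical):

```hcl
brainstore_impersonation_targets = [
  # Full resource name format is required, not a bare email address
  "projects/other-project/serviceAccounts/data-reader@other-project.iam.gserviceaccount.com",
]
```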
Deploy the Braintrust data plane in your Azure subscription using the Braintrust Terraform module and Helm chart. This is the recommended way to self-host Braintrust on Azure.
Requirements: Terraform >= 1.10.0 and the azurerm provider ~> 4.0 are required. If you have an existing deployment using azurerm 3.x, run terraform init -upgrade before applying and review the azurerm v4 upgrade guide.
The Braintrust Terraform module contains all the necessary resources for a self-hosted Braintrust data plane. A dedicated Azure subscription for your Braintrust deployment is recommended but not required.
In provider.tf, configure your Azure subscription and tenant details.
In terraform.tf, set up your remote backend (typically Azure Blob Storage).
In main.tf, customize the Braintrust deployment settings. The defaults are suitable for a large production-sized deployment. Adjust them based on your needs, but keep in mind the hardware requirements. The module provisions two AKS node pools:
brainstore pool (aks_brainstore_pool_vm_size): runs Brainstore pods. Must be a VM SKU with local NVMe SSD (e.g. Standard_D32ds_v6). The Azure Container Storage extension is automatically installed to configure RAID0 across the local disks.
services pool (aks_services_pool_vm_size): runs API and other application pods. Does not require local SSD (e.g. Standard_D16s_v6).
Initially set enable_front_door = false in main.tf. You’ll enable this later after configuring the load balancer.
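These settings might look like this in main.tf, using the example SKUs above (other module arguments omitted):

```hcl
module "braintrust-data-plane" {
  # ...

  aks_brainstore_pool_vm_size = "Standard_D32ds_v6" # requires local NVMe SSD
  aks_services_pool_vm_size   = "Standard_D16s_v6"  # no local SSD needed

  enable_front_door = false # enable after the load balancer is configured
}
```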
The Terraform module configures PostgreSQL extensions (pg_cron, pg_partman) and sets cron.database_name to the Braintrust database. These are static parameters that require a server restart to take effect. The Terraform provider is configured to not restart the server automatically, since automatic restarts on configuration changes could cause unintended downtime in production. After your first terraform apply, restart the PostgreSQL server before proceeding:
```shell
az postgres flexible-server restart \
  --resource-group <resource-group-name> \
  --name <postgres-database-server-name>
```
You can find the resource group and server name in your Terraform outputs.
This step is only required on the initial deployment. Subsequent terraform apply runs do not require a restart unless you modify the PostgreSQL extension configuration.
In the Azure Portal, find the private link service named <deployment>-aks-api-pls and manually approve it.
This manual approval step is an Azure platform requirement — Front Door cannot automatically approve private link connections to resources in a different subscription or tenant. The deployment will appear to succeed but Front Door traffic will not flow until the connection is approved.
Front Door deployment takes up to 45 minutes after the Terraform apply completes. Wait for the deployment to finish before proceeding.
Connect your Braintrust organization to your newly deployed data plane.
Changing your live organization’s API URL can disrupt access for existing users. If you are testing, create a new Braintrust organization for your data plane instead of updating your live environment.
If your deployment is accessed through a VPN or is otherwise on a private network (not accessible from the public internet), enable Data plane is on a private network. This enables Chrome’s Local Network Access permission handling, which is required for browser access to private network resources. When enabled, Chrome will prompt users to grant permission for the Braintrust UI to access your self-hosted data plane. See Grant browser permissions for details.
Select Save.
The UI will automatically test the connection to your new data plane. Verify that the ping to each endpoint is successful.
The resource group Terraform resource address changed from azurerm_resource_group.main to azurerm_resource_group.main[0]. A moved block in moved.tf handles this automatically — no manual state manipulation is required. Before applying, run terraform plan and confirm the resource group shows as moved rather than destroyed and recreated. If you see a destroy planned, do not apply without investigating.
Upgrading from Terraform Azure module v0.9.0 to v1.0.0 requires several breaking changes:
Node pool variables renamed: Replace aks_user_pool_vm_size with aks_brainstore_pool_vm_size and aks_services_pool_vm_size, and aks_user_pool_max_count with aks_brainstore_pool_max_count and aks_services_pool_max_count. The existing user node pool will be destroyed and replaced by two new pools during the upgrade — drain and reschedule workloads before applying.
azurerm provider upgrade: Run terraform init -upgrade to update to azurerm ~> 4.0 before applying. Review the azurerm v4 upgrade guide for any state migrations needed (e.g. the enable_rbac_authorization → rbac_authorization_enabled rename on Key Vault resources).
brainstore_license_key now required: The module will fail at plan time if this variable is not set. See the Configure Brainstore license step above.
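A before/after sketch of the renamed node pool variables (VM sizes and counts are illustrative):

```hcl
# v0.9.0: single user pool
aks_user_pool_vm_size   = "Standard_D16s_v6"
aks_user_pool_max_count = 4

# v1.0.0: one pool becomes two
aks_brainstore_pool_vm_size   = "Standard_D32ds_v6"
aks_brainstore_pool_max_count = 4
aks_services_pool_vm_size     = "Standard_D16s_v6"
aks_services_pool_max_count   = 4
```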