Anthropic provides access to Claude models, including Claude Sonnet 4.5, Claude Opus 4.1, and other cutting-edge language models. Braintrust integrates with Anthropic through direct API access, the wrapAnthropic wrapper for automatic tracing, and the Braintrust gateway.
This guide covers manual instrumentation. For quicker setup, use auto-instrumentation.

Setup

To use Anthropic with Braintrust, you’ll need an Anthropic API key.
  1. Visit Anthropic’s Console and create a new API key
  2. Add the Anthropic API key to your organization’s AI providers or to a project’s AI providers
  3. Set the Anthropic API key and your Braintrust API key as environment variables
.env
ANTHROPIC_API_KEY=<your-anthropic-api-key>
BRAINTRUST_API_KEY=<your-braintrust-api-key>

# For organizations on the EU data plane, use https://api-eu.braintrust.dev
# For self-hosted deployments, use your data plane URL
# BRAINTRUST_API_URL=<your-braintrust-api-url>
API keys are stored as one-way cryptographic hashes, never in plaintext.
Install the braintrust and @anthropic-ai/sdk packages.
# pnpm
pnpm add braintrust @anthropic-ai/sdk
# npm
npm install braintrust @anthropic-ai/sdk

Trace with Anthropic

Trace your Anthropic LLM calls for observability and monitoring.

Trace automatically

Braintrust provides automatic tracing for Anthropic API calls, including streaming, token metrics, caching details, and server tool use.
  • TypeScript & Python: Use wrapAnthropic / wrap_anthropic wrapper functions
  • Go: Use the tracing middleware with the Anthropic client
  • Ruby: Use Braintrust::Trace::Anthropic.wrap to wrap the Anthropic client
  • Java: Use the tracing interceptor with the Anthropic client
  • C#: Use .WithBraintrust() to wrap the Anthropic client
For more control over tracing, learn how to customize traces.
import Anthropic from "@anthropic-ai/sdk";
import { wrapAnthropic, initLogger } from "braintrust";

// Initialize the Braintrust logger
const logger = initLogger({
  projectName: "My Project", // Your project name
  apiKey: process.env.BRAINTRUST_API_KEY,
});

// Wrap the Anthropic client with the Braintrust logger
const client = wrapAnthropic(
  new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY }),
);

// All API calls are automatically logged
const result = await client.messages.create({
  model: "claude-sonnet-4-5-20250929",
  max_tokens: 1024,
  messages: [{ role: "user", content: "What is machine learning?" }],
});

Captured metrics and metadata

Each traced Anthropic call logs metrics to the span based on what the API returns. Token counts are always present; other fields appear only when the relevant feature is in use.
| Metric | Description |
| --- | --- |
| prompt_tokens | Total input tokens (including cached and cache-creation tokens) |
| completion_tokens | Output tokens |
| tokens | Total tokens (prompt + completion) |
| time_to_first_token | Time to first token (streaming calls only) |
When prompt caching is enabled:
| Metric | Description |
| --- | --- |
| prompt_cached_tokens | Tokens read from the prompt cache |
| prompt_cache_creation_tokens | Tokens written to the prompt cache |
When Claude uses server-side tools, Braintrust records the provider’s tool usage counters dynamically:
| Metric pattern | Description |
| --- | --- |
| server_tool_use_<field_name> | Server-side tool usage counts returned by Anthropic. Examples include server_tool_use_web_search_requests, server_tool_use_web_fetch_requests, and server_tool_use_code_execution_requests. |
The following metadata fields are also logged when the API returns them:
| Metadata field | Description |
| --- | --- |
| usage_service_tier | The service tier that handled the request |
| usage_inference_geo | The region that processed the request |
| cache_creation_ephemeral_5m_input_tokens | Anthropic ephemeral 5-minute cache creation tokens |
| cache_creation_ephemeral_1h_input_tokens | Anthropic ephemeral 1-hour cache creation tokens |
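As an illustrative sketch of how an Anthropic usage payload maps onto the metric names above (the field names come from Anthropic's Messages API; the mapping function itself is ours, not Braintrust's internal code):

```typescript
// Shape of the usage object returned by Anthropic's Messages API
type AnthropicUsage = {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
};

// Illustrative mapping to the Braintrust metric names listed above.
// prompt_tokens counts cached and cache-creation tokens, per the table.
function toBraintrustMetrics(usage: AnthropicUsage) {
  const prompt_tokens =
    usage.input_tokens +
    (usage.cache_read_input_tokens ?? 0) +
    (usage.cache_creation_input_tokens ?? 0);
  return {
    prompt_tokens,
    completion_tokens: usage.output_tokens,
    tokens: prompt_tokens + usage.output_tokens,
    prompt_cached_tokens: usage.cache_read_input_tokens ?? 0,
    prompt_cache_creation_tokens: usage.cache_creation_input_tokens ?? 0,
  };
}
```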

Evaluate with Anthropic

Evaluations distill the non-deterministic outputs of Anthropic models into an effective feedback loop that enables you to ship more reliable, higher quality products. The Braintrust Eval function is composed of a dataset of user inputs, a task, and a set of scorers. To learn more about evaluations, see the Experiments guide.

Basic Anthropic eval setup

Evaluate the outputs of Anthropic models with Braintrust.
import { Eval } from "braintrust";
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

Eval("Anthropic Evaluation", {
  // An array of user inputs and expected outputs
  data: () => [
    { input: "What is 2+2?", expected: "4" },
    { input: "What is the capital of France?", expected: "Paris" },
  ],
  task: async (input) => {
    // Your Anthropic LLM call
    const response = await client.messages.create({
      model: "claude-sonnet-4-5-20250929",
      max_tokens: 1024,
      messages: [{ role: "user", content: input }],
    });
    // content is a union of block types, so narrow to a text block first
    const first = response.content[0];
    return first.type === "text" ? first.text : "";
  },
  scores: [
    {
      name: "accuracy",
      // A simple scorer that returns 1 if the output matches the expected output, 0 otherwise
      scorer: (args) => (args.output === args.expected ? 1 : 0),
    },
  ],
});
Learn more about eval data and scorers.
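Exact string equality is brittle for model output ("Paris." fails against an expected "Paris"). A slightly more forgiving scorer (our own sketch, not a Braintrust built-in) normalizes case and trailing punctuation before comparing, and can be dropped into the scores array like the one above:

```typescript
// Hypothetical scorer: normalize whitespace, case, and trailing
// punctuation so near-identical answers still score 1.
function fuzzyMatch({ output, expected }: { output: string; expected?: string }) {
  const norm = (s: string) => s.trim().toLowerCase().replace(/[.!?]+$/, "");
  return {
    name: "fuzzy_match",
    score: expected !== undefined && norm(output) === norm(expected) ? 1 : 0,
  };
}
```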

Use Anthropic as an LLM judge

You can use Anthropic models as LLM judges to score the outputs of other AI systems. This example uses the LLMClassifierFromSpec scorer from the autoevals package to grade the relevance of an output. Install autoevals first:
# pnpm
pnpm add autoevals
# npm
npm install autoevals
Create a scorer with LLMClassifierFromSpec, then include relevanceScorer in the scores array of your Eval function (see above).
import { LLMClassifierFromSpec } from "autoevals";

const relevanceScorer = LLMClassifierFromSpec("Relevance", {
  // Mustache template rendered with the eval's input and output
  prompt: "Is the following output relevant to the input?\n\nInput: {{input}}\nOutput: {{output}}",
  choice_scores: { Relevant: 1, Irrelevant: 0 },
  model: "claude-sonnet-4-5-20250929",
  use_cot: true,
});

Additional features

Tool use

Anthropic’s tool use (function calling) is fully supported:
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const tools = [
  {
    name: "get_weather",
    description: "Get current weather for a location",
    input_schema: {
      type: "object",
      properties: {
        location: { type: "string", description: "City name" },
      },
      required: ["location"],
    },
  },
];

const response = await client.messages.create({
  model: "claude-sonnet-4-5-20250929",
  max_tokens: 1024,
  messages: [{ role: "user", content: "What's the weather in San Francisco?" }],
  tools,
});
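When Claude responds with a tool_use block, you execute the tool yourself and send the result back as a tool_result message. A small helper can collect those results (a sketch: the block shapes follow Anthropic's Messages API, but buildToolResults and its callback are our own invention):

```typescript
// Simplified content-block union from Anthropic's Messages API
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

// Run each requested tool and build the tool_result blocks to send back
// in the next user message. `run` is a hypothetical dispatcher you supply.
function buildToolResults(
  content: ContentBlock[],
  run: (name: string, input: unknown) => string,
) {
  return content
    .filter(
      (b): b is Extract<ContentBlock, { type: "tool_use" }> =>
        b.type === "tool_use",
    )
    .map((b) => ({
      type: "tool_result" as const,
      tool_use_id: b.id, // must echo the id from the tool_use block
      content: run(b.name, b.input),
    }));
}
```

You would then send `{ role: "user", content: buildToolResults(response.content, run) }` in a follow-up messages.create call.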

System prompts

Anthropic models support system prompts for better instruction following.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const response = await client.messages.create({
  model: "claude-sonnet-4-5-20250929",
  max_tokens: 1024,
  system: "You are a helpful assistant that responds in JSON format.",
  messages: [{ role: "user", content: "What is the capital of France?" }],
});

Cached tokens

Anthropic supports prompt caching to reduce costs and latency for repeated content. When you use prompt caching, Braintrust automatically captures cache read and creation token counts as span metrics. If the API returns a cache creation breakdown (ephemeral 5-minute vs. 1-hour), those are captured as separate metrics too — see the full list in Trace automatically.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const response = await client.messages.create({
  model: "claude-sonnet-4-5-20250929",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are an AI assistant analyzing the following document...",
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "Summarize the key points." }],
});
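Caching only kicks in once the cacheable prefix is long enough. A back-of-the-envelope check can flag prompts that are too short to benefit (both the ~4 characters/token ratio and the 1024-token minimum are rough approximations of Anthropic's documented limits, so treat them as assumptions and check the current docs):

```typescript
// Rough heuristic: estimate tokens at ~4 chars/token and compare against
// the approximate minimum cacheable prompt length (1024 tokens on
// Sonnet-class models at the time of writing).
function likelyCacheable(text: string, minTokens = 1024): boolean {
  const approxTokens = Math.ceil(text.length / 4);
  return approxTokens >= minTokens;
}
```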

Multimodal content, attachments, errors, and masking sensitive data

To learn more about these topics, check out the customize traces guide.

Use Anthropic with Braintrust gateway

You can also access Anthropic models through the Braintrust gateway, which provides a unified interface for multiple providers. Use any supported provider’s SDK to call Anthropic models.
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://gateway.braintrust.dev/v1",
  apiKey: process.env.BRAINTRUST_API_KEY,
});

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-5-20250929",
  messages: [{ role: "user", content: "What is a proxy?" }],
  seed: 1, // A seed activates the proxy's cache
});

Structured outputs

The Braintrust gateway supports structured outputs for Anthropic models.
import { OpenAI } from "openai";
import { zodResponseFormat } from "openai/helpers/zod";
import { z } from "zod";

const client = new OpenAI({
  baseURL: "https://gateway.braintrust.dev/v1",
  apiKey: process.env.BRAINTRUST_API_KEY,
});

// Define a Zod schema for the response
const ResponseSchema = z.object({
  name: z.string(),
  age: z.number(),
});

const completion = await client.beta.chat.completions.parse({
  model: "claude-sonnet-4-5-20250929",
  messages: [
    { role: "system", content: "Extract the person's name and age." },
    { role: "user", content: "My name is John and I'm 30 years old." },
  ],
  // Convert the Zod schema into an OpenAI-compatible JSON schema response format
  response_format: zodResponseFormat(ResponseSchema, "person"),
});

// The schema-validated result, typed from the Zod schema
const person = completion.choices[0].message.parsed;
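If you read the raw message content instead of the parsed helper output, you still need a runtime check before trusting its shape. A minimal hand-rolled guard looks like this (illustrative; in practice the Zod schema above can do the same via ResponseSchema.parse):

```typescript
// Hand-rolled type guard mirroring { name: string; age: number }
function isPerson(v: unknown): v is { name: string; age: number } {
  return (
    typeof v === "object" &&
    v !== null &&
    typeof (v as { name?: unknown }).name === "string" &&
    typeof (v as { age?: unknown }).age === "number"
  );
}

// Validate before use; reject anything that doesn't match the shape
const candidate: unknown = JSON.parse('{"name":"John","age":30}');
// isPerson(candidate) -> true
```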