Anthropic provides access to Claude models, including Claude Sonnet 4.5, Claude Opus 4.1, and other cutting-edge language models. Braintrust integrates with Anthropic through direct API access, the wrapAnthropic wrapper for automatic tracing, and the Braintrust gateway.
This guide covers manual instrumentation. For quicker setup, use auto-instrumentation.

Setup

To use Anthropic with Braintrust, you’ll need an Anthropic API key.
  1. Visit Anthropic’s Console and create a new API key
  2. Add the Anthropic API key to your organization’s AI providers or to a project’s AI providers
  3. Set the Anthropic API key and your Braintrust API key as environment variables
.env
ANTHROPIC_API_KEY=<your-anthropic-api-key>
BRAINTRUST_API_KEY=<your-braintrust-api-key>

# For organizations on the EU data plane, use https://api-eu.braintrust.dev
# For self-hosted deployments, use your data plane URL
# BRAINTRUST_API_URL=<your-braintrust-api-url>
API keys are stored as one-way cryptographic hashes, never in plaintext.
Install the braintrust and @anthropic-ai/sdk packages.
# pnpm
pnpm add braintrust @anthropic-ai/sdk
# npm
npm install braintrust @anthropic-ai/sdk

Trace with Anthropic

Trace your Anthropic LLM calls for observability and monitoring.

Trace automatically

Braintrust provides automatic tracing for Anthropic API calls, including streaming, token metrics, caching details, and server tool use.
  • TypeScript & Python: Use wrapAnthropic / wrap_anthropic wrapper functions
  • Go: Use the tracing middleware with the Anthropic client
  • Ruby: Use Braintrust::Trace::Anthropic.wrap to wrap the Anthropic client
  • Java: Use the tracing interceptor with the Anthropic client
  • C#: Use .WithBraintrust() to wrap the Anthropic client
For more control over tracing, learn how to customize traces.
import Anthropic from "@anthropic-ai/sdk";
import { wrapAnthropic, initLogger } from "braintrust";

// Initialize the Braintrust logger
const logger = initLogger({
  projectName: "My Project", // Your project name
  apiKey: process.env.BRAINTRUST_API_KEY,
});

// Wrap the Anthropic client with the Braintrust logger
const client = wrapAnthropic(
  new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY }),
);

// All API calls are automatically logged
const result = await client.messages.create({
  model: "claude-sonnet-4-5-20250929",
  max_tokens: 1024,
  messages: [{ role: "user", content: "What is machine learning?" }],
});

Captured metrics and metadata

Each traced Anthropic call logs metrics to the span based on what the API returns. Token counts are always present; other fields appear only when the relevant feature is in use.
| Metric | Description |
| --- | --- |
| prompt_tokens | Total input tokens (including cached and cache-creation tokens) |
| completion_tokens | Output tokens |
| tokens | Total tokens (prompt + completion) |
| time_to_first_token | Time to first token (streaming calls only) |
When prompt caching is enabled:
| Metric | Description |
| --- | --- |
| prompt_cached_tokens | Tokens read from the prompt cache |
| prompt_cache_creation_tokens | Tokens written to the prompt cache |
When Claude uses server-side tools, Braintrust records the provider’s tool usage counters dynamically:
| Metric pattern | Description |
| --- | --- |
| server_tool_use_<field_name> | Server-side tool usage counts returned by Anthropic. Examples include server_tool_use_web_search_requests, server_tool_use_web_fetch_requests, and server_tool_use_code_execution_requests. |
The following metadata fields are also logged when the API returns them:
| Metadata field | Description |
| --- | --- |
| usage_service_tier | The service tier that handled the request |
| usage_inference_geo | The region that processed the request |
| cache_creation_ephemeral_5m_input_tokens | Anthropic ephemeral 5-minute cache creation tokens |
| cache_creation_ephemeral_1h_input_tokens | Anthropic ephemeral 1-hour cache creation tokens |
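As an illustrative sketch of how an Anthropic usage payload maps onto the metric names above (the field names come from Anthropic's Messages API; the mapping function itself is ours, not Braintrust's internal code):

```typescript
// Shape of the usage object returned by Anthropic's Messages API
type AnthropicUsage = {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
};

// Illustrative mapping to the Braintrust metric names listed above.
// prompt_tokens counts cached and cache-creation tokens, per the table.
function toBraintrustMetrics(usage: AnthropicUsage) {
  const prompt_tokens =
    usage.input_tokens +
    (usage.cache_read_input_tokens ?? 0) +
    (usage.cache_creation_input_tokens ?? 0);
  return {
    prompt_tokens,
    completion_tokens: usage.output_tokens,
    tokens: prompt_tokens + usage.output_tokens,
    prompt_cached_tokens: usage.cache_read_input_tokens ?? 0,
    prompt_cache_creation_tokens: usage.cache_creation_input_tokens ?? 0,
  };
}
```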

Evaluate with Anthropic

Evaluations distill the non-deterministic outputs of Anthropic models into an effective feedback loop that enables you to ship more reliable, higher quality products. The Braintrust Eval function is composed of a dataset of user inputs, a task, and a set of scorers. To learn more about evaluations, see the Experiments guide.

Basic Anthropic eval setup

Evaluate the outputs of Anthropic models with Braintrust.
import { Eval } from "braintrust";
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

Eval("Anthropic Evaluation", {
  // An array of user inputs and expected outputs
  data: () => [
    { input: "What is 2+2?", expected: "4" },
    { input: "What is the capital of France?", expected: "Paris" },
  ],
  task: async (input) => {
    // Your Anthropic LLM call
    const response = await client.messages.create({
      model: "claude-sonnet-4-5-20250929",
      max_tokens: 1024,
      messages: [{ role: "user", content: input }],
    });
    // content is a union of block types, so narrow to a text block first
    const first = response.content[0];
    return first.type === "text" ? first.text : "";
  },
  scores: [
    {
      name: "accuracy",
      // A simple scorer that returns 1 if the output matches the expected output, 0 otherwise
      scorer: (args) => (args.output === args.expected ? 1 : 0),
    },
  ],
});
Learn more about eval data and scorers.
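Exact string equality is brittle for model output ("Paris." fails against an expected "Paris"). A slightly more forgiving scorer (our own sketch, not a Braintrust built-in) normalizes case and trailing punctuation before comparing, and can be dropped into the scores array like the one above:

```typescript
// Hypothetical scorer: normalize whitespace, case, and trailing
// punctuation so near-identical answers still score 1.
function fuzzyMatch({ output, expected }: { output: string; expected?: string }) {
  const norm = (s: string) => s.trim().toLowerCase().replace(/[.!?]+$/, "");
  return {
    name: "fuzzy_match",
    score: expected !== undefined && norm(output) === norm(expected) ? 1 : 0,
  };
}
```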

Use Anthropic as an LLM judge

You can use Anthropic models as LLM judges to score the outputs of other AI systems. This example uses the LLMClassifierFromSpec scorer from the autoevals package to grade the relevance of an output. Install autoevals first:
# pnpm
pnpm add autoevals
# npm
npm install autoevals
Create a scorer with LLMClassifierFromSpec, then include relevanceScorer in the scores array of your Eval function (see above).
import { LLMClassifierFromSpec } from "autoevals";

const relevanceScorer = LLMClassifierFromSpec("Relevance", {
  // Mustache template rendered with the eval's input and output
  prompt: "Is the following output relevant to the input?\n\nInput: {{input}}\nOutput: {{output}}",
  choice_scores: { Relevant: 1, Irrelevant: 0 },
  model: "claude-sonnet-4-5-20250929",
  use_cot: true,
});

Additional features

Tool use

Anthropic’s tool use (function calling) is fully supported:
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const tools = [
  {
    name: "get_weather",
    description: "Get current weather for a location",
    input_schema: {
      type: "object",
      properties: {
        location: { type: "string", description: "City name" },
      },
      required: ["location"],
    },
  },
];

const response = await client.messages.create({
  model: "claude-sonnet-4-5-20250929",
  max_tokens: 1024,
  messages: [{ role: "user", content: "What's the weather in San Francisco?" }],
  tools,
});
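When Claude responds with a tool_use block, you execute the tool yourself and send the result back as a tool_result message. A small helper can collect those results (a sketch: the block shapes follow Anthropic's Messages API, but buildToolResults and its callback are our own invention):

```typescript
// Simplified content-block union from Anthropic's Messages API
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

// Run each requested tool and build the tool_result blocks to send back
// in the next user message. `run` is a hypothetical dispatcher you supply.
function buildToolResults(
  content: ContentBlock[],
  run: (name: string, input: unknown) => string,
) {
  return content
    .filter(
      (b): b is Extract<ContentBlock, { type: "tool_use" }> =>
        b.type === "tool_use",
    )
    .map((b) => ({
      type: "tool_result" as const,
      tool_use_id: b.id, // must echo the id from the tool_use block
      content: run(b.name, b.input),
    }));
}
```

You would then send `{ role: "user", content: buildToolResults(response.content, run) }` in a follow-up messages.create call.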

System prompts

Anthropic models support system prompts for better instruction following.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const response = await client.messages.create({
  model: "claude-sonnet-4-5-20250929",
  max_tokens: 1024,
  system: "You are a helpful assistant that responds in JSON format.",
  messages: [{ role: "user", content: "What is the capital of France?" }],
});

Cached tokens

Anthropic supports prompt caching to reduce costs and latency for repeated content. When you use prompt caching, Braintrust automatically captures cache read and creation token counts as span metrics. If the API returns a cache creation breakdown (ephemeral 5-minute vs. 1-hour), those are captured as separate metrics too — see the full list in Trace automatically.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const response = await client.messages.create({
  model: "claude-sonnet-4-5-20250929",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are an AI assistant analyzing the following document...",
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "Summarize the key points." }],
});
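Caching only kicks in once the cacheable prefix is long enough. A back-of-the-envelope check can flag prompts that are too short to benefit (both the ~4 characters/token ratio and the 1024-token minimum are rough approximations of Anthropic's documented limits, so treat them as assumptions and check the current docs):

```typescript
// Rough heuristic: estimate tokens at ~4 chars/token and compare against
// the approximate minimum cacheable prompt length (1024 tokens on
// Sonnet-class models at the time of writing).
function likelyCacheable(text: string, minTokens = 1024): boolean {
  const approxTokens = Math.ceil(text.length / 4);
  return approxTokens >= minTokens;
}
```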

Multimodal content, attachments, errors, and masking sensitive data

To learn more about these topics, check out the customize traces guide.

Use Anthropic with Braintrust gateway

You can also access Anthropic models through the Braintrust gateway, which provides a unified interface for multiple providers. Use any supported provider’s SDK to call Anthropic models.
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://gateway.braintrust.dev/v1",
  apiKey: process.env.BRAINTRUST_API_KEY,
});

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-5-20250929",
  messages: [{ role: "user", content: "What is a proxy?" }],
  seed: 1, // A seed activates the proxy's cache
});

Structured outputs

The Braintrust gateway supports structured outputs for Anthropic models.
import { OpenAI } from "openai";
import { zodResponseFormat } from "openai/helpers/zod";
import { z } from "zod";

const client = new OpenAI({
  baseURL: "https://gateway.braintrust.dev/v1",
  apiKey: process.env.BRAINTRUST_API_KEY,
});

// Define a Zod schema for the response
const ResponseSchema = z.object({
  name: z.string(),
  age: z.number(),
});

const completion = await client.beta.chat.completions.parse({
  model: "claude-sonnet-4-5-20250929",
  messages: [
    { role: "system", content: "Extract the person's name and age." },
    { role: "user", content: "My name is John and I'm 30 years old." },
  ],
  // Convert the Zod schema into an OpenAI-compatible JSON schema response format
  response_format: zodResponseFormat(ResponseSchema, "person"),
});

// The schema-validated result, typed from the Zod schema
const person = completion.choices[0].message.parsed;
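If you read the raw message content instead of the parsed helper output, you still need a runtime check before trusting its shape. A minimal hand-rolled guard looks like this (illustrative; in practice the Zod schema above can do the same via ResponseSchema.parse):

```typescript
// Hand-rolled type guard mirroring { name: string; age: number }
function isPerson(v: unknown): v is { name: string; age: number } {
  return (
    typeof v === "object" &&
    v !== null &&
    typeof (v as { name?: unknown }).name === "string" &&
    typeof (v as { age?: unknown }).age === "number"
  );
}

// Validate before use; reject anything that doesn't match the shape
const candidate: unknown = JSON.parse('{"name":"John","age":30}');
// isPerson(candidate) -> true
```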