- Versioned: Every change is tracked, so experiments can pin to specific versions.
- Integrated: Use directly in evaluations and populate from production.
- Scalable: Stored in a modern data warehouse without storage limits.
Dataset structure
Each record has four top-level fields:
- input: Data to recreate the example in your application (required).
- expected: Ideal output or ground truth (optional but recommended for evaluation).
- metadata: Key-value pairs for filtering and grouping (optional).
- tags: Labels for organizing and filtering records (optional).
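As a rough illustration of the four fields above, a record can be pictured as a plain Python dictionary. This is a minimal sketch, not the SDK's own type: `DatasetRecord` and `validate_record` are hypothetical names introduced here, and the type checks (for example, that tags is a list of strings) are assumptions rather than documented rules.

```python
from typing import Any, TypedDict


class DatasetRecord(TypedDict, total=False):
    """Illustrative shape of one dataset record (hypothetical, not an SDK type)."""

    input: Any                # required: data needed to recreate the example
    expected: Any             # optional: ideal output or ground truth
    metadata: dict[str, Any]  # optional: key-value pairs for filtering and grouping
    tags: list[str]           # optional: labels (list-of-strings shape is assumed)


def validate_record(record: dict[str, Any]) -> DatasetRecord:
    """Check a record against the field descriptions above (assumed rules)."""
    if "input" not in record:
        raise ValueError("'input' is required")
    if not isinstance(record.get("metadata", {}), dict):
        raise ValueError("'metadata' must be a dict of key-value pairs")
    tags = record.get("tags", [])
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        raise ValueError("'tags' must be a list of strings")
    return record  # type: ignore[return-value]


# Example record: only 'input' is mandatory; the other fields are optional.
record = validate_record({
    "input": {"question": "What is the capital of France?"},
    "expected": "Paris",
    "metadata": {"source": "production", "difficulty": "easy"},
    "tags": ["geography"],
})
```

Keeping expected alongside input lets a scorer compare an application's output with the ground truth, while metadata and tags only affect how records are filtered and grouped.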
Where to go from here
- Create datasets from uploads, the SDK, production logs, user feedback, traces, or Loop.
- Build dataset pipelines to transform project logs into dataset rows in bulk.
- Manage datasets — tag and star, save snapshots, define schemas, customize table views, and edit records.
- Use in evaluations by passing datasets to Eval(), assigning them to environments, or converting experiment results.
- Track performance to see which experiments used a dataset and how each row performs.