From the makers of Foresight-32B

Generate verified datasets at scale.

Quality data is the biggest blocker for most LLM projects. LightningRod makes it easy to generate, transform, and verify datasets grounded in real sources—in just a few lines of Python.

generate_dataset.py
import lightningrod as lr

# Get antitrust news to train a domain expert
seeds = lr.NewsSeedGenerator(
    query="antitrust investigation",
    start_date="2025-01-01"
)

# Define the scope and style of the questions
questioner = lr.QuestionGenerator(
    instructions="Write forward-looking, self-contained questions with explicit dates/entities.",
    examples=[
        "What is the likely outcome of the DOJ lawsuit?",
        "Which specific Sherman Act violations are cited?"
    ]
)

# Verify answers against live sources
labeler = lr.WebSearchLabeler()

# Run pipeline
pipeline = lr.Pipeline(seeds, questioner, labeler)
dataset = pipeline.batch(100)

Built For

SFT Training · RL Training · RAG Evaluation · Model Benchmarking

Trusted By

Institutional Investors
Fortune 500
Healthcare
Swayable
Startups
Tradewinds & DOD Awardable

Why LightningRod?

Grounded, Not Hallucinated

Stop training on synthetic "slop." LightningRod generates answers grounded in retrieved evidence, verified for facts, and graded for confidence.

No Data? No Problem

Bootstrap domain-specific datasets instantly using our built-in public feeds (Global News, SEC Filings, Wikipedia).

Total Provenance

Audit every token. Every sample includes citations, links to source docs, and rigorous temporal integrity checks to prevent data leakage.
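The temporal-integrity idea can be illustrated with a short, self-contained sketch: reject any sample that cites a source published after the sample's knowledge cutoff. The `cutoff_date` and `citations` field names here are hypothetical, chosen for illustration — not LightningRod's actual schema.

```python
from datetime import date

def check_temporal_integrity(sample: dict) -> bool:
    """Return True only if every cited source was published on or
    before the sample's knowledge-cutoff date, i.e. no information
    from the future leaks into the training example."""
    cutoff = date.fromisoformat(sample["cutoff_date"])
    return all(
        date.fromisoformat(c["published"]) <= cutoff
        for c in sample["citations"]
    )

# A sample citing a source published after its cutoff fails the check.
leaky = {
    "cutoff_date": "2025-01-15",
    "citations": [
        {"url": "https://example.com/a", "published": "2025-01-10"},
        {"url": "https://example.com/b", "published": "2025-02-01"},
    ],
}
clean = {
    "cutoff_date": "2025-03-01",
    "citations": [
        {"url": "https://example.com/a", "published": "2025-01-10"},
    ],
}
```

In practice a check like this runs per sample at labeling time, so leaked-future examples never enter the dataset.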

Reproducible Pipelines

Treat data like code. Stop managing experiments in spreadsheets. Define your pipeline (Source + Prompt = Dataset) and track lineage.
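The "Source + Prompt = Dataset" equation can be sketched as a deterministic lineage fingerprint: hash the pipeline definition so identical configs always map to the same dataset ID. This is a minimal illustration of the idea, not LightningRod's internal mechanism.

```python
import hashlib
import json

def lineage_id(source: dict, prompt: str) -> str:
    """Deterministic fingerprint of a pipeline definition: the same
    source config and prompt always hash to the same lineage ID, so
    a dataset can be traced back to exactly what produced it."""
    payload = json.dumps({"source": source, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]

# Re-running the same definition reproduces the same lineage ID.
run_a = lineage_id({"feed": "news", "query": "antitrust"}, "Write forward-looking questions.")
run_b = lineage_id({"feed": "news", "query": "antitrust"}, "Write forward-looking questions.")

# Any change to the prompt (or source) yields a different ID.
run_c = lineage_id({"feed": "news", "query": "antitrust"}, "Write backward-looking questions.")
```

Versioning datasets by a fingerprint like this is what makes "treat data like code" concrete: diff the definition, not a spreadsheet.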