Yonder: A Large-Scale Drone Navigation Dataset and Why Offline mAP Lies to You

Yonder is a large drone-perspective dataset for indoor navigation research. It is built to support perception training (detection, depth, semantics) and to expose a specific evaluation failure mode: offline metrics computed in one simulator can mis-rank models for closed-loop flight in another.

What is inside

The public release includes millions of frames across many indoor environments, with rich sensor arrays per waypoint (stereo RGB, depth, LiDAR-style sweeps, semantics, pose). Full details and layout are on the Hugging Face dataset card; start with the smoke subset astralhf/yonder-sample if you want a small download before committing to large transfers.
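
A minimal sketch of fetching the smoke subset first, assuming the huggingface_hub Python package is installed and that astralhf/yonder-sample is hosted as a dataset repository on the Hub. The helper name download_smoke_subset and the default target directory are illustrative choices, not part of the release.

```python
from pathlib import Path

# Repo id taken from the text above; assumed to be a Hub dataset repo.
SMOKE_REPO_ID = "astralhf/yonder-sample"


def download_smoke_subset(target_dir: str = "yonder-sample") -> Path:
    """Download the smoke subset into target_dir and return its local path.

    Hypothetical helper for illustration; needs network access when called.
    """
    # Imported lazily so that defining the helper does not require the package.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(
        repo_id=SMOKE_REPO_ID,
        repo_type="dataset",  # a dataset repo, not a model repo
        local_dir=target_dir,
    )
    return Path(local_path)
```

Calling download_smoke_subset() and listing the returned directory is a quick way to compare the on-disk layout with the dataset card before starting a large transfer.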

What Yonder is for (and not for)

Great for: training and studying drone-perspective perception, and diagnosing cross-simulator generalization when paired with a closed-loop evaluator.
Not for: end-to-end policy training from expert trajectories. Yonder is not packaged as behavior-cloning data with full closed-loop rollouts.

Why this matters for AI drone software

The field has a habit of celebrating offline detection gains. Yonder includes the ingredients to show when those gains are real for flight and when they are an artifact of simulator-specific geometry and rendering conventions. If you care about trustworthy autonomy, publish both: offline metrics and closed-loop outcomes.
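
One way to make the mis-ranking concrete is to compare how the same models are ordered by an offline metric and by a closed-loop outcome. The sketch below computes a simple Kendall rank correlation (tau-a) in plain Python. The model names and all numbers are placeholders invented for illustration, not results from Yonder, and the function names are hypothetical.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


def rank_order(scores):
    """Return model names sorted best-first by score."""
    return sorted(scores, key=scores.get, reverse=True)


# Placeholder numbers for illustration only; NOT measurements from Yonder.
offline_map = {"model_a": 0.62, "model_b": 0.58, "model_c": 0.51}
closed_loop_success = {"model_a": 0.40, "model_b": 0.71, "model_c": 0.65}

models = sorted(offline_map)
tau = kendall_tau(
    [offline_map[m] for m in models],
    [closed_loop_success[m] for m in models],
)
print("offline ranking:    ", rank_order(offline_map))
print("closed-loop ranking:", rank_order(closed_loop_success))
print(f"Kendall tau = {tau:.2f}")
```

A tau near 1 means the offline metric orders models the way closed-loop flight does; a value near zero or below means the offline leaderboard is not a reliable guide to flight performance, which is the case for reporting both.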

Dataset hub: https://huggingface.co/datasets/astralhf/yonder