Astral

Research

Papers, technical reports, and write-ups from the Astral autonomy team — ordered by date.

After the domain detector, we trained three more models in a single session: a VLM action-LoRA that cuts malformed commands, a 121 KB reactive policy MLP that runs at 200 Hz, and a monocular depth fine-tune for rangefinding beyond stereo baseline. All four are now running on Jetson Orin Nano hardware.


COCO-80 has no drone class, no person_aerial, no landing pad. We trained a 9-class domain detector on 48,000 images of sim and real aerial footage — and learned why class imbalance is the dominant failure mode in aerial perception.


Technical report. YOLOv8n fine-tuned on a 9-class aerial schema across three training rounds: sim-only (v1), merged with VisDrone (v2), and 4× drone oversampling (v3). Across v1 → v2 → v3, overall mAP50 went 0.471 → 0.376 → 0.384 and drone AP50 went 0.047 → 0.010 → 0.087.
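As a rough illustration of what class oversampling does to a training list, here is a minimal sketch. It assumes a hypothetical in-memory format of `(image_id, labels)` pairs and repeats every image that contains the target class; the report's actual data pipeline is not described here and may differ. The function names `oversample` and `class_counts` are illustrative only.

```python
from collections import Counter


def oversample(samples, target_class, factor):
    """Repeat every sample whose labels include target_class `factor` times.

    samples: list of (image_id, labels) pairs (hypothetical format).
    """
    out = []
    for image_id, labels in samples:
        copies = factor if target_class in labels else 1
        out.extend([(image_id, labels)] * copies)
    return out


def class_counts(samples):
    """Count label occurrences across the (possibly oversampled) list."""
    counts = Counter()
    for _, labels in samples:
        counts.update(labels)
    return counts


# Toy data, not from the report.
samples = [
    ("a", ["drone", "person"]),
    ("b", ["vehicle"]),
    ("c", ["drone"]),
]
balanced = oversample(samples, "drone", 4)
counts = class_counts(balanced)
```

One design point the sketch makes visible: image-level oversampling also multiplies every class that co-occurs with the target (here `person` rises along with `drone`), so the resulting class balance should be checked rather than assumed.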


An eighteen-iteration engineering log: improving a modular autonomy stack in aggregate, scaling detector fine-tuning with large synthetic data, diagnosing a cross-simulator localization gap, and characterizing exploration and planning as the next bottlenecks.

Controlled swarm simulations up to 1,000 agents comparing sensing stacks and coordination architectures, with a focus on when ultra-wideband ranging becomes necessary as fleet scale and environment difficulty increase.

Adds Gemma 4 to the same Isaac Sim closed-loop benchmark and compares end-to-end goal prediction against modular deployment of the same weights as a semantic target selector, illustrating the leverage of the separation principle.

11,340 seeded trials across four attack classes (GNSS spoofing, RF jamming, kinetic interception, control takeover) and six matched defenses in a four-drone warehouse swarm. Central finding: mission success rate is the wrong primary metric for C-UAS, because physical effects (79.5% PN capture rate, 5–8 m position error) are clearly measurable even when aggregate task completion is unaffected. A kinematic plausibility detector achieves a 39.8% true-positive rate at a 0% false-positive rate. Includes an explicit fidelity boundary analysis delineating what kinematic simulation can and cannot faithfully reproduce.
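The blurb does not say how the kinematic plausibility detector works. One common form of such a check, sketched below under stated assumptions, flags a position fix when the speed or acceleration implied by consecutive fixes exceeds what the platform can physically do. The function name, the fixed sample interval `dt`, and the threshold values are all hypothetical and are not taken from the report.

```python
import math


def implausible_steps(track, dt, max_speed, max_accel):
    """Return indices of fixes whose implied speed or acceleration is too high.

    track: list of (x, y, z) positions in metres, sampled every dt seconds.
    max_speed (m/s) and max_accel (m/s^2) are assumed platform limits.
    """
    flagged = []
    prev_v = None
    for i in range(1, len(track)):
        # Finite-difference velocity between consecutive fixes.
        v = tuple((b - a) / dt for a, b in zip(track[i - 1], track[i]))
        speed = math.sqrt(sum(c * c for c in v))
        accel = 0.0
        if prev_v is not None:
            # Finite-difference acceleration from the change in velocity.
            accel = math.sqrt(sum((c - p) ** 2 for c, p in zip(v, prev_v))) / dt
        if speed > max_speed or accel > max_accel:
            flagged.append(i)
        prev_v = v
    return flagged


# Toy track with an 8 m jump in one second, as a spoofed fix might produce.
spoofed = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0)]
smooth = [(float(i), 0.0, 0.0) for i in range(5)]
flags = implausible_steps(spoofed, dt=1.0, max_speed=5.0, max_accel=3.0)
```

In this toy case the jump is flagged once for speed and the following fix is flagged for the deceleration back to normal motion. Setting the limits at or above the platform's true envelope is what keeps false positives at zero, at the cost of missing attacks that stay inside that envelope.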

Nine-cell factorial study comparing tower vs. self-organized coordination, continuous vs. terminal-only communications, and four observation modalities (ADS-B, camera, both, none) across 405 simulated vertiport trials. Self-org with ADS-B matches tower throughput below ~20 ops/hour then degrades; silent-cruise drones exceed safe LoS thresholds at 12 ops/hour. Characterizes the throughput–safety Pareto frontier and broadcast necessity threshold for UAM droneport designs.

Full autonomy gets 57.6% on hard warehouse tasks. Add a human for novel situations, and it jumps to 94.4%. The right architecture isn't fully autonomous — it's autonomy-aware.

Paper, 2026-05-02

NeurIPS 2026 Datasets & Benchmarks track (submission)

Introduces Yonder, a multi-million-frame drone-perspective indoor dataset with rich sensing, and shows why offline detection gains can fail to translate to closed-loop navigation when training and evaluation simulators disagree geometrically.

Large-scale closed-loop benchmark across many VLMs, decomposing failures into semantic understanding versus metric spatial grounding, and a modular architecture that closes the gap on operational commands while prioritizing collision-free flight.

A practical stack for AI drone autonomy: simulation-first iteration, metric grounding, modular perception and planning, and closed-loop evaluation.
