Research
Papers, technical reports, and write-ups from the Astral autonomy team — ordered by date.
After the domain detector, we trained three more models in a single session: a VLM action-LoRA that cuts malformed commands, a 121 KB reactive policy MLP that runs at 200 Hz, and a monocular depth fine-tune for rangefinding beyond stereo baseline. All four are now running on Jetson Orin Nano hardware.
COCO-80 has no drone class, no person_aerial, no landing pad. We trained a 9-class domain detector on 48,000 images of sim and real aerial footage, and learned why class imbalance is the dominant failure mode in aerial perception.
Technical report. YOLOv8n fine-tuned on a 9-class aerial schema across three training rounds: sim-only (v1), merged with VisDrone (v2), and 4× drone oversampling (v3). mAP50 0.471 → 0.376 → 0.384. Drone AP50 0.047 → 0.010 → 0.087.
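As a rough illustration of what 4× oversampling of a rare class can look like at the image-list level, here is a minimal sketch. It assumes oversampling means repeating every training image that contains the target class; the report summary does not say how the v3 round actually implemented it. The function names, image ids, and label strings are hypothetical and exist only for this example.

```python
from collections import Counter


def oversample(samples, target_class, factor):
    """Return a new list in which every sample containing target_class
    appears `factor` times in total; other samples appear once."""
    out = []
    for image_id, classes in samples:
        repeats = factor if target_class in classes else 1
        out.extend([(image_id, classes)] * repeats)
    return out


def class_counts(samples):
    """Count how many label instances of each class the list contains."""
    counts = Counter()
    for _, classes in samples:
        counts.update(classes)
    return counts


# Toy annotation list (made-up data, not from the report).
samples = [
    ("img_000", ["person_aerial", "landing_pad"]),
    ("img_001", ["drone"]),
    ("img_002", ["landing_pad"]),
    ("img_003", ["drone", "person_aerial"]),
]

balanced = oversample(samples, target_class="drone", factor=4)
print(len(balanced))
print(class_counts(balanced))
```

In this toy list, repeating drone images also repeats the person_aerial label that shares a frame with a drone, so image-level oversampling shifts the counts of co-occurring classes as well as the target class.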
Technical report
An eighteen-iteration engineering log: improving a modular autonomy stack in aggregate, scaling detector fine-tuning with large synthetic data, diagnosing a cross-simulator localization gap, and characterizing exploration and planning as the next bottlenecks.
Technical report
Controlled swarm simulations up to 1,000 agents comparing sensing stacks and coordination architectures, with a focus on when ultra-wideband ranging becomes necessary as fleet scale and environment difficulty increase.
Technical note
Adds Gemma 4 to the same Isaac Sim closed-loop benchmark and compares end-to-end goal prediction against modular deployment of the same weights as a semantic target selector, illustrating the leverage of the separation principle.
Technical report
11,340 seeded trials across four attack classes (GNSS spoofing, RF jamming, kinetic interception, control takeover) and six matched defenses in a four-drone warehouse swarm. Central finding: mission success rate is the wrong primary metric for C-UAS — physical effects (79.5% PN capture rate, 5–8 m position error) are clearly measurable even when aggregate task completion is unaffected. A kinematic plausibility detector achieves 39.8% TP at 0% false-positive rate. Includes an explicit fidelity boundary analysis delineating what kinematic simulation can and cannot faithfully reproduce.
Technical report
Nine-cell factorial study comparing tower vs. self-organized coordination, continuous vs. terminal-only communications, and four observation modalities (ADS-B, camera, both, none) across 405 simulated vertiport trials. Self-org with ADS-B matches tower throughput below ~20 ops/hour then degrades; silent-cruise drones exceed safe LoS thresholds at 12 ops/hour. Characterizes the throughput–safety Pareto frontier and broadcast necessity threshold for UAM droneport designs.
Full autonomy gets 57.6% on hard warehouse tasks. Add a human for novel situations, and it jumps to 94.4%. The right architecture isn't fully autonomous — it's autonomy-aware.
NeurIPS 2026 Datasets & Benchmarks track (submission)
Introduces Yonder, a multi-million-frame drone-perspective indoor dataset with rich sensing, and shows why offline detection gains can fail to translate to closed-loop navigation when training and evaluation simulators disagree geometrically.
Technical report
Large-scale closed-loop benchmark across many VLMs, decomposing failures into semantic understanding versus metric spatial grounding, and a modular architecture that closes the gap on operational commands while prioritizing collision-free flight.
A practical stack for AI drone autonomy: simulation-first iteration, metric grounding, modular perception and planning, and closed-loop evaluation.