Astral

Technical report

Engineering the Separation Principle: From Modular Architecture to Deployable Drone Navigation

TL;DR

  • An 18-iteration engineering log of a modular drone autonomy stack.
  • Detector fine-tuning on 6.7M synthetic frames improved detection mAP 9.7× (4.8% → 46.7%) — but closed-loop navigation did not improve.
  • A cross-simulator localization gap, not detection accuracy, is the binding constraint.
Iterations: 18
Training frames: 6.7M
Detection mAP: 4.8% → 46.7% (9.7×)

Abstract

An eighteen-iteration engineering log: improving a modular autonomy stack in aggregate, scaling detector fine-tuning with large synthetic data, diagnosing a cross-simulator localization gap, and characterizing exploration and planning as the next bottlenecks.

This paper is an engineering log, not a polished result. It records 18 iterations of a modular drone autonomy stack in detail, including the failures, which account for most of them. We publish it because the failures are more instructive than the successes.

The central finding

Fine-tuning a YOLOv8n detector on 6.7 million synthetic frames from Isaac Sim improved detection mAP 9.7× — from 4.8% to 46.7%. This is a large, unambiguous gain on the offline metric. Closed-loop navigation success did not improve. On the same set of navigation trials, success rates before and after fine-tuning were statistically indistinguishable.
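The section does not say which test was used to call the before/after success rates "statistically indistinguishable". The sketch below shows one standard option, a two-sided two-proportion z-test, under the assumption that each navigation trial is a binary success/failure outcome. It also reproduces the 9.7× mAP ratio from the reported figures. The function names and the trial counts in the usage lines are hypothetical placeholders, not the report's data or method.

```python
import math


def map_ratio(map_before: float, map_after: float) -> float:
    """Relative mAP improvement, e.g. 46.7 / 4.8 for the reported figures."""
    return map_after / map_before


def two_proportion_z_test(
    succ_a: int, n_a: int, succ_b: int, n_b: int
) -> tuple[float, float]:
    """Two-sided z-test for a difference between two success rates.

    Returns (z, p_value). The standard error uses the pooled success rate,
    i.e. it assumes the null hypothesis that both conditions share one rate.
    """
    p_a = succ_a / n_a
    p_b = succ_b / n_b
    pooled = (succ_a + succ_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        # Every trial succeeded (or every trial failed) in both conditions.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


print(f"mAP gain: {map_ratio(4.8, 46.7):.1f}x")  # mAP gain: 9.7x

# Hypothetical trial counts for illustration only; not the report's data.
z, p = two_proportion_z_test(succ_a=12, n_a=40, succ_b=14, n_b=40)
print(f"z = {z:.2f}, p = {p:.2f}")
```

Because the report evaluates "the same set of navigation trials" before and after fine-tuning, a paired test such as McNemar's would fit better if individual trials are matched by scenario. The unpaired z-test is shown here only because it is the simplest to state.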

That result forces a conclusion: detection accuracy was not the bottleneck, and something else was holding navigation back. We spent several iterations diagnosing it before isolating a cross-simulator localization gap. The training simulator and the evaluation simulator disagree on enough geometric and photometric details (object scale, floor reflectance, lighting model, corridor dimensions) that depth estimates computed in the evaluation environment are systematically wrong relative to the depth distribution the planner was trained to expect.
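The section does not describe how the systematic depth error was measured. One way to separate a systematic error from noise is to compare depth estimates against simulator ground truth and check whether a single global scale explains most of the discrepancy. Median scaling is a common alignment step in monocular depth evaluation. The sketch assumes paired estimated and ground-truth depths can be exported from the evaluation simulator. The function names and both thresholds are hypothetical, and this is not the authors' diagnostic procedure.

```python
import statistics


def depth_scale_bias(estimated: list[float], ground_truth: list[float]) -> float:
    """Median ratio of estimated to ground-truth depth (1.0 = no scale bias)."""
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("need equal-length, non-empty depth lists")
    ratios = [e / g for e, g in zip(estimated, ground_truth) if g > 0]
    return statistics.median(ratios)


def abs_rel_error(
    estimated: list[float], ground_truth: list[float], scale: float = 1.0
) -> float:
    """Mean absolute relative error after dividing estimates by `scale`."""
    errors = [abs(e / scale - g) / g for e, g in zip(estimated, ground_truth) if g > 0]
    return statistics.fmean(errors)


def diagnose_depth_gap(
    estimated: list[float], ground_truth: list[float], scale_tol: float = 0.05
) -> dict:
    """Flag a systematic (scale-like) depth error rather than random noise.

    The error is called systematic when the median scale is off by more than
    `scale_tol` AND removing that single scale at least halves the error.
    Both thresholds are illustrative choices, not values from the report.
    """
    scale = depth_scale_bias(estimated, ground_truth)
    raw = abs_rel_error(estimated, ground_truth)
    corrected = abs_rel_error(estimated, ground_truth, scale)
    systematic = abs(scale - 1.0) > scale_tol and corrected <= 0.5 * raw
    return {
        "scale": scale,
        "raw_abs_rel": raw,
        "corrected_abs_rel": corrected,
        "systematic": systematic,
    }


# Synthetic example: every depth reads 25% too far, as a scale mismatch would produce.
truth = [1.0, 2.0, 3.0, 4.0, 5.0]
biased = [1.25 * d for d in truth]
print(diagnose_depth_gap(biased, truth))
```

A single global scale can only capture scale-type mismatches such as object size or corridor dimensions. Photometric differences like floor reflectance or lighting tend to produce spatially uneven errors that this check would report as residual error, not as a scale bias.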

Iteration structure

The 18 iterations span three phases:

  1. Baseline establishment (iterations 1–4): standing up the full perception–planning–control loop, verifying that commands reach the flight controller, establishing the hover baseline as the comparison point.
  2. Detector scaling (iterations 5–12): data collection pipeline, synthetic frame generation at scale, fine-tuning protocol, offline evaluation, and the closed-loop non-result.
  3. Gap diagnosis (iterations 13–18): systematic probes of the localization gap, partial mitigations, and characterization of exploration and planning as the next bottlenecks now that detection is no longer binding.

Why publish failure logs

Drone autonomy papers almost universally report final results under favorable conditions. The engineering decisions that were tried and discarded, and especially the diagnostic work that preceded those decisions, rarely appear in print. That creates a literature where every paper shows an improvement, and a practitioner reading it has no idea how many expensive dead ends were omitted.

We think the iteration log format serves the community better. If you are working on a similar stack and your fine-tuning gains aren't transferring to closed-loop performance, this paper tells you what we checked, in what order, and what the localization gap diagnosis looks like.

The dataset used to generate training frames is Yonder. The benchmark that produced the 25-VLM results referenced here is documented in Closing the Metric Gap.