Taking the Human Out of the Loop

We’ve had autosuggest for a while, but Large Language Models (LLMs) became possible because of the explosive growth of available text and the 2017 Transformer architecture, which lets LLM training be parallelized if you have enough GPUs. One intriguing byproduct of LLMs is their ability to reason; we are still learning how to tease out intelligence from the corpus of human-generated text.

But what if we take the human out of the loop and train an LLM based on software-generated data? Early LLMs were trained on – God help us – Reddit comments, but one can train instead on inter-robot communication that has nothing in common with human speech. The benefit is that you can use a smaller and more accurate model – as one of the agents – if you focus on what the robots need to know and how to talk to each other. Not every model in your system needs to know the airspeed velocity of an unladen swallow.

For a simple example of taking humans out of the loop in LLM training, consider a scenario where two robots work together to move an object:

The object is a segment that can be moved left or right.
To move the segment, one robot goes to each end, and then the robots move left or right at the same time.
The task is done when the segment is at the desired location.

Of course, one does not need an LLM predicting tokens to tell the robots how to move in this case. A simple algorithm will do. At each step, it tells each robot to move right, move left, or stay, until the segment has arrived. But then one can run this algorithm on thousands of different input values, recording all the while what the algorithm receives as input and which commands it issues – and then we have a corpus of software data.
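Here is a minimal sketch of what such a data generator could look like. It is not the code we ran: the integer grid, the command words, the trace format, and the function names (step_commands, run_episode, build_corpus) are all assumptions made for illustration, and it skips the part where the robots walk to the ends of the segment.

```python
import random


def step_commands(position, target):
    """Pick one command per robot from the segment position and the target."""
    if position < target:
        return ("RIGHT", "RIGHT")
    if position > target:
        return ("LEFT", "LEFT")
    return ("STAY", "STAY")


def run_episode(start, target):
    """Run the simple algorithm until the segment arrives; return the trace.

    Assumes both robots are already holding the ends of the segment and
    that each command moves the segment by exactly one grid cell.
    """
    trace = [f"START {start}", f"TARGET {target}"]
    position = start
    while position != target:
        cmd_a, cmd_b = step_commands(position, target)
        trace.append(f"A {cmd_a} B {cmd_b}")
        position += 1 if cmd_a == "RIGHT" else -1
    trace.append("DONE")
    return trace


def build_corpus(num_episodes, low=0, high=20, seed=0):
    """Record traces for many random (start, target) pairs."""
    rng = random.Random(seed)
    return [
        run_episode(rng.randint(low, high), rng.randint(low, high))
        for _ in range(num_episodes)
    ]


if __name__ == "__main__":
    for line in run_episode(2, 5):
        print(line)
```

Calling build_corpus(1000) gives a thousand traces of this kind, each one a record of the input (start and target) followed by the commands issued. No human wrote a word of it.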

And that’s what we did! We generated robot commands for many scenarios and trained an LLM (well, technically a “decoder”) on the resulting robot communication. And it works!
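To give a feel for what "training a decoder" on this kind of data means, here is a rough sketch of turning one recorded trace into next-token prediction examples. The trace format, the whitespace tokenizer, and the helper names (tokenize, build_vocab, next_token_pairs) are hypothetical and chosen only for illustration; the model and the training loop are left out.

```python
def tokenize(trace):
    """Split each recorded line on whitespace into a flat token list."""
    tokens = []
    for line in trace:
        tokens.extend(line.split())
    return tokens


def build_vocab(token_lists):
    """Assign each distinct token an integer id, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def next_token_pairs(ids, context_size):
    """Build (context, target) pairs: the decoder sees the context and
    is trained to predict the token that follows it."""
    pairs = []
    for i in range(1, len(ids)):
        context = ids[max(0, i - context_size):i]
        pairs.append((context, ids[i]))
    return pairs


# A hypothetical trace: move the segment from position 2 to position 4.
trace = [
    "START 2",
    "TARGET 4",
    "A RIGHT B RIGHT",
    "A RIGHT B RIGHT",
    "DONE",
]

tokens = tokenize(trace)
vocab = build_vocab([tokens])
ids = [vocab[tok] for tok in tokens]
pairs = next_token_pairs(ids, context_size=4)

if __name__ == "__main__":
    print(len(vocab), "tokens in the vocabulary")
    print(len(pairs), "training pairs from one trace")
```

Note how small the vocabulary is: a handful of command words plus the position values. That is a big part of why a model trained this way can be so much smaller than one that has to cope with human text.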

If the problem is complex enough, you won’t want to write the algorithm yourself, but instead let the robots organically develop their own language for the task through Reinforcement Learning, with a lot of “No, my other left” trial and error. This of course requires a simulator, to reduce both training time and the number of crashed robots.