Posted July 15, 2025 by Shivani
Check out my training video, which shows the process of training a drone using a curriculum.
Training a drone in a complex environment isn't easy, especially once you introduce mechanics like batteries, recharging and searching. The default training method wasn't producing reliable results, so I decided to go a step further and use curriculum learning.
This devlog dives into how I implemented curriculum learning using Unity ML-Agents, what each stage teaches the agent, and how it helped speed up learning.
Curriculum learning is a machine learning technique where the agent starts with simpler tasks and gradually progresses to more challenging scenarios as it improves. It mirrors how humans learn.
In ML-Agents, this is handled through environment parameters whose values change across "lessons" based on the agent's performance.
I created a curriculum with three stages of increasing complexity, each tied to a specific task.
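A minimal sketch of how a three-lesson curriculum like this could be declared in the ML-Agents trainer config. The parameter key `stage` and the lesson names follow the post; the behavior name `SearchDrone` and every threshold and lesson length are placeholder assumptions, not the values used in my runs.

```yaml
environment_parameters:
  stage:                              # read by the agent at the start of each episode
    curriculum:
      - name: TargetOnly              # lesson 0: target only
        completion_criteria:
          measure: reward
          behavior: SearchDrone       # hypothetical behavior name
          signal_smoothing: true
          min_lesson_length: 100      # placeholder
          threshold: 0.8              # placeholder
        value: 0.0
      - name: ChargingOnly            # lesson 1: charging station only
        completion_criteria:
          measure: reward
          behavior: SearchDrone
          signal_smoothing: true
          min_lesson_length: 100      # placeholder
          threshold: 0.8              # placeholder
        value: 1.0
      - name: FullTraining            # lesson 2: target and charging station
        value: 2.0
```

The last lesson has no completion criteria because there is nothing to advance to; training simply continues there until the run ends.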
The first lesson introduces only the target.
case CurriculumStage.TargetOnly:
    // Activate the target only
    targetTransform.gameObject.SetActive(true);
    chargingStationTransform.gameObject.SetActive(false);

    // Reset battery to 100% each episode
    battery.ResetBattery();

    // Position agent and target
    transform.localPosition = new Vector3(0, 3, 0);
    targetTransform.localPosition = new Vector3(Random.Range(-5, 5), 1f, Random.Range(-5, 5)); // Smaller range
    break;
This isolates the searching behaviour and teaches the agent its main task without any extra distractions.
Once the drone can reliably collect targets, we introduce the battery mechanic.
case CurriculumStage.ChargingOnly:
    // Activate the charging station only
    targetTransform.gameObject.SetActive(false);
    chargingStationTransform.gameObject.SetActive(true);

    // Randomise the battery to a low percentage each episode
    battery.SetBattery(Random.Range(15f, 30f));

    // Position agent
    transform.localPosition = new Vector3(Random.Range(-18, -10), 3f, Random.Range(-5, 5)); // Smaller range
    break;
This stage teaches the drone where the charging station is and how to recharge itself.
Finally, we combine all components in the environment.
case CurriculumStage.FullTraining:
    // Activate target and charging station
    targetTransform.gameObject.SetActive(true);
    chargingStationTransform.gameObject.SetActive(true);

    // Position agent and target
    transform.localPosition = new Vector3(0, 3, 0);
    targetTransform.localPosition = new Vector3(Random.Range(-15, 15), 1f, Random.Range(-8, 8)); // Larger range
    break;
By this point, the agent has already mastered the foundational skills, so it can now focus on efficiency and strategy.
At the beginning of each episode, the agent reads the stage environment parameter and uses it to pick which of the cases above to run.
This allows the training environment to dynamically change based on the lesson.
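The parameter arrives as a float, so it has to be turned into a `CurriculumStage` before the switch can use it. Here is a minimal sketch of that conversion, assuming the enum values are numbered 0 to 2 in lesson order; the helper name `CurriculumStageReader` is hypothetical and not part of ML-Agents.

```csharp
using System;

// Assumed numbering: matches the lesson order described in this post.
public enum CurriculumStage
{
    TargetOnly = 0,
    ChargingOnly = 1,
    FullTraining = 2
}

// Hypothetical helper: converts the raw float parameter into a stage.
public static class CurriculumStageReader
{
    public static CurriculumStage FromParameter(float value)
    {
        // Round to the nearest lesson index, then clamp so an unexpected
        // value can never select a stage that does not exist.
        int index = (int)Math.Round(value);
        index = Math.Max(0, Math.Min(2, index));
        return (CurriculumStage)index;
    }
}
```

Inside the agent's `OnEpisodeBegin()`, the raw value would come from `Academy.Instance.EnvironmentParameters.GetWithDefault("stage", 0f)`, where `"stage"` is assumed to match the key in the trainer config. The default of `0f` means the agent falls back to the first lesson when no curriculum is supplied, for example when running in the editor.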
To evaluate my curriculum-based training, I tracked metrics such as average reward, episode length, success rate and convergence speed. Each lesson was monitored individually to observe how the agent's performance progressed as task complexity increased. This allowed me to quantitatively compare the curriculum approach to previous training runs and fine-tune hyperparameters where necessary.
| Lesson | Avg. Reward | Episode Length | Reward Std. Dev. |
| --- | --- | --- | --- |
| Lesson 0: Target | x | x | x |
| Lesson 1: Recharge | x | x | x |
| Lesson 2: Full | x | x | x |
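For reference, the reward columns above are summary statistics over per-episode returns. ML-Agents already reports mean reward and episode length during training, so the following is only a sketch of the arithmetic, assuming the returns for one lesson have been collected into a list; `EpisodeStats` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for summarising per-episode returns within one lesson.
public static class EpisodeStats
{
    // Average return across all recorded episodes.
    public static double Mean(IReadOnlyList<double> values)
    {
        if (values.Count == 0)
            throw new ArgumentException("No episodes recorded.");
        return values.Average();
    }

    // Population standard deviation: how much returns vary between episodes.
    public static double StdDev(IReadOnlyList<double> values)
    {
        double mean = Mean(values);
        double variance = values.Sum(v => (v - mean) * (v - mean)) / values.Count;
        return Math.Sqrt(variance);
    }
}
```

A falling standard deviation within a lesson is a useful sign that the policy is becoming consistent rather than occasionally lucky.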
In the next phase, I will introduce a second drone, the "delivery drone", to work alongside the trained search drone. The two drones will collaborate to complete tasks more efficiently.
As explained in my first devlog, the search drone will be responsible for locating the target and recording its precise coordinates. It will then share the location data with the delivery drone, which will use the information to navigate to the site and carry out its assigned operation.