Devlog #7 - Training My Drone: Curriculum Learning for Search Tasks

Posted July 15, 2025 by Shivani

Check out my training video, which shows the process of training a drone with a curriculum.

Training a drone in a complex environment isn't easy, especially when you introduce mechanics like batteries, recharging and searching. The default training method wasn't producing reliable results, so I decided to go a step further:

I built a 3-stage curriculum to gradually teach my drone how to recharge and complete its tasks efficiently.

This devlog dives into how I implemented curriculum learning using Unity ML-Agents, what each stage teaches the agent, and how it helped speed up learning.

What is Curriculum Learning?

Curriculum learning is a machine learning technique where the agent starts with simpler tasks and gradually progresses to more challenging scenarios as its performance improves. It mirrors how humans learn.

In ML-Agents, this is handled through environment parameters whose values change across "lessons" based on the agent's performance.
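A minimal sketch of how such a curriculum might be declared in the trainer config YAML. The parameter name stage comes from this post; the behavior name DroneSearch, the thresholds and the min_lesson_length values are placeholders, not the values used in this project.

```yaml
environment_parameters:
  stage:                        # read by the agent at the start of each episode
    curriculum:
      - name: TargetOnly        # Lesson 0
        completion_criteria:
          measure: reward       # advance based on mean episode reward
          behavior: DroneSearch # placeholder behavior name
          signal_smoothing: true
          min_lesson_length: 100
          threshold: 4.0        # placeholder threshold
        value: 0.0
      - name: ChargingOnly      # Lesson 1
        completion_criteria:
          measure: reward
          behavior: DroneSearch
          signal_smoothing: true
          min_lesson_length: 100
          threshold: 0.8        # placeholder threshold
        value: 1.0
      - name: FullTraining      # Lesson 2 (final lesson, no completion criteria)
        value: 2.0
```

The trainer moves to the next lesson once the smoothed reward passes the threshold, and the agent only ever sees the current value of stage.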

Curriculum Structure

I created a curriculum with three stages, each more complex than the last.

Lesson Breakdown

Each lesson is tied to a specific task:

Lesson 0 - Collect Targets

The first lesson introduces only the target.

case CurriculumStage.TargetOnly:
    // Activate the target only; the charging station stays hidden
    targetTransform.gameObject.SetActive(true);
    chargingStationTransform.gameObject.SetActive(false);

    // Reset battery to 100% each episode
    battery.ResetBattery();

    // Position agent and target (smaller spawn range)
    transform.localPosition = new Vector3(0, 3, 0);
    targetTransform.localPosition = new Vector3(
        Random.Range(-5, 5), // int overload: whole numbers, upper bound exclusive
        1f,
        Random.Range(-5, 5));
    break;

The battery is reset to full at the start of each episode.
The agent's task is to locate and collect the target within a small area.
The reward is simple: +5 for collecting the target.

This isolates the searching behaviour and teaches the agent its main task without any extra distractions.

Lesson 1 - Recharge the Battery

Once the drone can reliably collect targets, we introduce the battery mechanic.

case CurriculumStage.ChargingOnly:
    // Activate the charging station only; the target stays hidden
    targetTransform.gameObject.SetActive(false);
    chargingStationTransform.gameObject.SetActive(true);

    // Randomise the battery to a low percentage each episode
    battery.SetBattery(Random.Range(15f, 30f));

    // Position agent within a limited spawn region
    // (int overload: whole numbers, upper bound exclusive)
    transform.localPosition = new Vector3(Random.Range(-18, -10), 3f, Random.Range(-5, 5));
    break;

The drone starts with a low battery (15 to 30%).
It must find the charging station before the battery runs out.
This teaches the drone how to manage its battery level.

This stage is more about teaching the drone where the charging station is and how to charge itself.

Lesson 2 - Full Training

Finally, we combine all components in the environment.

case CurriculumStage.FullTraining:
    // Activate target and charging station
    targetTransform.gameObject.SetActive(true);
    chargingStationTransform.gameObject.SetActive(true);

    // Position agent and target (larger spawn range)
    transform.localPosition = new Vector3(0, 3, 0);
    targetTransform.localPosition = new Vector3(
        Random.Range(-15, 15), // int overload: whole numbers, upper bound exclusive
        1f,
        Random.Range(-8, 8));
    break;

Find the target.
Recharge when needed.
The target spawns across a larger area.

By this point, the agent has already mastered the foundational skills, so it can now focus on efficiency and strategy.

How the Agent Knows What Lesson It's In

At the beginning of each episode, the agent reads the stage environment parameter and runs the matching case from the lesson breakdown above.

This allows the training environment to dynamically change based on the lesson.
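A minimal sketch of turning that parameter into a stage. ML-Agents sends environment parameters as floats, so the value is rounded and clamped before being cast. The enum members match the case labels above, but their numeric values and the helper CurriculumStageReader are assumptions for illustration, not the project's actual code.

```csharp
using System;

// Stage names match the case labels in this devlog; the numeric values are assumed.
public enum CurriculumStage
{
    TargetOnly = 0,
    ChargingOnly = 1,
    FullTraining = 2
}

// Hypothetical helper: converts the float sent by the trainer into a stage.
public static class CurriculumStageReader
{
    public static CurriculumStage FromParameter(float value)
    {
        // Fall back to the first lesson if the value is not a number.
        if (float.IsNaN(value))
        {
            return CurriculumStage.TargetOnly;
        }

        // Round to the nearest whole lesson index, then clamp to the valid range.
        int index = (int)Math.Round(value, MidpointRounding.AwayFromZero);
        index = Math.Max(0, Math.Min(2, index));
        return (CurriculumStage)index;
    }
}
```

Inside an Agent subclass, OnEpisodeBegin could pass Academy.Instance.EnvironmentParameters.GetWithDefault("stage", 0f) to FromParameter and switch on the result. The default of 0 means the first lesson is used when no curriculum is supplied.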

Evaluation

To evaluate my curriculum-based training, I tracked metrics such as average reward, episode length, success rate and convergence speed. Each lesson was monitored individually to observe how the agent's performance progressed as task complexity increased. This allowed me to quantitatively compare the curriculum approach to previous training runs and fine-tune hyperparameters where necessary.

Lesson	Avg. Reward	Episode Length	Reward Std. Dev.
Lesson 0: Target	x	x	x
Lesson 1: Recharge	x	x	x
Lesson 2: Full	x	x	x
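For reference, a small sketch of how the Avg. Reward and Reward Std. Dev. columns could be computed from a list of per-episode rewards logged during one lesson. The LessonStats class is hypothetical, and it uses the population standard deviation; this is one reasonable choice and not necessarily how the figures for this table are produced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for summarising the episode rewards of one lesson.
public static class LessonStats
{
    // Average reward across all recorded episodes.
    public static double Mean(IReadOnlyList<double> rewards)
    {
        if (rewards.Count == 0)
        {
            throw new ArgumentException("No episodes recorded.", nameof(rewards));
        }
        return rewards.Average();
    }

    // Population standard deviation: square root of the mean squared deviation.
    public static double StdDev(IReadOnlyList<double> rewards)
    {
        double mean = Mean(rewards);
        double sumOfSquares = rewards.Sum(r => (r - mean) * (r - mean));
        return Math.Sqrt(sumOfSquares / rewards.Count);
    }
}
```

A low standard deviation alongside a high average suggests the agent is solving the lesson consistently, not just occasionally.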

Next Steps

In the next phase, a new drone, the "delivery drone", will be introduced to work alongside the existing search drone that has been trained. These two drones will collaborate to complete tasks more efficiently.

As explained in my first devlog, the search drone will be responsible for locating the target and recording its precise coordinates. It will then share the location data with the delivery drone, which will use the information to navigate to the site and carry out its assigned operation.
