Posted July 08, 2025 by Shivani
After completing the initial version of the drone agent and successfully training it to collect a target, I wanted to push the simulation further. The goal this time was to increase complexity and make it more realistic.
One of the major additions in this phase is the battery system. The drone now has a limited amount of energy that drains with every action it takes. This introduces a new challenge: the agent must not only seek out the target, but also manage its energy efficiently to avoid running out of battery mid-task.
Here's how it works:
To keep things modular and accessible, I created public properties for battery stats:
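Something along these lines (a minimal sketch; the class and field names here are placeholders rather than the exact ones from the project):

```csharp
using Unity.MLAgents;
using UnityEngine;

public class DroneAgent : Agent
{
    [SerializeField] private float maxBattery = 100f;
    private float currentBattery;

    // Read-only properties so other scripts can query battery state
    // without being able to modify the internal fields.
    public float CurrentBattery => currentBattery;
    public float BatteryPercent => currentBattery / maxBattery;
    public bool IsBatteryLow => BatteryPercent < 0.4f;

    public override void OnEpisodeBegin()
    {
        // Start every episode with a full charge.
        currentBattery = maxBattery;
    }
}
```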
This allows the agent scripts to read battery values without exposing internal logic.
The battery is drained inside the OnActionReceived() method of the agent.
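Continuing the same sketch, the drain could look roughly like this (the per-step cost and the end-of-episode penalty are assumptions; ActionBuffers comes from Unity.MLAgents.Actuators):

```csharp
// Assumed per-step energy cost; the real project may use a different value.
[SerializeField] private float drainPerStep = 0.1f;

public override void OnActionReceived(ActionBuffers actions)
{
    // Movement/thrust handling from the action buffers goes here (omitted).

    // Every decision step costs a fixed slice of battery.
    currentBattery -= drainPerStep;

    if (currentBattery <= 0f)
    {
        // Running flat ends the episode with a penalty.
        SetReward(-1f);
        EndEpisode();
    }
}
```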
Every action the agent takes now comes at a cost. This simple change added a whole new layer of strategy to the agent's behaviour.
In addition to the battery system, I also expanded the training area. The original environment was small and highly predictable.
In this update:
As the environment becomes more complex, so must the agent's awareness of itself and its surroundings. In this phase, I updated the vector observations to reflect the new battery mechanic and environmental features. These observations are crucial for helping the agent make informed decisions, especially now that battery management is part of the task.
In addition to the base observations I set up in the ML-Agents Setup Devlog (such as position and velocity), the drone now gathers battery-aware observations.
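A rough idea of how that might look in CollectObservations(), assuming rb and chargerTransform are serialized references to the drone's Rigidbody and the charging station (both names are placeholders):

```csharp
[SerializeField] private Rigidbody rb;
[SerializeField] private Transform chargerTransform;

public override void CollectObservations(VectorSensor sensor)
{
    // Base observations: drone position and velocity (as in the setup devlog).
    sensor.AddObservation(transform.localPosition);
    sensor.AddObservation(rb.velocity);

    // Quantitative battery data: normalised charge and distance to the charger.
    sensor.AddObservation(BatteryPercent);
    sensor.AddObservation(Vector3.Distance(transform.localPosition,
                                           chargerTransform.localPosition));

    // Qualitative flag: is the battery low enough to worry about?
    sensor.AddObservation(IsBatteryLow ? 1f : 0f);
}
```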
By feeding the agent both quantitative data (exact battery level and distance to the charger) and qualitative flags (whether the battery is low), the agent is now better equipped to:
With new mechanics like the battery system and charging station in place, the next challenge was to guide the agent's learning through rewards. To encourage smart battery management, I implemented a combination of distance-based and collision-based reward shaping techniques.
When the drone's battery is low, I want it to move towards the charging station. To help it learn this behaviour, I wrote a function called GoToCharger(), which is called in the OnActionReceived() method. The code below rewards the agent for reducing the distance to the charger.
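Here is a sketch of that shaping logic; previousChargerDistance is an assumed field caching the distance from the previous step, and the exact reward value is illustrative:

```csharp
private float previousChargerDistance = float.MaxValue;

private void GoToCharger()
{
    // Only shape the reward once the battery drops below 40%.
    if (BatteryPercent >= 0.4f) return;

    float distance = Vector3.Distance(transform.localPosition,
                                      chargerTransform.localPosition);

    if (distance < previousChargerDistance)
    {
        // The drone closed the gap since the last step: small bonus.
        AddReward(0.005f);
    }

    previousChargerDistance = distance;
}
```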
This subtle, shaped reward is only triggered when the battery drops below 40%. If the drone gets closer to the charger, it earns a small positive reward, reinforcing the idea that closing the gap is a good thing.
Getting near the charger is great, but reaching it and recharging deserves a bigger reward. That's where the HandleChargingStationCollision() function comes in:
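A possible shape for that handler, assuming the charging station is detected by tag via a trigger collider (the tag name and reward value are my placeholders):

```csharp
private void HandleChargingStationCollision()
{
    // Docking refills the battery and pays a larger reward than the
    // shaping bonus, so reaching the charger is clearly worth the detour.
    currentBattery = maxBattery;
    AddReward(0.5f);
}

private void OnTriggerEnter(Collider other)
{
    if (other.CompareTag("ChargingStation"))
    {
        HandleChargingStationCollision();
    }
}
```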
In this setup:
In addition to teaching the drone how to manage its energy, I also updated the reward system for successfully reaching the target. The new logic now factors in how efficiently the task was completed.
Here's the logic behind the HandleTargetCollision() function:
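The sketch below is one interpretation of that efficiency factor, scaling the completion reward by the battery left over; the real formula in the project may differ:

```csharp
private void HandleTargetCollision()
{
    // Base reward for completing the task.
    float reward = 1f;

    // Bonus proportional to the remaining battery, so wasteful routes
    // earn less than direct, energy-efficient ones.
    reward += 0.5f * BatteryPercent;

    SetReward(reward);
    EndEpisode();
}
```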
Let's break it down:
The drone agent is now capable of much more than simply reaching its goal. By expanding the environment, adding a battery mechanic, and shaping rewards around battery level and goal completion, the agent can learn to balance priorities dynamically.
To push this further, here's what I have planned for the next phase: