Posted July 15, 2025 by Shivani
Before I started building a custom PPO configuration, I ran into a wall.
I had just expanded the drone's environment: adding a battery system and a charging station, and increasing the overall size of the world. It seemed like everything was in place. The agent could fly, detect objects, track its battery, and even earn shaped rewards based on its behaviour.
But when I trained it, it learnt poorly and inconsistently.
I was still using the default PPO configuration that ships with Unity ML-Agents. It works fine for simple environments, but my updated setup was now too complex for it. What I observed during training was a classic case of underpowered training settings for a task that needs more exploration, more stability, and richer feedback.
First, I downloaded a template YAML file [1] from the Coder One website. It's a good starting point, because I can simply tweak the parameters that will help my agent learn better. I also reviewed the ML-Agents documentation [2] for descriptions of each setting, which helped me work out what to change to improve the agent's learning.
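For context, all of these settings sit under a behaviour name inside the config file. A rough sketch of the layout (Drone is just a placeholder here; the name has to match the Behavior Parameters name set on the agent in Unity):

behaviors:
  Drone:                   # placeholder; must match the agent's Behavior Parameters name
    trainer_type: ppo
    hyperparameters:
      # batch_size, buffer_size, learning_rate, beta, epsilon, ...
    network_settings:
      # normalize, hidden_units, num_layers, ...
    reward_signals:
      # extrinsic, curiosity, ...
    max_steps: ...
    time_horizon: ...
    summary_freq: ...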
Using the template as a base, I tweaked and added the following settings:
hyperparameters:
  batch_size: 1024
  buffer_size: 10240
  learning_rate: 3.0e-4
  learning_rate_schedule: linear
  beta: 2.5e-4
  epsilon: 0.2

reward_signals:
  curiosity:
    strength: 0.1
    gamma: 0.99
    learning_rate: 0.0003

network_settings:
  normalize: true
  hidden_units: 256
  num_layers: 2

max_steps: 10000000
time_horizon: 128
summary_freq: 10000
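With the config saved to a file (I'll call it drone_config.yaml here, but the filename and run ID are just placeholders), training is launched by pointing mlagents-learn at it and giving the run an ID:

mlagents-learn drone_config.yaml --run-id=drone_ppo_custom

The stats written every summary_freq steps then show up in TensorBoard via tensorboard --logdir results, which makes it much easier to see whether the curiosity reward and the longer time horizon are actually helping.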
Training an agent in a more complex world calls for a more thoughtful setup. The default PPO configuration is a good starting point, but it isn't effective for bigger environments. A custom config gives me more control and better results, especially as I add more interactivity to the environment.
[1] https://www.gocoder.one/blog/training-agents-using-ppo-with-unity-ml-agents/
[2] https://github.com/Unity-Technologies/ml-agents/blob/release_18_docs/docs/Traini...