Posted July 21, 2025 by Shivani
Parameter | Default Model | SearchAgent (Custom) | Description |
--- | --- | --- | --- |
trainer_type | ppo | ppo | Same PPO algorithm as the default |
max_steps | 500,000 | 3,000,000 | Extended training budget to give the policy more time to learn and converge |
summary_freq | 50,000 | 10,000 | More frequent summaries for closer monitoring |
**HYPERPARAMETERS** | | | |
learning_rate | 3e-4 | 3e-4 | No change |
batch_size | 1024 | 1024 | No change |
buffer_size | 10,240 | 10,240 | No change |
beta | not set | 2.5e-4 | Entropy regularisation strength; set well below the default so the entropy bonus is weak and policy entropy falls faster |
epsilon | not set | 0.2 | PPO clipping parameter to control policy updates |
lambd | not set | 0.95 | GAE lambda for bias-variance trade-off in advantage estimation |
num_epoch | not set | 3 | Number of passes over data per policy update |
learning_rate_schedule | linear | linear | Gradual learning rate decay over training |
**NETWORK SETTINGS** | | | |
hidden_units | 128 | 256 | Larger network capacity to learn more complex features |
num_layers | 2 | 2 | Same depth, balancing expressiveness against training speed |
normalize | false | true | Normalise vector observations to stabilise and speed up training |
**REWARD SIGNALS** | | | |
**EXTRINSIC** | | | |
gamma | 0.99 | 0.99 | No change |
strength | 1.0 | 1.0 | No change |
**CURIOSITY** | | | |
strength | not set | 0.1 | Adds an intrinsic curiosity reward to drive exploration |
gamma | not set | 0.99 | Discount factor for curiosity rewards |
learning_rate | not set | 3e-4 | Learning rate for the curiosity module, matching the policy learning rate |
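
Put together, the SearchAgent column corresponds to an ML-Agents trainer configuration along the lines of the sketch below. This is an illustrative reconstruction rather than the exact file: the behaviour name `SearchAgent` and the file layout are assumptions, and the parameters listed as "not set" above simply fall back to ML-Agents defaults when omitted from the YAML.

```yaml
behaviors:
  SearchAgent:                    # assumed behaviour name, matching the agent above
    trainer_type: ppo
    max_steps: 3000000            # extended from the 500,000 default
    summary_freq: 10000           # more frequent summaries for closer monitoring
    hyperparameters:
      learning_rate: 3.0e-4
      batch_size: 1024
      buffer_size: 10240
      beta: 2.5e-4                # low entropy regularisation strength
      epsilon: 0.2                # PPO clipping parameter
      lambd: 0.95                 # GAE lambda
      num_epoch: 3                # passes over the buffer per policy update
      learning_rate_schedule: linear
    network_settings:
      hidden_units: 256           # doubled from the 128 default
      num_layers: 2
      normalize: true             # normalise vector observations
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:                  # intrinsic reward for exploration
        strength: 0.1
        gamma: 0.99
        learning_rate: 3.0e-4
```

A run using this file would then be launched with something like `mlagents-learn search_agent.yaml --run-id=SearchAgent_01`, where the file name and run ID are placeholders.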