Machine Learning - Self Driving Car AI


The start


This project began by following a series of tutorials from Code Monkey on machine learning fundamentals in Unity, including:

  • Basic ML-Agents setup
  • Car driving agent behavior
  • Imitation learning integration

Using these as a foundation, I created:

  • A working test scene
  • A virtual training environment

The car controller itself is one I found online: Prometeo Car Controller

The goal of this project was to train an AI model to drive a car using machine learning. My metrics were how many generations of training it took for a model to drive around a course, how it went about maximizing reward and minimizing penalty, and the episode length. I created a right-turn course, a left-turn course, and finally a random-turn course. The model is rewarded for passing the correct checkpoints (moving forwards) and penalized for colliding with walls and for passing through wrong checkpoints.
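
As a rough sketch of that reward scheme (the class name, tag names, and reward values below are my own illustration, not the exact project code):

    using Unity.MLAgents;
    using UnityEngine;

    // Sketch of the reward/penalty scheme described above.
    public class CarAgent : Agent
    {
        private void OnTriggerEnter(Collider other)
        {
            if (other.CompareTag("CorrectCheckpoint"))
            {
                AddReward(+1f);   // moving forwards through the right gate
            }
            else if (other.CompareTag("WrongCheckpoint"))
            {
                AddReward(-1f);   // driving backwards / the wrong way
            }
        }

        private void OnCollisionEnter(Collision collision)
        {
            if (collision.gameObject.CompareTag("Wall"))
            {
                AddReward(-0.5f); // hitting a wall is penalized
                EndEpisode();
            }
        }
    }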

Following all of the tutorials from Code Monkey, I was able to set up my Car Controller and train it to drive forwards and turn.
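
Concretely, the agent side boils down to mapping ML-Agents discrete actions onto the controller's inputs, roughly like the sketch below; the class name and the commented-out controller hook are placeholders, not the Prometeo API:

    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;
    using UnityEngine;

    // Minimal sketch of mapping discrete actions to driving inputs.
    public class CarDriverAgent : Agent
    {
        public override void OnActionReceived(ActionBuffers actions)
        {
            float throttle = 0f;
            float steer = 0f;

            switch (actions.DiscreteActions[0]) // branch 0: none / forward / reverse
            {
                case 1: throttle = +1f; break;
                case 2: throttle = -1f; break;
            }
            switch (actions.DiscreteActions[1]) // branch 1: straight / left / right
            {
                case 1: steer = -1f; break;
                case 2: steer = +1f; break;
            }

            // Hypothetical hook into the car controller:
            // carController.SetInputs(throttle, steer);
        }

        // Keyboard fallback, also used for recording imitation-learning demos.
        public override void Heuristic(in ActionBuffers actionsOut)
        {
            var discrete = actionsOut.DiscreteActions;
            discrete[0] = Input.GetKey(KeyCode.W) ? 1 : Input.GetKey(KeyCode.S) ? 2 : 0;
            discrete[1] = Input.GetKey(KeyCode.A) ? 1 : Input.GetKey(KeyCode.D) ? 2 : 0;
        }
    }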


First Tests & Training


The first training started off a bit rough: the AI was constantly having problems because of how I had set up the Car Controller. Once I readjusted it and got it fixed, the training for turning right started.

Because the training was taking so long, I decided to take a faster approach and use imitation learning. After creating a .yaml file, recording a demo, and trying it out, I came to realize that the Car Controller had issues with the new learning method: there were so many actions available to it that it almost always ended up doing nothing. Following this, I created a new Car Controller and started the first real batch of training.
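
For reference, an ML-Agents trainer config for this kind of imitation learning looks roughly like the sketch below; the behavior name, demo path, and hyperparameter values here are illustrative, not my exact file:

    behaviors:
      CarDriver:
        trainer_type: ppo
        hyperparameters:
          batch_size: 512
          buffer_size: 10240
          learning_rate: 3.0e-4
        network_settings:
          hidden_units: 256
          num_layers: 2
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
          gail:                            # learn from the recorded demo
            strength: 0.5
            demo_path: Demos/CarDemo.demo
        behavioral_cloning:                # directly imitate the demo early on
          demo_path: Demos/CarDemo.demo
          strength: 0.5
        max_steps: 1000000
        time_horizon: 64
        summary_freq: 10000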


Model Training

After implementing the new Car Controller, the imitation learning finally started to show its capabilities.

Turn Right

It took three training runs, with tweaks to the reward/penalty system in between, before the agent started making good decisions and making them faster.
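
One tweak of this kind (illustrative, not necessarily the exact change I made) is a tiny per-step penalty, which pushes the agent to reach checkpoints sooner instead of idling:

    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;

    // Sketch: a small existential cost each step rewards finishing faster.
    public class TimedCarAgent : Agent
    {
        public override void OnActionReceived(ActionBuffers actions)
        {
            AddReward(-1f / MaxStep);  // assumes MaxStep is set in the inspector
            // ... map `actions` to throttle/steer as before ...
        }
    }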

Figure 1: Turn Right Reward Metric

Figure 2: Turn Right Episode Length

The purple line is the third and final training run; the pink line is the first. There is a massive difference between the two: in the beginning the AI was only trying out small movements, but it started learning extremely fast once the reward/penalty system was tweaked.

Turn Left

Surprisingly, the turn-left training took only one session. The reward/penalty system was working well, and with the demo it was given, the AI was able to finish the entire course after a single training run.

Figure 3: Turn Left Reward Metric

Figure 4: Turn Left Episode Length

Interestingly enough, the episode length started increasing after a while. This was because I had not added an episode reset after the agent completed the entire course; instead, an episode only ended when the AI crashed or ran out of steps.
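
The fix is essentially a one-liner: end the episode when the final checkpoint is crossed. A sketch, with an illustrative tag name:

    using Unity.MLAgents;
    using UnityEngine;

    // Sketch: resetting on course completion keeps episode length from
    // growing once the agent has learned to finish the track.
    public class FinishLineAgent : Agent
    {
        private void OnTriggerEnter(Collider other)
        {
            if (other.CompareTag("FinishLine"))
            {
                AddReward(+1f);
                EndEpisode();  // the missing episode reset described above
            }
        }
    }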


Final Track

Finally reaching the Final Track, I had my hopes up, but they were entirely crushed within the first two training runs. The AI had so many issues with the reward/penalty system that it would either get stuck in place and not move at all, or slam directly into a wall as fast as possible to minimize its penalty.

Problem and Fix

The problem was the reward system, so I spent ages tweaking it so the AI would actually move from its starting position and would not start randomly ramming into walls the moment it could. I had to add penalties for turning and for max speed (which in the end weren't necessary) and for staying in place for too long, and I rewarded the agent for getting closer to the next checkpoint to incentivize it to move. One of the breakthroughs was turning off imitation learning for a few training sessions so the AI could explore by itself instead of trying to follow my tracks; turning it back on at the end was what finally allowed the AI to finish the track.
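
The "getting closer to a checkpoint" reward is a form of reward shaping; a minimal sketch of that idea, where all names and values are illustrative rather than my exact code:

    using Unity.MLAgents;
    using UnityEngine;

    // Sketch: reward progress toward the next checkpoint, and penalize
    // standing still for too long.
    public class ShapedCarAgent : Agent
    {
        public Transform nextCheckpoint;   // set by the checkpoint system
        private float lastDistance;
        private int stillSteps;

        public override void OnEpisodeBegin()
        {
            lastDistance = Vector3.Distance(transform.position, nextCheckpoint.position);
            stillSteps = 0;
        }

        private void FixedUpdate()
        {
            float distance = Vector3.Distance(transform.position, nextCheckpoint.position);
            float progress = lastDistance - distance;

            AddReward(progress * 0.1f);    // reward for getting closer

            stillSteps = progress < 0.001f ? stillSteps + 1 : 0;
            if (stillSteps > 200)          // "staying in place for too long"
            {
                AddReward(-0.5f);
                EndEpisode();
            }

            lastDistance = distance;
        }
    }

Toggling imitation learning off and on between sessions just means removing and re-adding the behavioral_cloning and gail sections of the trainer .yaml.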

Findings

After extensive tuning and tweaking, the process was finally finished; however, the graphs here show a very interesting observation. Although the chart is entirely unreadable, the lines above the zero-reward axis are actually all from the first tests. They show such high reward because my early system was based on rewarding the AI rather than punishing it. The final and best model is the bright green line below the zero-reward axis.

There are two reasons for this. First, I switched from extensively rewarding the AI to heavily punishing it (mostly because it kept ramming the wall). Second, the AI constantly tries to find the behavior that minimizes its penalties and maximizes its rewards; because the rewards were so small, it had to focus on minimizing penalties (by not colliding with walls and not moving backwards), and in doing so it naturally started passing through checkpoints and reaching the end of the track.
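
As a hypothetical illustration of that balance: if each correct checkpoint paid +0.1 while a wall collision cost -1.0, a run that cleared 20 checkpoints but clipped two walls would still only net 0.0. Under a punishment-heavy scheme, even a run that finishes the track can end with a total reward below zero.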

Figure 5: Final Track Reward Metric

Figure 6: Final Track Episode Length


Improvements

To talk about improvements: the worst mistake I made was underestimating the amount of time I would need to dedicate to this, as training an AI takes a LONG time. Second, there were things I could have done to make testing faster, which I did do in the end: placing all of the models in one environment to save on performance. I noticed that every time I started training, Unity would constantly lag and take more time than necessary. I would also have liked to build a tool for creating tracks, where the checkpoints and walls come with the track; that would make testing new environments easy and would make a potential car game possible.
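
Putting everything into one environment follows the usual ML-Agents pattern of duplicating a self-contained training area so several agents gather experience in a single Unity instance; a minimal sketch, with an illustrative prefab, count, and spacing:

    using UnityEngine;

    // Sketch: spawn several copies of a training area (track + car +
    // checkpoints) so multiple agents train in parallel in one instance.
    public class TrainingAreaSpawner : MonoBehaviour
    {
        public GameObject trainingAreaPrefab;
        public int copies = 8;
        public float spacing = 100f;  // keep the areas from overlapping

        private void Awake()
        {
            for (int i = 0; i < copies; i++)
            {
                Instantiate(trainingAreaPrefab,
                            new Vector3(i * spacing, 0f, 0f),
                            Quaternion.identity);
            }
        }
    }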


Justification

Why make all of this in the first place? Firstly, I was very interested in what machine learning can do within Unity, and there are many fun use cases. Secondly, many game devs today (as well as real-world technology such as actual self-driving cars) use machine learning to improve their games. One of the more popular examples is the Nemesis system (The Gamer - The Nemesis System), which uses machine learning to monitor the player and adjust the world and enemies to the player. The enemies remember what you did, and that creates an entirely different experience for each player. Unfortunately that system was patented, but there are more examples, such as Hello Neighbor using machine learning to predict your movements and catch you off guard (IGN - Hello Neighbor).


Wrap-up

Throughout this project, I followed most of Code Monkey's tutorials and also tweaked and implemented different reward/penalty systems of my own. I trained a car to navigate left and right turns, as well as a random track.

After several attempts and reward-system tweaks (including imitation learning with 5 demos), the AI was finally able to finish the final track. I continuously adjusted the observations and introduced different kinds of penalties. The process is documented with graphs of the training progression, and all iterations are tracked via GitHub commits.


You can also find this project on my GitHub:

Github - MLAdvancedTools
