Leon Overweel

Reinforcement Learning Soccer Bots

This project was the coursework for the Reinforcement Learning (RL) course I took as part of my MSc Artificial Intelligence at the University of Edinburgh. I implemented eight different reinforcement learning algorithms, working in a mix of several simple stochastic environments and the (deprecated) LARG/HFO half-field offense soccer environment.

For the following algorithms, I followed pseudocode from Sutton and Barto (2018):

  • Dynamic Programming: Value iteration (page 83).
  • Monte Carlo: On-policy first-visit MC control for \(\epsilon\)-soft policies (page 101).
  • SARSA: On-policy temporal-difference control (page 130).
  • Q-Learning: Off-policy temporal-difference control (page 131).
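Value iteration and the two TD methods all bootstrap from a one-step lookahead. Below is a minimal sketch in Python/NumPy (not my coursework code, which I can't share), assuming a small MDP given as a transition array `P[s, a, s']` and an expected-reward array `R[s, a]`; the function names are hypothetical. This value iteration is the synchronous variant that recomputes every state from the previous sweep's values, whereas the book's pseudocode updates `V(s)` in place during each sweep; both converge to the optimal values.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, theta=1e-8):
    """Synchronous value iteration on a tabular MDP.

    P[s, a, s2]: transition probabilities; R[s, a]: expected immediate reward.
    Returns the optimal state values V and a greedy policy.
    """
    V = np.zeros(P.shape[0])
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_s2 P[s, a, s2] * V[s2]
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        delta = np.max(np.abs(V_new - V))
        V = V_new
        if delta < theta:  # stop once a sweep barely changes the values
            return V, Q.argmax(axis=1)

def sarsa_update(Q, s, a, r, s2, a2, done, alpha=0.1, gamma=0.9):
    # On-policy: bootstrap from the action a2 the behaviour policy actually takes next.
    target = r if done else r + gamma * Q[s2, a2]
    Q[s, a] += alpha * (target - Q[s, a])

def q_learning_update(Q, s, a, r, s2, done, alpha=0.1, gamma=0.9):
    # Off-policy: bootstrap from the greedy action in s2, whatever the agent does next.
    target = r if done else r + gamma * Q[s2].max()
    Q[s, a] += alpha * (target - Q[s, a])

# Tiny example MDP: in state 0, action 0 stays put (reward 0) and action 1
# moves to the absorbing state 1 (reward 1).
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, :, 1] = 1.0
R = np.array([[0.0, 1.0], [0.0, 0.0]])
V, policy = value_iteration(P, R)  # V -> [1., 0.], policy -> [1, 0]
```

The only difference between the two TD updates is the bootstrap term: SARSA uses the next action the policy actually chose, while Q-learning uses the greedy maximum, which is what makes it off-policy.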

The following algorithm was quite difficult, because it involved training a single network with multiple parallel agents, each playing in its own copy of the HFO environment:

  • Deep Q-Learning: Asynchronous 1-step Q-learning with function approximation, from Mnih et al. (2016); this also involved implementing Hogwild, the lock-free parallelized stochastic gradient descent scheme from Recht et al. (2011).
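The update pattern of asynchronous 1-step Q-learning with Hogwild-style writes can be sketched as follows. This is a toy under loudly stated assumptions, not my coursework code: a hypothetical 5-state chain stands in for HFO, a linear approximator over one-hot features stands in for the network, and Python threads stand in for the parallel agents. As in Mnih et al. (2016), each thread bootstraps from periodically synced target parameters and accumulates updates for a few steps before applying them to the shared parameters; as in Recht et al. (2011), that write takes no lock. Because of CPython's global interpreter lock the threads interleave rather than run truly in parallel, so this illustrates the algorithm's structure, not its speedup.

```python
import threading
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9  # hypothetical toy chain; action 0 = left, 1 = right

def step(s, a):
    """Toy chain standing in for one HFO copy: reward 1 on reaching the right end."""
    s2 = max(s - 1, 0) if a == 0 else s + 1
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

def features(s):
    # One-hot features: Q(s, a) = theta[a] @ phi(s) is linear (here effectively tabular).
    phi = np.zeros(N_STATES)
    phi[s] = 1.0
    return phi

def worker(theta, theta_target, shared, seed, t_max=20_000, target_every=100,
           async_every=5, alpha=0.1, epsilon=0.2):
    rng = np.random.default_rng(seed)
    update = np.zeros_like(theta)  # this thread's accumulated descent direction
    s, t = 0, 0
    while shared["T"] < t_max:
        phi = features(s)
        q = theta @ phi
        if rng.random() < epsilon:  # epsilon-greedy with random tie-breaking
            a = int(rng.integers(N_ACTIONS))
        else:
            a = int(rng.choice(np.flatnonzero(q == q.max())))
        s2, r, done = step(s, a)
        # 1-step Q-learning target, bootstrapped from the target parameters theta^-.
        y = r if done else r + GAMMA * np.max(theta_target @ features(s2))
        # Negative gradient of 0.5 * (y - Q(s, a; theta))^2 with respect to theta[a].
        update[a] += (y - q[a]) * phi
        s, t = (0 if done else s2), t + 1
        with shared["lock"]:  # only the global step counter is locked
            shared["T"] += 1
            T = shared["T"]
        if T % target_every == 0:
            theta_target[:] = theta  # periodically sync theta^- <- theta
        if t % async_every == 0 or done:
            theta += alpha * update  # Hogwild-style: in-place write, no lock
            update[:] = 0.0

def train(n_workers=4):
    theta = np.zeros((N_ACTIONS, N_STATES))  # shared parameters
    theta_target = theta.copy()              # shared target parameters
    shared = {"T": 0, "lock": threading.Lock()}
    threads = [threading.Thread(target=worker, args=(theta, theta_target, shared, seed))
               for seed in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return theta
```

Calling `train()` returns the shared `theta`, and `theta.argmax(axis=0)` gives the greedy action in each state. Races on `theta` between threads are tolerated by design, which is the premise of Hogwild; only the step counter that drives target syncing is protected.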

Finally, the following algorithms were implemented in a two-agent setting:

I built my implementations on top of the provided base code, but I can't share my own code. Most of my implementations reached the highest performance threshold defined by the coursework markers.