This project was the coursework for the Reinforcement Learning (RL) course I took as part of my MSc Artificial Intelligence at the University of Edinburgh. Across several simple stochastic environments and the (now-deprecated) LARG/HFO half-field offense soccer environment, I implemented eight reinforcement learning algorithms.
For the following algorithms, I followed pseudocode from Sutton and Barto (2018):
- Dynamic Programming: Value iteration (page 83).
- Monte Carlo: On-policy first-visit MC control for \(\epsilon\)-soft policies (page 101).
- SARSA: On-policy temporal-difference control (page 130).
- Q-Learning: Off-policy temporal-difference control (page 131).
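Since my own code isn't public, here is a minimal sketch in the spirit of the value iteration pseudocode from Sutton and Barto (page 83). The two-state MDP below is a made-up toy example, not one of the coursework environments:

```python
import numpy as np

# Toy deterministic MDP (illustrative only): P[s][a] is a list of
# (probability, next_state, reward) transitions.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
}
gamma = 0.9    # discount factor
theta = 1e-8   # convergence threshold on the value change per sweep

V = np.zeros(len(P))
while True:
    delta = 0.0
    for s in P:
        v = V[s]
        # Bellman optimality backup: max over actions of expected return.
        V[s] = max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            for a in P[s]
        )
        delta = max(delta, abs(v - V[s]))
    if delta < theta:
        break
```

Here the optimal policy loops on state 1 for a reward of 1 per step, so the values converge to \(V(1) = 1/(1-\gamma) = 10\) and \(V(0) = \gamma \cdot 10 = 9\).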
The following algorithm was the most challenging, because it involved training a single network using multiple parallel agents, each playing in its own copy of the HFO environment:
- Deep Q-Learning: Asynchronous 1-step Q-learning with function approximation, from Mnih et al. (2016); this also involved implementing Hogwild parallelized stochastic gradient descent from Recht et al. (2011).
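To give a flavour of the Hogwild idea, here is a small illustrative sketch: several worker threads run 1-step Q-learning and write updates into a shared (here tabular) parameter array with no locking, tolerating the occasional race, as in Recht et al. (2011). The toy two-state chain environment is my own invention; the real coursework version used a neural network and one HFO environment copy per worker (and note that CPython's GIL means these threads only illustrate the scheme rather than achieving true parallelism):

```python
import threading
import numpy as np

n_states, n_actions = 2, 2
W = np.zeros((n_states, n_actions))  # SHARED parameters, updated lock-free
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(s, a):
    # Toy deterministic dynamics: action 1 moves to state 1 and pays +1.
    return (1, 1.0) if a == 1 else (0, 0.0)

def worker(n_updates, seed):
    rng = np.random.default_rng(seed)
    s = 0
    for _ in range(n_updates):
        # Epsilon-greedy action selection.
        a = rng.integers(n_actions) if rng.random() < eps else int(W[s].argmax())
        s2, r = step(s, a)
        # 1-step Q-learning target; the write to W is deliberately unsynchronized.
        td = r + gamma * W[s2].max() - W[s, a]
        W[s, a] += alpha * td
        s = s2

threads = [threading.Thread(target=worker, args=(10000, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Despite the races, the shared estimates still converge close to the true action values (here \(Q^*(s, 1) = 10\) for the self-loop on state 1), which is the empirical point of the Hogwild scheme: with sparse, overlapping updates, lock-free SGD loses little accuracy.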
Finally, these algorithms operate in a two-agent setting:
- Independent Q-Learning: The same algorithm as Q-learning, but with a joint state representation for the two agents.
- Joint-Action Learning: Based on table 4 from Bowling and Veloso (2001a).
- WoLF-PHC: Based on tables 1 and 2 from Bowling and Veloso (2001b).
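As a hedged illustration of the last of these, here is a minimal single-state sketch of WoLF-PHC ("Win or Learn Fast" policy hill-climbing) in self-play on matching pennies. The game, hyperparameters, and class name are my own choices for the example, not taken from the coursework:

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 2
alpha = 0.1                     # Q-learning step size
delta_w, delta_l = 0.01, 0.04   # policy steps: small when winning, large when losing

class WoLFPHCAgent:
    def __init__(self):
        self.Q = np.zeros(n_actions)
        self.pi = np.full(n_actions, 1.0 / n_actions)      # current policy
        self.avg_pi = np.full(n_actions, 1.0 / n_actions)  # average policy
        self.count = 0

    def act(self):
        return rng.choice(n_actions, p=self.pi)

    def update(self, a, r):
        # Q-learning backup; a repeated single-state game, so no bootstrap term.
        self.Q[a] += alpha * (r - self.Q[a])
        # Incrementally track the average policy.
        self.count += 1
        self.avg_pi += (self.pi - self.avg_pi) / self.count
        # WoLF: "winning" if the current policy outperforms the average policy;
        # learn slowly when winning, fast when losing.
        winning = self.pi @ self.Q > self.avg_pi @ self.Q
        delta = delta_w if winning else delta_l
        # Hill-climb toward the greedy action, then re-project onto the simplex.
        greedy = int(self.Q.argmax())
        for b in range(n_actions):
            move = delta if b == greedy else -delta / (n_actions - 1)
            self.pi[b] = np.clip(self.pi[b] + move, 0.0, 1.0)
        self.pi /= self.pi.sum()

# Matching pennies: player 1 gets +1 if the actions match, -1 otherwise;
# the game is zero-sum, so player 2 receives the negated reward.
p1, p2 = WoLFPHCAgent(), WoLFPHCAgent()
for _ in range(10000):
    a1, a2 = p1.act(), p2.act()
    r1 = 1.0 if a1 == a2 else -1.0
    p1.update(a1, r1)
    p2.update(a2, -r1)
```

The variable learning rate is the key design point: taking bigger policy steps when losing lets the agent escape being exploited, which is what gives WoLF-PHC its convergence behaviour in games like this whose only Nash equilibrium is mixed.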
I built my implementations on top of this provided base code, but I can’t share my own code. Most of my implementations reached the highest performance threshold defined by the coursework markers.