Q-learning visualization

Speed:

Choose reward cell