Papers, articles, and other material that we have learned from and used in development
“Policy Gradient” Article by Adrien Lucas Ecoffet on the math behind the loss and update rule for policy gradient.
“Policy Gradient” Article by Sanyam Kapoor, another helpful explanation of the math of policy gradient algorithms.
“Policy Gradient and Actor Critic” A fantastic lesson by Hado van Hasselt of DeepMind on all things Actor Critic and RL.
“PyTorch Actor Critic” By GitHub user hermesdt: code we followed to create our own variants of Actor Critic methods with OpenAI Gym.
“Trust Region Policy Optimization” By Jonathan Hui: an intuitive explanation of Trust Region Policy Optimization.
“A Thousand Brains: A New Theory of Intelligence” By Jeff Hawkins of Numenta, explaining his theory of cognition and cortical columns.
“Models of the Mind” By Grace Lindsay, a tour of the mathematical models most useful to computational neuroscience when describing the brain.
More research and resources forthcoming.