Reinforcement Learning with Policy Gradients: A TensorFlow Implementation of “Pong from Pixels”

Andrej Karpathy wrote a great post last year on how to train a neural network to play the Atari game Pong by using the Policy Gradients reinforcement learning (RL) algorithm. Given the game’s state as input, the neural network outputs a probability with which we should move the Pong paddle up or down.

I converted Karpathy’s NumPy-only approach to TensorFlow inside a Jupyter notebook. I also created a class to represent the agent playing the game–I stuck all of the code to run the Pong simulation inside that class. Here’s the Github gist, which is best viewed by clicking the link below the embedding ūüôā

Here’s a short GIF of some gameplay. The neural-network agent is on the right, and the built-in AI is on the left.


After ~3,000 parameter updates, the Pong-playing neural network can beat the built-in AI more often than not. What’s interesting to me is that this network looks simpler than one that you’d use for MNIST, and it doesn’t require data with labels to learn!



Tackling the Monty Hall Problem with a Monte Carlo Simulation in Python

You probably know what the Monty Hall problem is, so I will skip its description and direct those unfamiliar with the problem to the Monty Hall problem’s Wikipedia page. This post is about a way to use a Monte Carlo simulation to convince yourself or another that you should indeed switch doors when playing the game. First, it should be noted that a simple application of Bayes Theorem can illustrate why you should switch doors. Also, you can simply talk through how the game plays out under the two main strategies (always switching vs. never switching) to discover which strategy is superior–for example, consider that when you always switch, and the car is behind door 3, you will win when you select doors 1 and 2 but will not win when you select door 3 (win rate of 2/3).

In any case, a third way to discover the optimal game-playing strategy is a Monte Carlo simulation. Such simulations are useful in a variety of contexts (from nuclear physics to economics) and can be simple enough to program up in under an hour. For the Monty Hall problem, a simulation can do the following and be successful:

  1. Ask the user to choose a switching strategy.
  2. For a large number (N) of iterations, artificially play the game using the user-selected strategy, and random numbers to determine the game’s outcome:
    1. Place the car behind a random door, call this door A.
    2. Make the game-show contestant choose a random door, call this door B (it’s possible that B=A).
    3. Make the game-show host reveal a random door that is not A and is not B, call this door C. In other words, we keep assigning random doors to C until it’s true that C!=A, and C!=B.
    4. If the strategy is to always switch, then make the game-show contestant choose a new door D such that D!=B and D!=C.
    5. If the strategy is to never switch, set D=B.
    6. If D==A, add 1 to the number of successes.
  3. Calculate the strategy’s win rate as the number of successes divided by N.

I wrote a program that carries out such a simulation and reveals which strategy is optimal for the Monty Hall problem. You can find that code here: Note that the Monte Carlo method has 1/sqrt(N) convergence. This means that you can increase N from 10,000 to 1,000,000, a factor of 100, and the error will be reduced to 10% of its prior value. This is a relatively slow convergence rate as far as numerical methods go. So you can increase N if you want a more precise result, but remember that the program takes time to run, and this time is at least linearly related to N.

If you needed this post to convince you to always switch, you are not alone. According to the Monty Hall problem Wikipedia page linked above, the famously prolific mathematician Paul ErdŇĎs didn’t believe the solution to the Monty Hall problem was “always switch” until he saw a simulation like the one I just described.

Artificial Neural Network in Python

My research group has been discussing Artificial Neuron-Glia Networks lately. These algorithms add artificial astrocytes to the traditional Artificial Neural Network scheme, and they may also feature a Genetic Algorithm in lieu of back-propagation. See for an example.

To better understand the implementation of a neural net, I constructed one that is capable of giving an approximation to sin(x). I relied on intuition that I developed while reading a blog post that a classmate linked me to. I strongly recommend the post if you’re interested in ANNs:

My neural net is available for you to view and modify via GitHub:

Reconstruction and Simulation of Neocortical Microcircuitry

Authors: Markram, Muller, Ramaswamy, Reimann, et alia
Published: 2015

The scientists of the Blue Brain Project discuss their work to recreate the circuitry of the somatosensory cortex of a rodent brain in silico (on a computer). Basically, the authors digitally reconstructed a column of neocortical tissue, incorporating multitudinous neuron types and connections, all of which were modeled to behave as they would¬†in vivo. The validity of the digital reconstruction was shown by testing it against experimental observations/data not used to inform the reconstruction. Generally, the simulated neural activity mirrored empirical neural activity. The authors note that this suggests that the reconstruction may be useful as a platform for “in silico neuroscience”. Indeed, the paper details the simulation’s ability to make predictions about in vivo neural activity, and the simulation’s usefulness in probing the microscale cellular/synaptic mechanisms that underlie macroscale neural activity.

Building a Brain
The target of the simulation was the rodent somatosensory cortex, the main sensory receptive area for touch. The approach was to reconstruct a single column (microcircuit) of this cortex, generate slight variations of this microcircuit, and join together multiple microcircuits to make a mesocircuit (from which a digital slice of neocortical tissue could be taken for experimentation/simulation).

Experimental findings and mathematical models to describe neural behavior were accounted for in the reconstruction of the microcircuit:

  1. Patch-clamp recordings revealed 55 unique neuronal morphological classifications, and neuronal firing patterns suggested 11 unique neuronal electrical classifications. Accounting for each classification type, there were 207 unique morpho-electrical types of neurons modeled.
  2. Neurons were connected via placement of synapses at appositions of neuronal arbors.
  3. Behavior of each neuron type was modeled through the neuron-type-appropriate assignment of “Hodgkin-Huxley-type models of 13 known classes of¬†ion channels along the neuronal arbors.”
  4. Cell staining experiments revealed the depth of, the excitatory/inhibitory balance of neurons at, and the frequency of different m-types at each of the six layers of the neocortex. These experiments dictated the assignment of neurons into the digital microcircuit, which is a 2,000-micron-tall structure with approximately 31,000 neurons.
  5. The microcircuit’s activity was simulated with a variant of the NEURON simulator.

Discussion of Results
In the simulation, the level of calcium in the extracellular space (modeled as the probability of neurotransmitter release) determined where along the synchronous-asynchronous [SA] spectrum the neuron firing was. Higher levels of calcium led to slow oscillatory bursting, as opposed to asynchronous irregular activity. New in vivo experiments showed the same pattern.

Near the transition between the synchronous and asynchronous states, a number of in vivo phenomena emerge in the simulation. These phenomena include neuron triplets (trios of neurons with precise temporal relationships in their spiking), and soloist and chorister neuron firing patterns (firing in a manner that is uncorrelated with the group and correlated with the group, respectively).

In general, the simulation was capable of mimicking the results from experiments that were not used in the construction of the microcircuit. For example, in vivo studies show that the response characteristics of neurons depend on the type of neuron and the layer in which it exists. The authors simulated a rodent whisker deflection in silico, and the neuronal responses varied depending on cell type and layer in a manner consistent with experimental findings.

Perhaps most interestingly, the in silico reconstruction possesses the ability to test hypotheses. Since at least the early 2000s, there has been disagreement on the cause of uncorrelated neuronal spiking. One hypothesis is that correlated excitatory activity is cancelled out by anti-correlated inhibitory activity. In the simulation of the microcircuit, this exact process (correlated excitatory conductances cancelled out by anti-correlated inhibitory conductances) was observed. This finding demonstrates that the simulation can provide insight into the causes of higher-order activity.

Looking forward, the reconstruction/simulation of the microcircuitry can be refined in many ways. For example, gap junctions, receptors, glia, etc. can be added to the model. In addition, only one component of the brain was reconstructed, and the Blue Brain Project aims to simulate an entire brain.

Notable References/Resources

The portal linked to above offers researchers the ability to perform analyses and test hypotheses using the microcircuitry detailed in the paper.

The Toxic AB Oligomer and Alzheimer‚Äôs Disease: An Emperor in Need of Clothes

Authors: Iryna Benilova, Eric Karran, & Bart De  Strooper
Published: 2012

This review paper excellently summarizes the¬†experimental literature on Amyloid Beta’s (AB)¬†role in precipitating the neurodegeneration that we associate with Alzheimer’s disease. Understanding AB’s role is complicated by the many AB species¬†that exist, AB’s ability to¬†aggregate/organize itself in different ways, the number of suggested¬†mechanisms for its effects¬†on the brain, and the generation of data from both¬†in vivo and in vitro experiments. The title of the article refers to how some¬†experimental literature¬†pins the damaging effects of AB on a specific “toxic AB oligomer” (or specific set of oligomeric toxic AB species). The review’s authors explicitly, and implicitly through their discussion of the many potential mechanisms via¬†which AB could play a role in the pathology of Alzheimer’s disease, implore their cohort to avoid accepting too quickly the¬†hard-to-falsify, view-limiting hypothesis of the “toxic AB oligomer”.

Florida State University’s Computational Neuroscience group’s research relates to many of the findings presented in this paper. Some highlights were:

  • “Further study of the synaptotoxic effects of AB… has shown disturbances in the functioning of postsynaptic NMDA receptors, affecting calcium influx, a number of downstream cascades and postsynaptic AMPA receptors.” – page 354
  • Figure 4, which summarizes¬†multiple avenues for AB-initiated neurodegenerative cascade, and thus illustrates that there is no consensus.¬†– page 353
  • “Interestingly, almost all AB oligomers show¬†synaptotoxicity as measured by changes in dendritic spine morphology, altered long-term potentiation (LTP) and long-term depression (LTD) in hippocampal slices; effects on neurotransmission in cell culture; or defects in memory and cognition in¬†rodents.” – page 351
  • “… neurons in the vicinity of [amyloid] plaques display dystrophic neurites and synaptic loss, and elevated resting calcium levels are seen in neurons surrounding plaques in mouse models of Alzheimer’s disease.” – page 350