# Natural scenes and sparse coding in visual cortex
**Visual scenes are more than just a bunch of colored pixels: they are composed of numerous elementary components in complex arrangements, thus creating image patterns with a rich structure. How can our brain make sense of such a complex visual input and perform a meaningful computation such as recognizing objects or segregating a scene into foreground and background?**
**One central hypothesis in Computational Neuroscience is that the visual system decomposes visual scenes into a set of elementary components or features such as oriented edges which could form the boundary of an object -- but how does the brain perform such a computation? Moreover, how does the brain 'know' about the elementary components a visual scene is composed of? To investigate these questions, you will simulate a simple neural network that decomposes a visual scene into its elementary features. If time allows, you can also train your network to find the most appropriate features for describing visual scenes.**
In this mini-project, you will focus on the neural network model described in [2]. This model implements a biologically inspired version of what is known as the *sparse coding* hypothesis [1], which conjectures an optimal sensory coding approach where a neural population uses *as few active units as possible* to represent a stimulus.
Sparse coding assumes that an image $\vec{x}$ with $N$ pixels can be written as a linear superposition of $M$ fundamental features $\vec{\varphi}_j$:

$$\vec{x} = \sum_{j=1}^{M} a_j \vec{\varphi}_j \qquad (1)$$

Finding coefficients $a_j$ that reconstruct the image accurately while keeping as many of them as possible equal to zero can be cast as the optimization problem

$$\min_{a_1,\ldots,a_M} E \qquad (2)$$

with the energy

$$E = \frac{1}{2} \left\| \vec{x} - \sum_{j=1}^{M} a_j \vec{\varphi}_j \right\|^2 + \lambda \sum_{j=1}^{M} C(a_j) \qquad (3)$$

where the first term penalizes reconstruction error, the cost function $C$ penalizes active units, and $\lambda$ controls the trade-off between the two. This problem can be solved in several ways. Interestingly, when the features themselves are also optimized on natural images, the resulting $\vec{\varphi}_j$ look like localized, oriented filters of different spatial frequencies. This naturally suggests interpreting the quantities $\vec{\varphi}_j$ as (classical) receptive fields of V1 cells and the coefficients $a_j$ as the firing rates of these neurons.
Now, assume that you already know a complete feature set
$$\{\vec{\varphi}_{j}\}_{j=1,\ldots,M}$$
(this set is often called a 'dictionary' in the literature). What you will implement is a dynamical system that outputs the coefficients needed to represent a visual input. It is described by two simple equations:

$$\tau \dot{u}_m(t) = b_m - u_m(t) - \sum_{n \neq m} G_{m,n}\, a_n(t) \qquad (4)$$

$$a_m(t) = T_{\lambda}\!\left(u_m(t)\right) \qquad (5)$$

where $u_m$ is the internal (membrane) state of neuron $m$, $b_m = \vec{\varphi}_m^{\top} \vec{x}$ is its feedforward input, $G_{m,n} = \vec{\varphi}_m^{\top} \vec{\varphi}_n$ is the recurrent interaction between neurons $m$ and $n$, and $T_{\lambda}$ is a thresholding function that suppresses small activities.
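As a concrete starting point, here is a minimal numerical sketch of such a dynamical system (the locally competitive dynamics of [2]), assuming Euler integration and a soft thresholding function; the function name, parameter values, and threshold choice are illustrative, not prescribed by the assignment:

```python
import numpy as np

def lca(x, Phi, lam=0.1, tau=10.0, dt=1.0, n_steps=500):
    """Sketch of the sparse-coding dynamics (Euler integration).

    x   : image as a flat array of N pixels
    Phi : dictionary, one feature per column (N x M)
    lam : sparsity threshold; tau: time constant (illustrative values)
    """
    M = Phi.shape[1]
    b = Phi.T @ x                       # feedforward input to each neuron
    G = Phi.T @ Phi - np.eye(M)         # recurrent interactions (no self-term)
    u = np.zeros(M)                     # internal states
    for _ in range(n_steps):
        # soft threshold: small states produce zero activity
        a = np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)
        u += (dt / tau) * (b - u - G @ a)
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)
```

With an orthonormal dictionary the recurrent term vanishes and the activities converge to the thresholded feedforward drive, which is a useful sanity check before moving to overcomplete dictionaries.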
Sketch the network with circles representing the neurons and arrows representing the feedforward and recurrent input contributions. Annotate your sketch with the mathematical symbols introduced above.
Prove that the dynamics proposed by the model (equations (4) and (5)) indeed solve the optimization problem defined by expression (2). For this, use the definition of the energy $E$ in (3) and show that $$\dot{a}_m \propto -\frac{\partial E}{\partial a_m}$$
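Before (or after) doing the proof on paper, you can sanity-check the gradient of $E$ numerically against finite differences. The sketch below assumes the absolute-value cost $C(a_j) = |a_j|$ and evaluates the gradient at a point where no coefficient is zero, so the cost is differentiable; dimensions and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, lam = 12, 6, 0.1
Phi = rng.normal(size=(N, M))                 # dictionary, features as columns
x = rng.normal(size=N)                        # random "image"
a = np.array([0.5, -1.2, 0.8, -0.3, 1.1, -0.7])  # all coefficients nonzero

def energy(a):
    """E = 0.5 ||x - Phi a||^2 + lam * sum |a_j|."""
    r = x - Phi @ a
    return 0.5 * r @ r + lam * np.sum(np.abs(a))

# analytic gradient: dE/da_m = -phi_m.x + [Phi^T Phi a]_m + lam sign(a_m)
grad = -Phi.T @ x + Phi.T @ (Phi @ a) + lam * np.sign(a)

# central finite differences, one coefficient at a time
h = 1e-6
num = np.zeros(M)
for m in range(M):
    e = np.zeros(M)
    e[m] = h
    num[m] = (energy(a + e) - energy(a - e)) / (2 * h)
```

If the two gradients agree, you know the analytic expression you derived for $\partial E / \partial a_m$ is correct before plugging it into the dynamics.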
Neurons in the primary visual cortex (V1) are selective for the size, orientation and frequency of a stimulus falling within a restricted portion of visual space known as the receptive field. However, many electrophysiological studies show that stimuli placed *outside* the classical receptive field (CRF) can influence the response of V1 cells, giving rise to nonlinear behaviours. These contextual modulations and nonlinear response properties are commonly referred to as non-classical receptive field (nCRF) effects. Surprisingly, the model you are investigating can reproduce many of these effects -- check how your model behaves in the nCRF experiments mentioned in the paper. For this purpose, choose one (or more) of the nCRF effects described in [2] and reproduce them!
So far you didn't have to worry about how to obtain a dictionary -- you worked with one that was learned from natural images, stored, and given to you.
## 1.
Think of an algorithm that learns both the coefficients and the dictionary from real data.
**Hint:** The model you implemented already does half of the work for you.
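One possible scheme, in the spirit of [1], alternates inference of the coefficients with a small gradient step on the features. The sketch below uses a simple iterative soft-thresholding loop for inference so that it is self-contained (the dynamical system you implemented can play that role instead); all names, step sizes, and iteration counts are illustrative:

```python
import numpy as np

def infer(x, Phi, lam=0.1, n_steps=50, step=0.1):
    """Inference step: iterative soft thresholding for the sparse coefficients."""
    a = np.zeros(Phi.shape[1])
    for _ in range(n_steps):
        a = a + step * Phi.T @ (x - Phi @ a)               # gradient on residual
        a = np.sign(a) * np.maximum(np.abs(a) - lam * step, 0.0)  # sparsify
    return a

def learn_dictionary(patches, M, n_epochs=20, eta=0.05, seed=0):
    """Learning step: Hebbian-like gradient updates on the features."""
    rng = np.random.default_rng(seed)
    N = patches.shape[1]
    Phi = rng.normal(size=(N, M))
    Phi /= np.linalg.norm(Phi, axis=0)                     # unit-norm features
    for _ in range(n_epochs):
        for x in patches:
            a = infer(x, Phi)                              # half the work: your model
            Phi += eta * np.outer(x - Phi @ a, a)          # reduce the residual
            Phi /= np.linalg.norm(Phi, axis=0)             # renormalize features
    return Phi
```

The renormalization keeps the features from growing without bound, which would otherwise let the coefficients shrink to zero while the reconstruction stays the same.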
A Gabor filter is given by

$$g(x, y) = \exp\!\left( -\frac{x^{\prime 2}}{2\sigma_x^2} - \frac{y^{\prime 2}}{2\sigma_y^2} \right) \cos\!\left( 2\pi f x^{\prime} + \psi \right)$$

which is the product of a sinusoidal wave and a Gaussian function. Here $(x_0, y_0)$ is the center of the Gaussian, $\sigma_x, \sigma_y$ are the standard deviations along the two axes, and $x^{\prime}$, $y^{\prime}$ are the centered and rotated coordinates

$$x^{\prime} = (x - x_0) \cos(\theta) + (y - y_0) \sin(\theta)$$

$$y^{\prime} = -(x - x_0) \sin(\theta) + (y - y_0) \cos(\theta)$$

$\theta$ is the orientation of the filter, $f$ its spatial frequency and $\psi$ its phase.
Write a function that, given all the above parameters, outputs a Gabor filter.
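A possible implementation, assuming a square filter of side `size` sampled on a pixel grid; the function name and grid convention are illustrative choices:

```python
import numpy as np

def gabor(size, x0, y0, sigma_x, sigma_y, theta, f, psi):
    """Gabor filter: a sinusoid times a Gaussian, sampled on a size x size grid."""
    y, x = np.mgrid[0:size, 0:size].astype(float)   # pixel coordinates
    xc, yc = x - x0, y - y0                         # center the Gaussian
    xp = xc * np.cos(theta) + yc * np.sin(theta)    # rotated coordinates
    yp = -xc * np.sin(theta) + yc * np.cos(theta)
    env = np.exp(-xp**2 / (2 * sigma_x**2) - yp**2 / (2 * sigma_y**2))
    return env * np.cos(2 * np.pi * f * xp + psi)
```

Plotting the output with `matplotlib.pyplot.imshow` for a few parameter settings is a quick way to check that orientation, frequency and phase behave as expected.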
Set a number $N_{\theta}$ of orientations, a number $N_{f}$ of spatial frequencies and a number $N_{\psi}$ of phase offsets, and create a dictionary of $M = N_{\theta} N_{f} N_{\psi}$ Gabor features with different orientations, spatial frequencies and phases.
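One way to assemble such a collection is to loop over the parameter grid and stack each flattened filter as a column of a matrix. The sketch below reproduces the Gabor function so it is self-contained; the specific frequencies, sizes, and widths are illustrative:

```python
import numpy as np

def gabor(size, x0, y0, sigma_x, sigma_y, theta, f, psi):
    """Gabor filter (sinusoid times Gaussian), as in the previous task."""
    y, x = np.mgrid[0:size, 0:size].astype(float)
    xc, yc = x - x0, y - y0
    xp = xc * np.cos(theta) + yc * np.sin(theta)
    yp = -xc * np.sin(theta) + yc * np.cos(theta)
    env = np.exp(-xp**2 / (2 * sigma_x**2) - yp**2 / (2 * sigma_y**2))
    return env * np.cos(2 * np.pi * f * xp + psi)

def gabor_dictionary(size=16, n_theta=4, n_f=2, n_psi=2):
    """Dictionary of n_theta * n_f * n_psi Gabor features, one per column."""
    thetas = np.arange(n_theta) * np.pi / n_theta          # evenly spaced
    freqs = [0.08 * 2**k for k in range(n_f)]              # one octave apart
    psis = np.arange(n_psi) * np.pi / n_psi
    feats = [gabor(size, size / 2, size / 2, 3.0, 3.0, t, f, p).ravel()
             for t in thetas for f in freqs for p in psis]
    Phi = np.array(feats).T                                # N pixels x M features
    return Phi / np.linalg.norm(Phi, axis=0)               # unit-norm columns
```

Normalizing the columns keeps all features on the same footing, so the coefficients of your dynamical system are directly comparable across orientations and frequencies.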
Repeat all the tasks with this new dictionary.
## Literature:
[1] Olshausen, Bruno A., and David J. Field (1996). **Emergence of simple-cell receptive field properties by learning a sparse code for natural images.** *Nature* 381(6583): 607–609.

[2] Zhu, Mengchen, and Christopher J. Rozell (2013). **Visual nonclassical receptive field effects emerge from sparse coding in a dynamical system.** *PLoS Comput Biol* 9(8): e1003191.