Artificial Neural Nets, a gentle introduction

Photo by Denny Müller / Unsplash

This article explains the basics of some Artificial Neural Network models and gives an overview of the recent history of Artificial Neural Networks, from MLPs to GANs. Feel free to join the discussion!

Neural Nets?

Artificial neural networks go back to the 1940s and 1950s, and the backpropagation algorithm that made multilayer networks practical to train was popularized in the mid-1980s. “Artificial” in Artificial Neural Networks refers to the fact that these models are not the real mechanism of the brain, but just a man-made model. These types of models try to resemble some of the mathematical properties of brains, since brains are capable of doing a lot of things, from learning, speaking and computing to some very specific tasks. In the 1990s, neural network models were not very popular. This was mainly due to the lack of computing power. Around 2010, neural nets became hyped again, and a lot of research has been done in the field since. One of the current drawbacks is that neural networks really need a lot of training data. But besides that, they perform extremely well on many tasks.

I will now give an overview of some of the neural network models that followed the MLP. MLP stands for Multilayer Perceptron; it became popular in the mid-1980s and has one or more layers of “hidden” nodes. These nodes are not observed by the users of the neural network. The users give input to the network, and the network gives them a prediction from the output nodes in return. This model has some shortcomings, and an overview of extensions of the network is given in the following section.
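To make the input, hidden and output nodes a bit more concrete, here is a minimal sketch of the forward pass of an MLP in plain Python. The layer sizes, the tanh activation, the random weights and the helper names (`dense`, `mlp_forward`, `random_layer`) are my own assumptions for illustration; a real network would also need a training procedure, which is left out here.

```python
import math
import random

def dense(inputs, weights, biases):
    """One fully connected layer: `weights` has one row per output node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def mlp_forward(inputs, layers):
    """Run the inputs through all layers: tanh on hidden layers, linear output."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        is_hidden = index < len(layers) - 1
        if is_hidden:
            activations = [math.tanh(a) for a in activations]
    return activations

def random_layer(n_in, n_out, rng):
    """Untrained layer with random weights and zero biases (illustrative only)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

rng = random.Random(0)
# 3 input nodes -> 4 hidden nodes -> 2 output nodes (sizes chosen arbitrarily)
network = [random_layer(3, 4, rng), random_layer(4, 2, rng)]
prediction = mlp_forward([0.5, -1.0, 2.0], network)
print(prediction)  # two numbers, one per output node
```

The user of the network only ever sees the three input values and the two output values; the four hidden activations stay inside `mlp_forward`.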

More models please!

So, it all starts with an MLP. Over the past years, many new models have been proposed, because the good old MLP has many shortcomings. For example, how would you make a model that needs to take memory into consideration?

1997 – Memory-based models (RNN)

A group of models called Recurrent Neural Networks (RNNs) partially solves this problem. These are chained “ordinary” neural networks, in which old states are propagated from one neural network to the next. One limitation is that, for training, the chain is typically unrolled for a fixed number of timesteps, so you have to specify how many timesteps you would like to model. Another issue that arises is the so-called vanishing gradient problem: the influence of a state at a specific timestep fades the further it is propagated through the chain of neural networks. This is largely addressed by a model called LSTM (Long Short-Term Memory), which is explained in more detail in [3].
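The vanishing gradient problem can be seen in a toy example. The sketch below assumes the simplest possible recurrent network, with a single scalar state updated as h_t = tanh(w_rec * h_{t-1} + w_in * x_t). The weights `w_rec` and `w_in` and the constant input are made-up values, and the function names are hypothetical. It measures how strongly the last state still depends on the first one.

```python
import math

def rnn_states(inputs, w_rec, w_in, h0=0.0):
    """Scalar recurrent net: h_t = tanh(w_rec * h_{t-1} + w_in * x_t)."""
    states = [h0]
    for x in inputs:
        states.append(math.tanh(w_rec * states[-1] + w_in * x))
    return states

def grad_last_wrt_first(states, w_rec):
    """d h_T / d h_0 via the chain rule: product of w_rec * (1 - h_t^2)."""
    grad = 1.0
    for h in states[1:]:
        grad *= w_rec * (1.0 - h * h)
    return grad

w_rec, w_in = 0.9, 0.5  # illustrative values, not tuned
for steps in (5, 20, 50):
    states = rnn_states([1.0] * steps, w_rec, w_in)
    print(steps, grad_last_wrt_first(states, w_rec))
```

Every timestep multiplies the gradient by a factor smaller than one in this setup, so the printed value shrinks rapidly as the chain gets longer. That is the effect LSTMs were designed to counter.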

1997 – Convolutional Neural Networks (CNN)

Convolutional Neural Networks (CNNs) are an extension of Neural Networks and try to capture some regularity of the input data. In image data, neighboring pixels share a lot of information. For example, if some pixel is green, it is very likely that the neighboring pixels are also green. CNNs exploit this by applying the same small set of weights at every position of the image, which makes them work extremely well on image data. An early application to face recognition is described in [2].
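The core operation of a CNN can be sketched in a few lines. The example below is a plain-Python version of a single 2D convolution (strictly speaking a cross-correlation, without padding and with stride 1). The toy image and the edge-detecting kernel are invented for illustration, and in a real CNN the kernel values would be learned rather than written by hand.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation, as used in CNN layers (no padding, stride 1)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0.0
            # The same kernel weights are reused at every position (r, c).
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Toy 4x4 "image": left half dark (0), right half bright (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 kernel that responds to a dark-to-bright vertical edge.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, edge_kernel)
print(feature_map)  # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The output is only non-zero in the middle column, where the window straddles the edge. Note that the whole layer has just four weights, no matter how large the image is.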

2010 – Deep learning (DNN)

A trending buzzword is “Deep learning”. Personally, I hate this word. It does not add anything new to the world of Neural Networks. Big companies like Google and Facebook have enough computing power and data to optimize these types of models, but for normal people it is often not feasible to train such networks. Deep learning refers to training networks with many layers, in some cases more than 100. In order to do so, you need a large amount of data. Some people refer to this as “infinite resource”. If you have only 5 training examples, it is already hard to train a single layer, let alone a few hundred layers! So please, do me a favor and ban this buzzword. It is only a show-off for the biggest companies.
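To get a feeling for why depth is so data-hungry, you can simply count the numbers a network has to learn. The sketch below counts the weights and biases of a fully connected network. The layer widths are arbitrary choices of mine and do not correspond to any particular published model.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

shallow = [100, 50, 10]            # one hidden layer
deep = [100] + [50] * 100 + [10]   # one hundred hidden layers

print(count_parameters(shallow))  # 5560
print(count_parameters(deep))     # 258010
```

Even with these narrow layers, the deep variant has roughly a quarter of a million parameters, which puts the 5 training examples from above into perspective.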

2014 – Generative Adversarial Networks (GAN)

Generative Adversarial Networks were invented back in 2014 [1]. Until then, most models were discriminative models. That is, they learn to discriminate between several classes. For example, these models can learn to separate apples and pears in images. A GAN pairs two neural networks: a generator, which can loosely be seen as a mirrored classifier that produces data instead of labels, and a discriminator, which judges whether a sample is real or generated. The output of one model is used as the input of the other. This is illustrated in the next image:


In the image, the left neural network has 3 inputs and 2 outputs, and the mirrored neural network has 2 inputs and 3 outputs. The network on the left is, for example, a network which takes an image as input and predicts whether the image has a face or not. The network on the right-hand side works in the opposite direction: from a small input it produces a complete image! Strictly speaking, in the GAN of [1] the generating network starts from random noise rather than from a face/no-face label, and it is trained to produce images that the other network cannot tell apart from real ones, not to reconstruct one specific input. Isn’t that cool?
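The game between the two networks is captured by the value function from [1]: V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], which the discriminator D tries to maximize and the generator G tries to minimize. The sketch below only evaluates this value for given discriminator outputs. The probabilities are made up for illustration and no actual networks are trained.

```python
import math

def gan_value(d_real, d_fake):
    """V(D, G) = mean(log D(x)) + mean(log(1 - D(G(z)))), as in [1].

    d_real: discriminator outputs (probability "real") on real samples.
    d_fake: discriminator outputs on generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Made-up discriminator outputs, for illustration only.
confident = gan_value(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])

print(confident)  # closer to 0: the discriminator separates real from fake
print(fooled)     # equals -log(4): the discriminator can only guess
```

When the generator gets so good that the discriminator outputs 0.5 for everything, the value drops to -log(4), which is the equilibrium described in [1].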


I think that Deep Learning has potential, but also major drawbacks. You need a lot of data in order to train these models successfully. Therefore, I think it is better to look at zero-shot models (which assume a minimal amount of information). In that way, these models could be scaled up more easily to Deep Neural Networks and would probably perform better than the state-of-the-art approaches. What do you think? What are the major drawbacks of Deep Learning? What can be improved further?


[1] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672-2680).

[2] Lawrence, S., Giles, C. L., Tsoi, A. C., & Back, A. D. (1997). Face recognition: A convolutional neural-network approach. IEEE transactions on neural networks, 8(1), 98-113.

[3] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.