Implementing Elman RNN in Tensorflow: a Helpful Tutorial

Implementing Elman RNN in Tensorflow: a Helpful Tutorial

In this Python Deep Learning tutorial, an implementation and explanation is given for an Elman RNN. The implementation is done in Tensorflow, which is one of the many Python Deep Learning libraries.

A more modern RNN is the GRU. A GRU has less parameters to train and is therefore quite fast. An implementation in Tensorflow of the GRU can be found here.

What is an Elman RNN?

Suppose you have many documents and you would like to find all names in the documents. Then one option is to use an Elman RNN! This RNN is originally invented by Jeffrey Elman [1]. The Elman RNN reads word (and context) by word (and context) and tries to predict the label (either O if the word is not a name, B-NAME if the word is the beginning of a name and I-NAME if the word is the continuation of a name). One example of such a labeling is the following:

Barack Obama is a former president of the USA.

The labelling (or annotation) would then be as follows:

  • Barack [B-NAME]
  • Obama [I-NAME]
  • is [O]
  • a [O]
  • former[O]
  • president [O]
  • of [O]
  • the [O]
  • USA [O]
  • . [O]

In the next section, we will look at the architecture of the network.

The Architecture

The Elman Recurrent Neural Network is a neural network with a variable number of recursions. In this article, we will implement one including a Word Embedding layer which converts words to a semantic representation. Then, these embeddings are fed to the recursion layer which takes the previous hidden layer (


) and current token (


) as input and produces probabilities for each of the classes (






) as output. The internal hidden layer and the next token are then used as input for the recursion.

A few words on the implementation


I found it extremely useful to use Tensorboard! With Tensorboard, you can visualize the variables and by that I discovered that I obtained NaN values fairly quickly. This was due to an exploding gradient. I limited the learning rate and things went well. The loss decreased and in the end I obtained satisfying results.

Data loaders

I always create classes that load the data. In that way, I have a data abstraction and that is useful because almost every project has different data. I can then reuse my models and just plugin another data loader for different projects. Furthermore, make sure that your data formats are fast to read! I now use cPickle for storing and loading my data and this saved me a lot of time since for every run I need to load many datafiles. If you have a different approach, please let me know in the comment section below!

The implementation

So, now I told you about the model, I will give you the implementation in Python and using Tensorflow. If you have any questions, please ask me and send a comment in the comment section below! I will first show you the graph obtained from the Tensorboard:

A Graph.

I grouped relevant layers into scopes. What you can clearly see is that the input layer sends its data to the recurrence layer. This is correct, since the inputs are converted into embeddings and the embeddings are used for the recurrence layer. Then, the outputs of the recurrence layers are used to calculate the output. For backpropagation, the gradients are computed and are fed back into the network to update the weights and biases.

This is the resulting code for my Elman RNN:

import tensorflow as tf
import numpy as np

class ElmanRNN:
    def __init__(self, num_vocab, num_embedding, num_context, num_hidden, num_classes):
        self.num_vocab = num_vocab
        self.num_embedding = num_embedding
        self.num_context = num_context
        self.num_hidden = num_hidden
        self.num_classes = num_classes

        self.params = {}

        with tf.name_scope('input'):
            with tf.name_scope('input_data'):
       = tf.placeholder(tf.int32, name='data')

            with tf.name_scope('embedding'):
                self.idxs = tf.reshape(, (-1,)) + 1
                self.emb = tf.Variable(name='emb', initial_value=ElmanRNN.initialize(num_vocab + 1, num_embedding))
                self.params['emb'] = self.emb
                self.input_emb = tf.gather(self.emb, self.idxs)
                self.input_emb_fix = tf.where(tf.is_nan(self.input_emb), tf.zeros_like(self.input_emb), self.input_emb)

            with tf.name_scope('x'):
                num_inputs = tf.shape([0]
                self.wx = tf.Variable(name='wx', initial_value=ElmanRNN.initialize(num_embedding * num_context, num_hidden))
                self.params['wx'] = self.wx
                self.x = tf.reshape(self.input_emb_fix, (num_inputs, num_embedding * num_context), name='x')

        with tf.name_scope('recurrence'):
            with tf.name_scope('recurrence_init'):
                self.h = tf.Variable(name='h', initial_value=np.zeros((1, num_hidden)))
                h_0 = tf.reshape(self.h, (1, num_hidden))
                self.params['h'] = self.h
                s_0 = tf.constant(np.matrix([0.] * num_classes))
                y_0 = tf.constant(0, 'int64')
            with tf.name_scope('hidden'):
                self.wh = tf.Variable(name='wh', initial_value=ElmanRNN.initialize(num_hidden, num_hidden))
       = tf.Variable(name='bh', initial_value=np.zeros((1, num_hidden)))
                self.params['wh'] = self.wh
                self.params['bh'] =
            with tf.name_scope('classes'):
                self.w = tf.Variable(name='w', initial_value=ElmanRNN.initialize(num_hidden, num_classes))
                self.b = tf.Variable(name='b', initial_value=np.zeros((1, num_classes)))
                self.params['w'] = self.w
                self.params['b'] = self.b
            self.h, self.s, self.y = tf.scan(self.recurrence, self.x, initializer=(h_0, s_0, y_0))

        with tf.name_scope('output'):
            with tf.name_scope('target_data'):
       = tf.placeholder(tf.float64, name='target')
            with tf.name_scope('probabilities'):
                self.s = tf.squeeze(self.s)
            with tf.name_scope('outcomes'):
                self.y = tf.squeeze(self.y)
            with tf.name_scope('loss'):
                self.loss = -tf.reduce_sum( * tf.log(tf.clip_by_value(self.s, 1e-20, 1.0)))

    def initialize(*shape):
        return 0.001 * np.random.uniform(-1., 1., shape)

    def recurrence(self, old_state, x_t):
        h_t, s_t, y_t = old_state
        x = tf.reshape(x_t, (1, self.num_embedding * self.num_context))
        input_layer = tf.matmul(x, self.wx)
        hidden_layer = tf.matmul(h_t, self.wh) +
        h_t_next = input_layer + hidden_layer
        s_t_next = tf.nn.softmax(tf.matmul(h_t_next, self.w) + self.b)
        y_t_next = tf.squeeze(tf.argmax(s_t_next, 1))
        return h_t_next, s_t_next, y_t_next

Conclusion (TL;DR)

This Python deep learning tutorial showed you how to implement an Elman RNN in Tensorflow. If you have any unanswered questions, feel free to ask them in the comments!


[1] Elman, Jeffrey L. “Distributed representations, simple recurrent networks, and grammatical structure.” Machine learning 7.2-3 (1991): 195-225.