Skip to main content

Elman RNN implementation in Tensorflow

In this article, an implementation and explanation is given for an Elman RNN.

So… What is an Elman RNN?

Suppose you have many documents and you would like to find all names in the documents. Then one option is to use an Elman RNN! This RNN is originally invented by Jeffrey Elman [1]. The Elman RNN reads word (and context) by word (and context) and tries to predict the label (either O if the word is not a name, B-NAME if the word is the beginning of a name and I-NAME if the word is the continuation of a name). One example of such a labeling is the following:

Hi, I am Data Blogger. How are you?

The labelling (or annotation) would then be as follows:


Hi [O], I [O] am [O] Data [B-NAME] Blogger [I-NAME]. How [O] are[O] you [O]?

In fact, the punctuation marks (.,!?) could also be labelled, but tokenizing words is not the topic of this article, so let’s focus on the network architecture.

The Architecture

The Elman Recurrent Neural Network is a neural network with a variable number of recursions. In this article, we will implement one including a Word Embedding layer which converts words to a semantic representation. Then, these embeddings are fed to the recursion layer which takes the previous hidden layer ($h_{t-1}$) and current token ($x_t$) as input and produces probabilities for each of the classes ($p(O)$, $p(B-NAME)$, $p(I-NAME)$) as output. The internal hidden layer and the next token are then used as input for the recursion.

A few words on the implementation


I found it extremely useful to use Tensorboard! With Tensorboard, you can visualize the variables and by that I discovered that I obtained NaN values fairly quickly. This was due to an exploding gradient. I limited the learning rate and things went well. The loss decreased and in the end I obtained satisfying results.

Data loaders

I always create classes that load the data. In that way, I have a data abstraction and that is useful because almost every project has different data. I can then reuse my models and just plugin another data loader for different projects. Furthermore, make sure that your data formats are fast to read! I now use cPickle for storing and loading my data and this saved me a lot of time since for every run I need to load many datafiles. If you have a different approach, please let me know in the comment section below!

The implementation

So, now I told you about the model, I will give you the implementation in Python and using Tensorflow. If you have any questions, please ask me and send a comment in the comment section below! I will first show you the graph obtained from the Tensorboard:

A Graph.

I grouped relevant layers into scopes. What you can clearly see is that the input layer sends its data to the recurrence layer. This is correct, since the inputs are converted into embeddings and the embeddings are used for the recurrence layer. Then, the outputs of the recurrence layers are used to calculate the output. For backpropagation, the gradients are computed and are fed back into the network to update the weights and biases.

This is the resulting code for my Elman RNN:

import tensorflow as tf
import numpy as np

class ElmanRNN:
    def __init__(self, num_vocab, num_embedding, num_context, num_hidden, num_classes):
        self.num_vocab = num_vocab
        self.num_embedding = num_embedding
        self.num_context = num_context
        self.num_hidden = num_hidden
        self.num_classes = num_classes

        self.params = {}

        with tf.name_scope('input'):
            with tf.name_scope('input_data'):
       = tf.placeholder(tf.int32, name='data')

            with tf.name_scope('embedding'):
                self.idxs = tf.reshape(, (-1,)) + 1
                self.emb = tf.Variable(name='emb', initial_value=ElmanRNN.initialize(num_vocab + 1, num_embedding))
                self.params['emb'] = self.emb
                self.input_emb = tf.gather(self.emb, self.idxs)
                self.input_emb_fix = tf.where(tf.is_nan(self.input_emb), tf.zeros_like(self.input_emb), self.input_emb)

            with tf.name_scope('x'):
                num_inputs = tf.shape([0]
                self.wx = tf.Variable(name='wx', initial_value=ElmanRNN.initialize(num_embedding * num_context, num_hidden))
                self.params['wx'] = self.wx
                self.x = tf.reshape(self.input_emb_fix, (num_inputs, num_embedding * num_context), name='x')

        with tf.name_scope('recurrence'):
            with tf.name_scope('recurrence_init'):
                self.h = tf.Variable(name='h', initial_value=np.zeros((1, num_hidden)))
                h_0 = tf.reshape(self.h, (1, num_hidden))
                self.params['h'] = self.h
                s_0 = tf.constant(np.matrix([0.] * num_classes))
                y_0 = tf.constant(0, 'int64')
            with tf.name_scope('hidden'):
                self.wh = tf.Variable(name='wh', initial_value=ElmanRNN.initialize(num_hidden, num_hidden))
       = tf.Variable(name='bh', initial_value=np.zeros((1, num_hidden)))
                self.params['wh'] = self.wh
                self.params['bh'] =
            with tf.name_scope('classes'):
                self.w = tf.Variable(name='w', initial_value=ElmanRNN.initialize(num_hidden, num_classes))
                self.b = tf.Variable(name='b', initial_value=np.zeros((1, num_classes)))
                self.params['w'] = self.w
                self.params['b'] = self.b
            self.h, self.s, self.y = tf.scan(self.recurrence, self.x, initializer=(h_0, s_0, y_0))

        with tf.name_scope('output'):
            with tf.name_scope('target_data'):
       = tf.placeholder(tf.float64, name='target')
            with tf.name_scope('probabilities'):
                self.s = tf.squeeze(self.s)
            with tf.name_scope('outcomes'):
                self.y = tf.squeeze(self.y)
            with tf.name_scope('loss'):
                self.loss = -tf.reduce_sum( * tf.log(tf.clip_by_value(self.s, 1e-20, 1.0)))

    def initialize(*shape):
        return 0.001 * np.random.uniform(-1., 1., shape)

    def recurrence(self, old_state, x_t):
        h_t, s_t, y_t = old_state
        x = tf.reshape(x_t, (1, self.num_embedding * self.num_context))
        input_layer = tf.matmul(x, self.wx)
        hidden_layer = tf.matmul(h_t, self.wh) +
        h_t_next = input_layer + hidden_layer
        s_t_next = tf.nn.softmax(tf.matmul(h_t_next, self.w) + self.b)
        y_t_next = tf.squeeze(tf.argmax(s_t_next, 1))
        return h_t_next, s_t_next, y_t_next


[1] Elman, Jeffrey L. “Distributed representations, simple recurrent networks, and grammatical structure.” Machine learning 7.2-3 (1991): 195-225.

Kevin Jacobs

Kevin Jacobs

Kevin Jacobs is a certified Data Scientist and blog writer for Data Blogger. He is passionate about any project that involves large amounts of data and statistical data analysis. Kevin can be reached using Twitter (@kmjjacobs), LinkedIn or via e-mail: