# Implementing Elman RNN in Tensorflow: a Helpful Tutorial

#### Written by Kevin Jacobs

I'm Kevin, a Data Scientist, PhD student in NLP and Law and blog writer for Data Blogger.

In this Python deep learning tutorial, I give an implementation and explanation of an Elman RNN. The implementation uses Tensorflow, which is one of the many Python deep learning libraries.

A more modern RNN is the GRU. A GRU has fewer parameters to train than an LSTM and is therefore quite fast. An implementation in Tensorflow of the GRU can be found here.

## What is an Elman RNN?

Suppose you have many documents and you would like to find all names in them. One option is to use an Elman RNN! This RNN was originally introduced by Jeffrey Elman [1]. The Elman RNN reads the text word by word, each word together with its surrounding context, and tries to predict a label for it: O if the word is not a name, B-NAME if the word is the beginning of a name, and I-NAME if the word is the continuation of a name. One example of such a labeling is the following:

Barack Obama is a former president of the USA.

The labeling (or annotation) would then be as follows:

• Barack [B-NAME]
• Obama [I-NAME]
• is [O]
• a [O]
• former [O]
• president [O]
• of [O]
• the [O]
• USA [O]
• . [O]
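
To make the labeling scheme concrete, here is a minimal sketch of how predicted labels could be turned back into names. The helper `extract_names` is a hypothetical name I use for illustration; it is not part of the model.

```python
def extract_names(tokens, labels):
    """Collect the names marked by B-NAME / I-NAME labels."""
    names, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B-NAME":
            # A new name starts; store the previous one, if any.
            if current:
                names.append(" ".join(current))
            current = [token]
        elif label == "I-NAME" and current:
            # Continuation of the name that is currently open.
            current.append(token)
        else:
            # "O" (or a stray I-NAME) closes any open name.
            if current:
                names.append(" ".join(current))
            current = []
    if current:
        names.append(" ".join(current))
    return names


tokens = ["Barack", "Obama", "is", "a", "former", "president", "of", "the", "USA", "."]
labels = ["B-NAME", "I-NAME", "O", "O", "O", "O", "O", "O", "O", "O"]
print(extract_names(tokens, labels))  # ['Barack Obama']
```

The B-NAME label is what lets two names that directly follow each other be kept apart.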

In the next section, we will look at the architecture of the network.

## The Architecture

The Elman Recurrent Neural Network is a neural network whose recurrence runs for a variable number of steps, one per token. In this article, we will implement one including a word embedding layer, which converts words to a semantic representation. These embeddings are fed to the recurrence layer, which takes the previous hidden state ($h_{t-1}$) and the current token ($x_t$) as input and produces probabilities for each of the classes ($p(\text{O})$, $p(\text{B-NAME})$, $p(\text{I-NAME})$) as output. The new hidden state and the next token are then used as input for the next step of the recurrence.
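
The recurrence can be sketched in plain NumPy, independent of Tensorflow. This is only an illustration under my own assumptions: the sizes, the random weights and the `tanh` activation are made up for the example (the Tensorflow implementation further down sums the two terms without an activation), and the inputs `xs` stand in for the embedded tokens.

```python
import numpy as np


def softmax(z):
    # Subtract the maximum for numerical stability.
    e = np.exp(z - z.max())
    return e / e.sum()


def elman_forward(xs, wx, wh, bh, w, b):
    """Run the recurrence over a sequence; return class probabilities per token."""
    h = np.zeros(wh.shape[0])  # h_0
    probs = []
    for x_t in xs:
        # New hidden state from the current input and the previous hidden state.
        h = np.tanh(x_t @ wx + h @ wh + bh)
        # Class probabilities p(O), p(B-NAME), p(I-NAME) for this token.
        probs.append(softmax(h @ w + b))
    return np.array(probs)


rng = np.random.default_rng(0)
num_input, num_hidden, num_classes, num_tokens = 4, 5, 3, 10
xs = rng.normal(size=(num_tokens, num_input))  # stand-in for embedded tokens
probs = elman_forward(
    xs,
    wx=rng.normal(scale=0.1, size=(num_input, num_hidden)),
    wh=rng.normal(scale=0.1, size=(num_hidden, num_hidden)),
    bh=np.zeros(num_hidden),
    w=rng.normal(scale=0.1, size=(num_hidden, num_classes)),
    b=np.zeros(num_classes),
)
print(probs.shape)  # (10, 3)
```

Each row of `probs` sums to one, and the same weights are reused at every step; only the hidden state `h` changes from token to token.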

## A few words on the implementation

### Tensorboard

I found it extremely useful to use Tensorboard! With Tensorboard, you can visualize the variables, and that is how I discovered that I obtained NaN values fairly quickly. This was due to exploding gradients. I lowered the learning rate and things went well: the loss decreased and in the end I obtained satisfying results.
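
Lowering the learning rate is what worked for me. A complementary guard, which I did not use here, is to check the parameters for NaN values and to clip the gradients by their global norm. The sketch below shows the idea in NumPy; the helper names are hypothetical, and Tensorflow ships a comparable operation as `tf.clip_by_global_norm`.

```python
import numpy as np


def has_non_finite(params):
    """Return the names of parameters that contain NaN or infinite values."""
    return [name for name, value in params.items() if not np.all(np.isfinite(value))]


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together so their joint L2 norm is at most max_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads.values()))
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    return {name: g * scale for name, g in grads.items()}, global_norm


# Made-up gradients that are far too large.
grads = {"wx": np.full((2, 2), 30.0), "wh": np.full((2, 2), 40.0)}
clipped, norm = clip_by_global_norm(grads, max_norm=5.0)
print(norm)  # 100.0
print(has_non_finite({"wx": np.array([1.0, np.nan])}))  # ['wx']
```

Clipping keeps the direction of the update and only shrinks its size, so a single huge gradient cannot push the weights to NaN.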

I always create classes that load the data. That gives me a data abstraction, which is useful because almost every project has different data. I can then reuse my models and just plug in another data loader for a different project. Furthermore, make sure that your data formats are fast to read! I now use cPickle (plain `pickle` in Python 3) for storing and loading my data, and this saved me a lot of time since for every run I need to load many data files. If you have a different approach, please let me know in the comment section below!
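
A data loader of this kind can be very small. The following is a sketch rather than my actual loader: the class name `PickleDataLoader`, the file name and the `(tokens, labels)` layout are assumptions made for the example.

```python
import os
import pickle
import tempfile


class PickleDataLoader:
    """Hypothetical loader: reads and writes (tokens, labels) pairs as a pickle file."""

    def __init__(self, path):
        self.path = path

    def save(self, sentences):
        # Use the newest binary pickle protocol available.
        with open(self.path, "wb") as f:
            pickle.dump(sentences, f, protocol=pickle.HIGHEST_PROTOCOL)

    def load(self):
        with open(self.path, "rb") as f:
            return pickle.load(f)


sentences = [(["Barack", "Obama", "is"], ["B-NAME", "I-NAME", "O"])]
path = os.path.join(tempfile.mkdtemp(), "sentences.pkl")
loader = PickleDataLoader(path)
loader.save(sentences)
print(loader.load() == sentences)  # True
```

Because the model only ever calls `load()`, a loader for another project just has to return data in the same layout.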

## The implementation

Now that I have told you about the model, I will give you the implementation in Python using Tensorflow. If you have any questions, please ask them in the comment section below! I will first show you the graph obtained from Tensorboard:

I grouped relevant layers into scopes. What you can clearly see is that the input layer sends its data to the recurrence layer. This is correct, since the inputs are converted into embeddings and the embeddings are used by the recurrence layer. Then, the outputs of the recurrence layer are used to calculate the output. For backpropagation, the gradients are computed and used to update the weights and biases.

This is the resulting code for my Elman RNN:

```python
# Written against the TensorFlow 1.x graph API (tf.placeholder, tf.is_nan, tf.log).
import tensorflow as tf
import numpy as np


class ElmanRNN:
    def __init__(self, num_vocab, num_embedding, num_context, num_hidden, num_classes):
        self.num_vocab = num_vocab
        self.num_embedding = num_embedding
        self.num_context = num_context
        self.num_hidden = num_hidden
        self.num_classes = num_classes

        # All trainable variables, keyed by name.
        self.params = {}

        with tf.name_scope('input'):
            with tf.name_scope('input_data'):
                # Word indices, one row of num_context indices per token.
                self.data = tf.placeholder(tf.int32, name='data')

            with tf.name_scope('embedding'):
                # Shift all indices by one, so the embedding matrix has num_vocab + 1 rows.
                self.idxs = tf.reshape(self.data, (-1,)) + 1
                self.emb = tf.Variable(name='emb', initial_value=ElmanRNN.initialize(num_vocab + 1, num_embedding))
                self.params['emb'] = self.emb
                self.input_emb = tf.gather(self.emb, self.idxs)
                # Replace NaN embeddings by zeros.
                self.input_emb_fix = tf.where(tf.is_nan(self.input_emb), tf.zeros_like(self.input_emb), self.input_emb)

            with tf.name_scope('x'):
                num_inputs = tf.shape(self.data)[0]
                self.wx = tf.Variable(name='wx', initial_value=ElmanRNN.initialize(num_embedding * num_context, num_hidden))
                self.params['wx'] = self.wx
                # One row per token: the concatenated embeddings of its context window.
                self.x = tf.reshape(self.input_emb_fix, (num_inputs, num_embedding * num_context), name='x')

        with tf.name_scope('recurrence'):
            with tf.name_scope('recurrence_init'):
                # Initial hidden state, class scores and prediction for tf.scan.
                self.h = tf.Variable(name='h', initial_value=np.zeros((1, num_hidden)))
                h_0 = tf.reshape(self.h, (1, num_hidden))
                self.params['h'] = self.h
                s_0 = tf.constant(np.matrix([0.] * num_classes))
                y_0 = tf.constant(0, 'int64')
            with tf.name_scope('hidden'):
                self.wh = tf.Variable(name='wh', initial_value=ElmanRNN.initialize(num_hidden, num_hidden))
                self.bh = tf.Variable(name='bh', initial_value=np.zeros((1, num_hidden)))
                self.params['wh'] = self.wh
                self.params['bh'] = self.bh
            with tf.name_scope('classes'):
                self.w = tf.Variable(name='w', initial_value=ElmanRNN.initialize(num_hidden, num_classes))
                self.b = tf.Variable(name='b', initial_value=np.zeros((1, num_classes)))
                self.params['w'] = self.w
                self.params['b'] = self.b
            # Apply the recurrence to every row of x, carrying (h, s, y) along.
            self.h, self.s, self.y = tf.scan(self.recurrence, self.x, initializer=(h_0, s_0, y_0))

        with tf.name_scope('output'):
            with tf.name_scope('target_data'):
                # One-hot encoded target labels.
                self.target = tf.placeholder(tf.float64, name='target')
            with tf.name_scope('probabilities'):
                self.s = tf.squeeze(self.s)
            with tf.name_scope('outcomes'):
                self.y = tf.squeeze(self.y)
            with tf.name_scope('loss'):
                # Cross-entropy; the clipping avoids log(0).
                self.loss = -tf.reduce_sum(self.target * tf.log(tf.clip_by_value(self.s, 1e-20, 1.0)))

    @staticmethod
    def initialize(*shape):
        # Small uniform random values in [-0.001, 0.001).
        return 0.001 * np.random.uniform(-1., 1., shape)

    def recurrence(self, old_state, x_t):
        # h_t is the hidden state of the previous step; s_t and y_t are not reused.
        h_t, s_t, y_t = old_state
        x = tf.reshape(x_t, (1, self.num_embedding * self.num_context))
        input_layer = tf.matmul(x, self.wx)
        hidden_layer = tf.matmul(h_t, self.wh) + self.bh
        # The two terms are summed without an activation function;
        # a classic Elman network would apply a sigmoid or tanh here.
        h_t_next = input_layer + hidden_layer
        s_t_next = tf.nn.softmax(tf.matmul(h_t_next, self.w) + self.b)
        y_t_next = tf.squeeze(tf.argmax(s_t_next, 1))
        return h_t_next, s_t_next, y_t_next
```
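
The model expects `data` to contain one row of `num_context` word indices per token. A minimal sketch of how such context windows could be built is given below. The helper `context_windows` and the padding index `-1` are assumptions on my side: the `+ 1` in the embedding lookup suggests that `-1` marks padding and ends up at row 0 of the embedding matrix.

```python
def context_windows(indices, size, pad=-1):
    """Return one window of `size` word indices per token, centred on that token."""
    assert size % 2 == 1, "window size must be odd"
    half = size // 2
    # Pad both ends so that the first and last token also get a full window.
    padded = [pad] * half + list(indices) + [pad] * half
    return [padded[i:i + size] for i in range(len(indices))]


sentence = [4, 8, 15, 16]  # made-up word indices
for window in context_windows(sentence, size=3):
    print(window)
# [-1, 4, 8]
# [4, 8, 15]
# [8, 15, 16]
# [15, 16, -1]
```

With `size` equal to `num_context`, the resulting list has the shape that the reshape in the `x` scope relies on.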

## Conclusion (TL;DR)

This Python deep learning tutorial showed you how to implement an Elman RNN in Tensorflow. If you have any unanswered questions, feel free to ask them in the comments!

## References

[1] Elman, Jeffrey L. “Distributed representations, simple recurrent networks, and grammatical structure.” Machine learning 7.2-3 (1991): 195-225.