Nowadays, the quality of generated text keeps improving. In this blog post, you will learn how to use a pre-trained language model to generate Tweets. The approach is not limited to Tweets: it applies to any short text-generation task. In fact, the title of this blog post was generated by a model based on GPT-2!
Neural language models
In this section, I will highlight a few papers that are important for understanding current language models. Language models are not new: there is a paper on the automatic creation of literature abstracts from 1958. Since then, a lot has happened in the field of Computational Linguistics. In this post, we will focus on a sequence-to-sequence encoder-decoder architecture with attention, based on Transformers. Text is sequential data, since the order of the words is important. In the neural network community, sequential data is often modeled with Long Short-Term Memory (LSTM) cells. However, these models are slow by design: the computation of the next state depends on the computation of all previous states. The Transformer is, simply stated, a parallel variant of the LSTM with a smart attention mechanism. GPT-2 is a Transformer network with 1558M parameters, trained on a large section of the internet. You should check out the TalkToTransformer website to get an impression of what this model can do! And recently, GPT-3 was released.
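To make the attention idea concrete, here is a minimal, dependency-free sketch of scaled dot-product self-attention. It uses toy vectors and no learned projections, so it is purely illustrative of why the Transformer can process all positions in parallel:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(tokens):
    """Scaled dot-product self-attention over a list of vectors.
    Every position looks at every other position in one parallel step,
    unlike an LSTM, which must process positions one after another."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # similarity of this position to every position, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)  # how much q attends to each position
        # output is a weighted mix of all token vectors
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])
    return out

vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
mixed = self_attention(vectors)
print(len(mixed), len(mixed[0]))  # 3 2
```

In a real Transformer the queries, keys, and values are separate learned projections of the input, and many attention heads run side by side; the parallel all-to-all structure is the same.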
So, how can we teach a network to generate Tweets from articles? For this purpose, I downloaded data from my own Twitter profiles. A different option is to scrape websites and try to predict their titles, since titles are short summaries of the underlying documents. In total, I only got 3000 Tweets, together with the articles mentioned in those Tweets. I prepared the data such that the dataset has the following structure:
<|doc|>The first 300 characters of the first article...<|endofdoc|><|tweet|>The corresponding Tweet #article1 #great [LINK]<|endoftweet|>
<|doc|>The first 300 characters of the second article...<|endofdoc|><|tweet|>The corresponding Tweet #article2 #great [LINK]<|endoftweet|>
...
I also made sure that there are no duplicates in the dataset, and I preprocessed the links such that every link is replaced by a [LINK] placeholder. I stored the dataset in a file called “training.txt”. The next step is training the algorithm.
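This preprocessing can be sketched as follows. The `make_example` helper and the `pairs` variable are hypothetical names of my own; the pairing of each Tweet with its article is assumed to have been done already:

```python
import re

def make_example(article_text, tweet_text):
    """Build one training example in the special-token format above."""
    doc = article_text[:300]                                # first 300 chars
    tweet = re.sub(r"https?://\S+", "[LINK]", tweet_text)   # mask links
    return f"<|doc|>{doc}<|endofdoc|><|tweet|>{tweet}<|endoftweet|>"

# one (article, tweet) pair as a stand-in for the real scraped data
pairs = [("The first 300 characters of the first article...",
          "The corresponding Tweet #article1 #great https://example.com/a")]

# dict.fromkeys removes duplicates while keeping the original order
examples = list(dict.fromkeys(make_example(a, t) for a, t in pairs))

with open("training.txt", "w") as f:
    f.write("\n".join(examples))
```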
Training the algorithm
I use the Transformers library of HuggingFace 🤗, which provides a Python implementation of the pretrained GPT-2 model, which is great. I downloaded the pretrained GPT-2 model and pointed the language model fine-tuning script at the created dataset:
python run_lm_finetuning.py --model_name_or_path gpt2 --do_train --save_steps 50 --model_type gpt2 --no_cuda --train_data_file training.txt --output_dir output --overwrite_output_dir --num_train_epochs 1000
And then, I waited for several hours! The process of retraining a pretrained model is called finetuning.
After the model is trained, it is possible to make predictions with it! I do so by feeding the model a sample text that ends with the start token of the Tweet:
<|doc|>In this blog post, you will learn how to generate Tweets from articles using GPT2, which is a transformer-based model.<|endofdoc|><|tweet|>
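In code, generation looks roughly like the sketch below. It is a minimal example assuming the fine-tuned model was saved to the `output` directory used above; the `extract_tweet` helper and the sampling parameters are my own choices, not part of the library:

```python
def extract_tweet(generated: str) -> str:
    """Pull the Tweet text out of the model output, using the special tokens."""
    return generated.split("<|tweet|>")[1].split("<|endoftweet|>")[0]

if __name__ == "__main__":
    # transformers is only needed when actually generating
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    # "output" is the --output_dir of the fine-tuning run above
    tokenizer = GPT2Tokenizer.from_pretrained("output")
    model = GPT2LMHeadModel.from_pretrained("output")

    prompt = ("<|doc|>In this blog post, you will learn how to generate "
              "Tweets from articles using GPT2, which is a transformer-based "
              "model.<|endofdoc|><|tweet|>")
    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    # sample a continuation; top-k/top-p values are an illustrative choice
    generated = model.generate(input_ids,
                               max_length=input_ids.shape[1] + 60,
                               do_sample=True, top_k=50, top_p=0.95)
    print(extract_tweet(tokenizer.decode(generated[0])))
```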
It then generates Tweets! The following Tweet (which can be found on my Twitter profile) was actually generated by the model!
How to Generate Tweets from Articles using GPT2 with Machine Learning #data #gpt2 #data #python [LINK]
In this article, we finetuned a pretrained language model and saw that one can get impressive results for simple text-based tasks. I am eager to learn more about Natural Language Understanding and Information Extraction. If you are also interested in these topics, then please stay tuned! And don’t forget to share this article if you liked it.
Bonus: getting philosophical
Now we can try to feed our model existential questions like the following:
- Who am I?
The model gives the following answer:
I am an expert in Machine Learning, Artificial Intelligence, and Machine Learning #machinelearning #machine #ai
Well, that is an interesting answer. Let’s take it a step further:
- What is the meaning of life?
Exploring the meaning of life #opens #deeplearning [LINK]
The model at least knows how to promote an article about the meaning of life, which is great! If you have any questions for the model, please let me know.
- Luhn, H. P. (1958). The automatic creation of literature abstracts. IBM Journal of Research and Development, 2(2), 159-165.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).
- Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.