Reasoning in generative text models is currently receiving a lot of attention. In this blog post, we will explore the limits of reasoning in current generative text models and possible solutions.

GPT-3 fails at some logic puzzles. Let's examine a puzzle which does not have an answer and should not be solvable.

```
You are a detective trying to solve a puzzle. You have three suspects: John, Paul, and George. You know that one of them is the perpetrator, but you don't know which one. You have the following information:
- John is taller than Paul
- George is shorter than Paul
- The perpetrator is left-handed
What is the solution to this puzzle and why?
```

The answer by GPT-3 was the following:

`If we assume that the perpetrator is John, then we can see that this does not fit with the information given. Since John is taller than Paul, and George is shorter than Paul, this means that the perpetrator must be George.`

Now let's look at a puzzle involving the Pigeonhole Principle: a puzzle in which you can find the answer by excluding possibilities. The following puzzle can be solved this way:

```
Alice, Bob, and Cecile are wearing a blue, a green and a red shirt.
- If Cecile wears a green shirt, then Alice wears a blue shirt.
- Cecile wears a green shirt.
Who is wearing which shirt?
```

Can you find the answer to this puzzle? The correct answer is Cecile=green, Alice=blue, Bob=red. However, GPT-3 comes up with this incorrect answer:

```
GPT-3: Alice is wearing a blue shirt, Bob is wearing a green shirt, and
Cecile is wearing a red shirt.
```
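Because the puzzle is so small, the intended answer can be verified mechanically. Here is a minimal brute-force sketch (the encoding of the constraints is my own):

```python
from itertools import permutations

people = ["Alice", "Bob", "Cecile"]
solutions = []
for shirts in permutations(["blue", "green", "red"]):
    assignment = dict(zip(people, shirts))
    # Premise: Cecile wears a green shirt.
    if assignment["Cecile"] != "green":
        continue
    # Rule: if Cecile wears green, then Alice wears blue.
    if assignment["Alice"] != "blue":
        continue
    solutions.append(assignment)

# Exactly one assignment survives: Alice=blue, Bob=red, Cecile=green
print(solutions)
```

Only a single assignment satisfies both constraints, which confirms the correct answer above.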

As you can see, GPT-3 is not good at solving all types of reasoning puzzles. How can we improve the reasoning capabilities of GPT-3?

Chain-of-Thought prompting [1] is a technique in which you explicitly ask the model to produce intermediate reasoning steps before giving the final answer. In its simplest form, this is done by adding "Let's think step-by-step." to your prompt. Here you can see some examples of Chain-of-Thought prompting:
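For instance, the shirt puzzle from above could be posed with the step-by-step instruction appended (the exact wording here is illustrative):

```
Alice, Bob, and Cecile are wearing a blue, a green and a red shirt.
- If Cecile wears a green shirt, then Alice wears a blue shirt.
- Cecile wears a green shirt.
Who is wearing which shirt? Let's think step-by-step.
```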

With Chain-of-Thought prompting, we can get better answers, especially for numerical reasoning. But it is still not enough for more complex reasoning challenges.

Some research has been done on complex reasoning and AI [2]. In that paper, the authors identify several error types when a model is tested on the LSAT (Law School Admission Test).

Simple reasoning tasks which involve only one or two steps are relatively easy for generative models. However, once the task gets more complex, the model makes more and more errors.

GPT-3 is not perfect and struggles with reasoning tasks. In this blog post, you have seen several examples of reasoning tasks and how GPT-3 performs on them. As researchers continue to explore advancements in artificial intelligence, addressing these limitations in reasoning will be crucial for the development of more robust and versatile AI systems.

- Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain of Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903. http://arxiv.org/abs/2201.11903
- Wang, S., Liu, Z., Zhong, W., Zhou, M., Wei, Z., Chen, Z., & Duan, N. (2021). From LSAT: The Progress and Challenges of Complex Reasoning.

LLMs are becoming very popular, and the possibilities with them seem endless. One danger, however, is that your private data may be used by the companies hosting these LLMs. Therefore, it is good to know how you can run an LLM locally.

One popular framework for running LLMs locally is Ollama. With Ollama, you can select an open-source LLM, download it, and run it on your local machine. Best of all, only a small tweak is needed to move from an LLM cloud provider to Ollama, as its API mimics one of the most widely used APIs: the OpenAI API.

Let's start by downloading and running the popular Llama2 model. This can be done by executing the following command in your terminal (after installing Ollama):

`ollama run llama2`

Ollama will also become available as a web service at http://localhost:11434. But suppose you would like to run an application developed on top of Python with the `openai` package on Ollama; how can we do that?

We can use LiteLLM to serve as a proxy for the `openai` package. First, install LiteLLM with `pip install litellm`. Then we can use the `litellm` package just like we would use the `openai` package:

```
from litellm import completion

response = completion(
    model="ollama/llama2",
    messages=[{"content": "Hi there!", "role": "user"}],
    api_base="http://localhost:11434",
)
print(response)
```

It is just like using the OpenAI package, but now for free and on your local machine.
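Because Ollama exposes an OpenAI-compatible API (assumed here to live under `/v1`, following Ollama's compatibility layer), you can even talk to it with nothing but the Python standard library. This sketch only constructs the request object, so no running server is needed; sending it would require Ollama to be up:

```python
import json
import urllib.request

# Build a chat-completions request in the OpenAI-compatible shape.
url = "http://localhost:11434/v1/chat/completions"
payload = {
    "model": "llama2",
    "messages": [{"role": "user", "content": "Hi there!"}],
}
request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(request) would perform the actual call
print(request.full_url)
```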

Large LLM cloud providers collect more and more information as more people use their services. With local LLMs, we stay in charge of our own data and can achieve similar results at a much lower price.

The amount of information that we consume is growing every day. As a consequence, we need mechanisms to compress this growing amount of information. Text summarisation is a tool for compressing written text and has been used for ages. Since the amount of information is growing exponentially, it might be helpful to design models that can automatically summarise texts for us.

Automatic text summarisation comes in two flavours: extractive summarisation and abstractive summarisation. Extractive summarisation models take exact phrases from the reference documents and use them as a summary. One of the very first research papers on (extractive) text summarisation is the work of Luhn [1]. TextRank [2] (based on the concepts used by the PageRank algorithm) is another widely used extractive summarisation model.
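To get a feeling for how extractive summarisation works, here is a toy frequency-based sketch in the spirit of Luhn [1] (this is my own simplification, not the original algorithm):

```python
import re
from collections import Counter

def extract_summary(text, n=1):
    """Toy Luhn-style extractive summariser: score each sentence by
    the corpus frequency of its words and keep the top-n sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    chosen = set(sorted(sentences, key=score, reverse=True)[:n])
    # Emit the selected sentences in their original order
    return " ".join(s for s in sentences if s in chosen)

text = ("Python is a popular programming language. "
        "It was created by Guido van Rossum. "
        "Python emphasizes readability.")
print(extract_summary(text, n=1))  # -> Python emphasizes readability.
```

Note that the summary is a literal sentence from the input: an extractive summariser can only select, never rephrase.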

In the era of deep learning, abstractive summarisation became a reality. With abstractive summarisation, a model generates a text instead of using literal phrases of the reference documents. One of the more recent works on abstractive summarisation is PEGASUS [3] (a demo is available at HuggingFace). PEGASUS can summarise the following Wikipedia article:

```
Python is an interpreted high-level general-purpose programming language.
Its design philosophy emphasizes code readability with its use of significant indentation.
Its language constructs as well as its object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.
Python is dynamically-typed and garbage-collected.
It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented and functional programming.
It is often described as a "batteries included" language due to its comprehensive standard library.
Guido van Rossum began working on Python in the late 1980s, as a successor to the ABC programming language, and first released it in 1991 as Python 0.9.0.
Python 2.0 was released in 2000 and introduced new features, such as list comprehensions and a garbage collection system using reference counting.
Python 3.0 was released in 2008 and was a major revision of the language that is not completely backward-compatible.
Python 2 was discontinued with version 2.7.18 in 2020.
Python consistently ranks as one of the most popular programming languages.
```

As output, it then generates the following (abstractive) summary of this text:

`Python is a programming language developed by Guido van Rossum.`

What I find interesting is that this exact phrase cannot be found in the reference document, and that a single model is capable of compressing textual information automatically. However, there are some challenges with abstractive text summarisation as well, which are explored in the next section.

In this section, several challenges for automatic text summarisation will be discussed as well as potential research directions.

What are the challenges that we are facing with these kinds of models? Are these models perfect? No, and that brings me directly to the first point: how can we measure the "quality" of a summary? In the past, several metrics have been developed, such as ROUGE and BLEU (which roughly measure the amount of overlap between the generated summary and the reference text). But what about fluency (the grammatical and semantic correctness of a text)? And factual correctness? One issue with abstractive models is that the generated output might contain words and numbers that are not found in the reference texts. Restricting the vocabulary might be one possible solution for constraining the output [5], which is explained below. Hopefully, more metrics and methods for controlling the outputs will become available.
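To make the overlap idea concrete, here is a rough sketch of a ROUGE-1-style recall score (a deliberate simplification of ROUGE, not the official implementation):

```python
from collections import Counter

def rouge1_recall(reference, candidate):
    """Crude ROUGE-1-style recall: the fraction of reference unigrams
    that also occur in the candidate (clipped by candidate counts)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())

reference = "python is a programming language"
candidate = "python is a popular language"
print(rouge1_recall(reference, candidate))  # -> 0.8
```

A score like this rewards word overlap, which is exactly why it says nothing about fluency or factual correctness.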

Another challenge is multi-document summarisation, in which multiple documents are summarised into a single summary. This task can be complicated further by using documents in different languages as input. The inputs of this task can become large. Most abstractive models are based on Transformers [4], which are known to have a quadratic memory requirement with respect to the number of input tokens. In practice, Transformer-based models can often only handle 512 subword tokens, which is a troublesome limitation for the multi-document summarisation task. Luckily, some models are capable of transforming the quadratic memory requirement into a linear one, such as the Longformer [6], which is explained below. More datasets on (multilingual) multi-document summarisation, together with solutions for decreasing the memory requirement of Transformers, might be helpful for multi-document summarisation.

Another interesting research direction might be controlling the inputs. What if we can concentrate on only certain aspects of the inputs? Or what if we can combine textual data with image data? Another idea might be to combine text summarisation with other NLP subtasks in order to gain more control over the process.

Constraining the vocabulary, as Nucleus Sampling does, can help in controlling the output. The authors of [5] mention that generated text is often bland, incoherent, or stuck in a repetitive loop. The following image shows these undesirable properties:

To cope with these issues, the authors propose Nucleus Sampling: while predicting the next word, a dynamically sized subset of the vocabulary is used, depending on the likelihood of the candidate words. The authors call this top-p sampling. It is closely related to top-k sampling, in which only the k most likely words are considered.

An interesting assumption explained in the paper is that human-written text is not the most probable text. They support this assumption by the observation that people optimise against stating the obvious [7], which is exactly the opposite of optimising for the most likely text.

The following image illustrates how Nucleus Sampling avoids repetitive loops and incoherent texts in a single example in which the algorithms generate text starting from a given sentence:

Thus, techniques like top-k sampling and Nucleus Sampling (top-p sampling) help in generating more coherent and less repetitive texts; incoherence and repetition are two undesirable properties in text generation and thus also in automatic text summarisation.
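The truncation step of Nucleus Sampling can be sketched in a few lines (my own simplified implementation of the top-p idea; a real decoder would then sample from the renormalised distribution rather than print it):

```python
def top_p_filter(probs, p=0.9):
    """Keep the smallest set of highest-probability tokens whose
    cumulative mass reaches p (the "nucleus"), then renormalise."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, total = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        total += prob
        if total >= p:
            break
    return {token: prob / total for token, prob in nucleus}

# A toy next-word distribution; with p=0.75 only "the" and "a" survive
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
print(top_p_filter(probs, p=0.75))
```

Because the cutoff depends on the shape of the distribution, the nucleus is large when the model is uncertain and small when it is confident, which is the key difference from a fixed top-k.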

One issue for multi-document summarisation is the quadratic memory requirement of Transformer-based models. One solution is the Longformer [6]. As the authors mention, Transformer-based models use attention on all input tokens:

The idea of the Longformer is to use different attention strategies in order to cope with longer inputs. By placing special tokens throughout the text and computing full attention only on those special tokens, the memory requirement is reduced to the number of special tokens times the number of input tokens. To compare: Transformer-based models typically work on 512 subword tokens, but the Longformer is evaluated on a dataset in which some documents contain 14.5K tokens! Besides the Longformer, there are other solutions focusing on reducing the quadratic memory requirement, such as Big Bird [8]. Once we can overcome the quadratic memory cost without sacrificing quality, multi-document summarisation and other tasks involving longer and/or multiple documents become feasible.
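A back-of-the-envelope comparison illustrates why this matters. The window and global-token counts below are illustrative choices of mine, not the paper's exact configuration:

```python
def full_attention_pairs(n):
    """Full self-attention: every token attends to every token, O(n^2)."""
    return n * n

def longformer_style_pairs(n, window, n_global):
    """Rough count for a sliding window of size `window` plus `n_global`
    special tokens with full attention (a simplification of the
    Longformer pattern)."""
    return n * window + 2 * n_global * n

n = 14_500  # one of the long documents from the Longformer evaluation
print(full_attention_pairs(n))             # ~210 million attention pairs
print(longformer_style_pairs(n, 512, 128)) # ~11 million attention pairs
```

The quadratic count grows with the square of the document length, while the sparse count grows only linearly, which is what makes 14.5K-token inputs tractable.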

The amount of (textual) information is growing exponentially, and with it the need for automatic text summarisation tools. Automatic text summarisation is an exciting subfield of natural language processing. Both extractive and abstractive text summarisation methods might bring us solutions for keeping up with the growing amount of information. One challenge for automatic text summarisation is measuring the quality of generated texts; another is the input length constraint of Transformer-based models.

- Luhn, H. P. (1958). The automatic creation of literature abstracts. *IBM Journal of Research and Development*, *2*(2), 159-165.
- Mihalcea, R., & Tarau, P. (2004, July). TextRank: Bringing order into text. In *Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing* (pp. 404-411).
- Zhang, J., Zhao, Y., Saleh, M., & Liu, P. (2020, November). PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization. In *International Conference on Machine Learning* (pp. 11328-11339). PMLR.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In *Advances in Neural Information Processing Systems* (pp. 5998-6008).
- Holtzman, A., Buys, J., Du, L., Forbes, M., & Choi, Y. (2019). The curious case of neural text degeneration. *arXiv preprint arXiv:1904.09751*.
- Beltagy, I., Peters, M. E., & Cohan, A. (2020). Longformer: The long-document transformer. *arXiv preprint arXiv:2004.05150*.
- Grice, H. P. (1975). Logic and conversation. In P. Cole & J. L. Morgan (Eds.), *Speech Acts* (*Syntax and Semantics*, Vol. 3, pp. 41-58). Academic Press.
- Zaheer, M., Guruganesh, G., Dubey, K. A., Ainslie, J., Alberti, C., Ontanon, S., ... & Ahmed, A. (2020, July). Big Bird: Transformers for longer sequences. In *NeurIPS*.

In this Python Matplotlib tutorial series, you will learn how to create and improve a plot in Python using pyplot. Matplotlib is a 2D plotting library written for Python. It provides pyplot (in code often shortened to "plt"), an object-oriented interface to the plotting library. Matplotlib is an initiative of John Hunter, Darren Dale, Eric Firing, Michael Droettboom and the Matplotlib development team. Let's stop talking and start creating some beautiful plots using Matplotlib!

In this post, we will gradually build a data visualization of two simple functions: sine and cosine. First, the main concepts are explained, and then the step-by-step tutorial follows.

The figure can be seen as the canvas on which all drawing components are plotted. The figure consists of axes, which are subdivisions of the figure. Each axes contains one or more axis objects (horizontal (x-axis), vertical (y-axis) or even depth (z-axis)). All of this is visualized in the following picture:
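This figure/axes hierarchy is easy to inspect in code. A small sketch (using the non-interactive Agg backend so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt

# One figure (the canvas), subdivided into a 2x2 grid of axes;
# each axes in turn owns an x-axis and a y-axis object.
fig, axes = plt.subplots(2, 2)
print(len(fig.axes))  # 4 axes live on the figure
print(axes.shape)     # returned as a (2, 2) array
```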

We will now create two very, very simple plots. One containing

$$y_1=\sin(x_1)$$

(sine) and the other one containing

$$y_2=\cos(x_2)$$

(cosine). For the calculations, we use NumPy (shortened to "np"). Matplotlib is often used in combination with NumPy. The code should be self-explanatory (if not, please mention it in a comment).

```
import numpy as np
import matplotlib.pyplot as plt
# Create the figure and two axes (two rows, one column)
fig, (ax1, ax2) = plt.subplots(2, 1)
# Create a plot of y = sin(x) on the first row
x1 = np.linspace(0, 4 * np.pi, 100)
y1 = np.sin(x1)
ax1.plot(x1, y1)
# Create a plot of y = cos(x) on the second row
x2 = np.linspace(0, 4 * np.pi, 100)
y2 = np.cos(x2)
ax2.plot(x2, y2)
# Save the figure
plt.savefig('sin_cos.png')
```

This results in the following:

It is also possible to combine the two axes into one plot by using the following code:

```
import numpy as np
import matplotlib.pyplot as plt
# Create the figure with a single axes
fig, ax1 = plt.subplots(1, 1)
# Create a second axes that shares the x-axis with ax1
ax2 = ax1.twinx()
# Plot y = sin(x) on the first axes
x1 = np.linspace(0, 4 * np.pi, 100)
y1 = np.sin(x1)
ax1.plot(x1, y1)
# Plot y = cos(x) on the second axes
x2 = np.linspace(0, 4 * np.pi, 100)
y2 = np.cos(x2)
ax2.plot(x2, y2)
# Save the figure
plt.savefig('sin_cos_2.png')
```

This results in the following data visualization of both functions:

However, notice that this plot is quite simple. It is not even clear which line belongs to which function. In the next few sections, we will gradually improve the plot.

The first improvement is to add a legend. With the legend, it becomes clear which line belongs to which function (either sine or cosine).

```
import numpy as np
import matplotlib.pyplot as plt
# Create the figure with a single axes
fig, ax1 = plt.subplots(1, 1)
# Create a second axes that shares the x-axis with ax1
ax2 = ax1.twinx()
# Plot y = sin(x) on the first axes
x1 = np.linspace(0, 4 * np.pi, 100)
y1 = np.sin(x1)
# Add a label for the legend
function1 = ax1.plot(x1, y1, label='Sine')
# Plot y = cos(x) on the second axes
x2 = np.linspace(0, 4 * np.pi, 100)
y2 = np.cos(x2)
# Add a label for the legend
function2 = ax2.plot(x2, y2, label='Cosine')
# Create the legend by first fetching the labels from the functions
functions = function1 + function2
labels = [f.get_label() for f in functions]
plt.legend(functions, labels, loc=0)
# Save the figure
plt.savefig('sin_cos_3.png')
```

This results in the following:

Notice the loc=0 parameter. This sets the location of the legend; loc=0 automatically selects the best place for the legend. There is one clear downside of our current figure: both functions have the same color. The next improvement adds color to the plot.

Adding color is one of the easiest steps! This results in the following code changes:

```
import numpy as np
import matplotlib.pyplot as plt
# Create the figure with a single axes
fig, ax1 = plt.subplots(1, 1)
# Create a second axes that shares the x-axis with ax1
ax2 = ax1.twinx()
# Plot y = sin(x) on the first axes
x1 = np.linspace(0, 4 * np.pi, 100)
y1 = np.sin(x1)
# Add a label for the legend and make it blue
function1 = ax1.plot(x1, y1, 'b', label='Sine')
# Plot y = cos(x) on the second axes
x2 = np.linspace(0, 4 * np.pi, 100)
y2 = np.cos(x2)
# Add a label for the legend and make it red
function2 = ax2.plot(x2, y2, 'r', label='Cosine')
# Create the legend by first fetching the labels from the functions
functions = function1 + function2
labels = [f.get_label() for f in functions]
plt.legend(functions, labels, loc=0)
# Save the figure
plt.savefig('sin_cos_4.png')
```

We now get the following plot:

Now we will add a title to our plot and add the axis labels by using the following code changes:

```
import numpy as np
import matplotlib.pyplot as plt
# Create the figure with a single axes
fig, ax1 = plt.subplots(1, 1)
# Create a second axes that shares the x-axis with ax1
ax2 = ax1.twinx()
# Plot y = sin(x) on the first axes
x1 = np.linspace(0, 4 * np.pi, 100)
y1 = np.sin(x1)
# Add a label for the legend and make it blue
function1 = ax1.plot(x1, y1, 'b', label='Sine')
# Plot y = cos(x) on the second axes
x2 = np.linspace(0, 4 * np.pi, 100)
y2 = np.cos(x2)
# Add a label for the legend and make it red
function2 = ax2.plot(x2, y2, 'r', label='Cosine')
# Create the legend by first fetching the labels from the functions
functions = function1 + function2
labels = [f.get_label() for f in functions]
plt.legend(functions, labels, loc=0)
# Add x-label (only one, since it is shared) and the y-labels
ax1.set_xlabel('$x$')
ax1.set_ylabel('$y_1$')
ax2.set_ylabel('$y_2$')
# Add the title
plt.title('Sine and Cosine')
# Adjust the figure such that all rendering components fit inside the figure
plt.tight_layout()
# Save the figure
plt.savefig('sin_cos_5.png')
```

This results in the following plot:

This is already a great plot. In the last step, we will use a different rendering engine (built on top of Matplotlib), so all our code is reused but our plot gets improved!

We will now use the seaborn library to improve the style of the plot. For this, we will use the following code:

```
import numpy as np
import matplotlib.pyplot as plt
# Import another rendering engine
import seaborn as sns
# Apply the seaborn style (newer seaborn versions need this explicit call)
sns.set_theme()
# Create the figure with a single axes
fig, ax1 = plt.subplots(1, 1)
# Create a second axes that shares the x-axis with ax1
ax2 = ax1.twinx()
# Plot y = sin(x) on the first axes
x1 = np.linspace(0, 4 * np.pi, 100)
y1 = np.sin(x1)
# Add a label for the legend and make it blue
function1 = ax1.plot(x1, y1, 'b', label='Sine')
# Plot y = cos(x) on the second axes
x2 = np.linspace(0, 4 * np.pi, 100)
y2 = np.cos(x2)
# Add a label for the legend and make it red
function2 = ax2.plot(x2, y2, 'r', label='Cosine')
# Create the legend by first fetching the labels from the functions
functions = function1 + function2
labels = [f.get_label() for f in functions]
plt.legend(functions, labels, loc=0)
# Add x-label (only one, since it is shared) and the y-labels
ax1.set_xlabel('$x$')
ax1.set_ylabel('$y_1$')
ax2.set_ylabel('$y_2$')
# Add the title
plt.title('Sine and Cosine')
# Adjust the figure such that all rendering components fit inside the figure
plt.tight_layout()
# Save the figure
plt.savefig('sin_cos_6.png')
```

This results in our final plot:

In this section, I will highlight some videos I found excellent for learning the concepts of pyplot.

The following video series introduces the basic concepts:

The next few videos are on advanced topics:

It is quite easy to create plots using Matplotlib. This tutorial showed how to build a plot step by step. If you have any questions, please let me know in the comments.
