You want to build your own Tic-Tac-Toe opponent? Then read on! In the Tic-Tac-AI series, I will present a couple of Artificial Intelligence algorithms, each implemented as a Tic-Tac-Toe opponent. In this first article, I will introduce a method called Forward Sampling, which is capable of not losing a game of Tic-Tac-Toe!
The repository for this article can be found on GitHub.
Tic-Tac-Toe is a very basic game and is therefore extremely helpful for demonstrating elegant Artificial Intelligence algorithms. In this article, I will explain Forward Sampling as implemented in a Tic-Tac-Toe player.
Here, we will model the winning probability for our smart player by $P(\text{win} \mid s)$, where $s$ denotes the current state of the game (that is, the tiles we have picked and the tiles the opponent has picked). So, what is our probability to win? In order to simulate this, we pick a valid random action $a$ (a tile that is not occupied yet) and we evaluate our winning probability given the action we just picked, $P(\text{win} \mid s, a)$. Now, our opponent can pick any tile. The opponent also picks a random valid next action; denote this by $a'$. Then it is our turn again. So, eventually, we end up calculating $P(\text{win} \mid s, a, a', a'', \ldots)$.
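Under uniformly random play, one complete simulation of such a chain $s, a, a', a'', \ldots$ can be sketched as follows (a minimal sketch in Python; the function names are my own and are not taken from the article's repository):

```python
import random

# The eight lines that win a game; tiles are numbered 0..8 row by row,
# so 0 is the top-left cell and 8 is the bottom-right cell.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def has_line(tiles):
    """True if the given collection of tiles contains a complete line."""
    return any(all(t in tiles for t in line) for line in WIN_LINES)

def simulate_game(my_tiles, opponent_tiles, free_tiles):
    """Finish the game from state s with uniformly random valid actions
    for both players, starting with us; return True iff we win."""
    my, opp = list(my_tiles), list(opponent_tiles)
    order = list(free_tiles)
    random.shuffle(order)              # a random continuation a, a', a'', ...
    for i, tile in enumerate(order):
        side = my if i % 2 == 0 else opp   # we move first, then we alternate
        side.append(tile)
        if has_line(side):
            return side is my              # True iff the completed line is ours
    return False                           # the board filled up: a draw
```

Averaging the result of `simulate_game` over many runs gives an estimate of $P(\text{win} \mid s)$ under random play.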
At any point in the game, $s$ consists of some consecutive actions, say $a_1, a_2, \ldots, a_t$. What is the next move we need to make in order to win the game? That is, what valid action $a_{t+1}$ should we take such that $P(\text{win} \mid s, a_{t+1})$ is maximal? Let $r$ be the remainder of the game, that is, the sequence of actions which are picked to finish the game ($r = a_{t+2}, a_{t+3}, \ldots$). Notice that there are many possibilities for $r$ given $s$ and $a_{t+1}$. With our Forward Sampling procedure, we pick many different options for $r$ and check the outcome for every valid $a_{t+1}$. For example, suppose that we are in the following state (and we are O and our opponent is X):

 O | X | X
---+---+---
   | O | X
---+---+---
   | O |
If we do not pick the bottom-right cell, we will lose! Thus, if $a_{t+1}$ does not equal the bottom-right cell, there are only a few options for $r$ in which we will not lose. Our procedure will therefore pick the bottom-right cell, since it has the highest winning probability.
In this section, I will give the code for the Forward Sampling player:
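The original implementation lives in the linked GitHub repository; the following is a minimal self-contained sketch of such a player. Only the names `make_action`, `my_tiles`, `opponent_tiles`, and `free_tiles` come from the article's description; the class name, helpers, and all other details are my own choices.

```python
import random

# The eight winning lines of the 3x3 board; tiles are numbered 0..8,
# where 0 is the top-left cell and 8 is the bottom-right cell.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def has_won(tiles):
    """True if the given collection of tiles contains a complete line."""
    return any(all(c in tiles for c in line) for line in LINES)

class ForwardSamplingPlayer:
    def __init__(self, num_samples=100):
        self.num_samples = num_samples

    def _rollout(self, my_tiles, opponent_tiles, free_tiles, action):
        """Take `action`, then play one random remainder r of the game;
        return 1 if we win and 0 otherwise (a loss or a draw)."""
        my = my_tiles + [action]
        opp = list(opponent_tiles)
        free = [t for t in free_tiles if t != action]
        if has_won(my):
            return 1
        random.shuffle(free)
        my_turn = False                     # after our action, the opponent moves
        for tile in free:
            (my if my_turn else opp).append(tile)
            if my_turn and has_won(my):
                return 1
            if not my_turn and has_won(opp):
                return 0
            my_turn = not my_turn
        return 0                            # the board filled up: a draw

    def make_action(self, my_tiles, opponent_tiles, free_tiles):
        """Return the free tile with the highest estimated winning probability."""
        def estimate(action):
            wins = sum(self._rollout(my_tiles, opponent_tiles, free_tiles, action)
                       for _ in range(self.num_samples))
            return wins / self.num_samples
        return max(free_tiles, key=estimate)
```

In the example state above (we are O with tiles 0, 4, and 7; X holds 1, 2, and 5), `ForwardSamplingPlayer().make_action([0, 4, 7], [1, 2, 5], [3, 6, 8])` picks the bottom-right cell, tile 8: every sampled remainder after tile 8 is a win, while the other actions lose about half of their rollouts.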
The tiles (my_tiles, opponent_tiles, free_tiles) are lists of integers where 0 is the top-left cell and 8 is the bottom-right cell. The make_action method returns an action (a free tile) based on the described sampling procedure. Each turn, 100 samples are generated (that is, 100 different options for $r$ are evaluated) and then the best tile is picked.
Forward Sampling is very effective (and does not require any learning). Can you beat the algorithm? If you have any questions or suggestions, feel free to comment below this post!