Tic-Tac-AI: A Strong Tic-Tac-Toe AI Opponent using Forward Sampling

Tic-Tac-AI: A Strong Tic-Tac-Toe AI Opponent using Forward Sampling
Photo by Jon Tyson / Unsplash

You want to build your own Tic-Tac-Toe opponent? Then you need to read further! In the Tic-Tac-AI series, I will present a couple of Artificial Intelligence algorithms implemented as Tic-Tac-Toe opponent. In this first article, I will introduce a method called Forward Sampling which is capable of not losing any game of Tic-Tac-Toe!

The repository for this article is found on GitHub.


Tic-Tac-Toe is a very basic game and therefore extremely helpful to display the most elegant Artificial Intelligence algorithms. In this article, I will explain Forward Sampling implemented in a Tic-Tac-Toe player for you. Maybe you are interested in implementing Tic-Tac-Toe in the form of a web app? Then definitely read this article!

Forward Sampling!

Here, we will model the winning probability for our smart player by


. Let


denote the current state of the game (that is, the tiles we have picked and the tiles the opponent has picked).  So, what is our probability to win? In order to simulate this, we pick a valid random action


(a tile that is not occupied yet) and we evaluate our winning probability given the action we just picked (


). Now, our opponent can pick any tile. The opponent also picks a random valid next action and denote this by


. Now, it is our turn again. So, eventually, we end up calculating

p_{win}(a_1, a_2, …, a_n)


At any point in the game,


consists of some consecutive actions, say

S=a_1, …, a_s

. What is the next move we need to make in order to win the game? That is, what valid action


should we take such that

p_{win}(S, k)

is maximal? Let


be the remainder of the game. That is, the set of actions which were picked to finish the game (

T=b_1, …, b_t

). Notice that there are many possibilities for






. With our Forward Sampling procedure, we pick many different options for


and check for every valid

p_{win}(S, k, T)

. For example, suppose that we are in the following state (and we are O and our opponent is X):

Tic-Tac-Toe state.

If we do not pick the bottom-right cell, we will lose! Thus, if


does not equal the bottom-right cell, there are only a few options for


in which we will not lose. Our procedure will therefore pick the bottom-right cell, since this has the highest winning probability.


In this section, I will give the code for the Forward Sampling player:

class SamplingPlayer(Player):
    A Player based on forward sampling.
    GAME_TIE = 0
    GAME_WIN = 1
    GAME_LOSE = 2
    num_samples = 100

    A player based on forward sampling.
    def make_action(self, my_tiles, opponent_tiles, free_tiles):
        # Keep track of the number of wins
        num_wins = {}

        # Loop through all free tiles
        for free_tile in free_tiles:
            wins = 0

            # Now simulate a game with two random players from the current state on where you choose the free tile and
            # see who will win
            for _ in range(self.num_samples):
                my_tiles_copy = [tile for tile in my_tiles]
                opponent_tiles_tiles_copy = [tile for tile in opponent_tiles]
                sample = self.sample_game(True, free_tile, my_tiles_copy, opponent_tiles_tiles_copy)

                # A tie or a win is good enough
                if sample == self.GAME_TIE or sample == self.GAME_WIN:
                    wins += 1
            num_wins[free_tile] = wins

        # Check which move has the highest expected number of wins
        best_score, best_item = None, None
        for item, score in num_wins.items():
            if best_score is None or score > best_score:
                best_score = score
                best_item = item

        # Give back this action
        return best_item

    def sample_game(self, is_my_turn, action, my_tiles, opponent_tiles):
        Run a game with two random players.

        :param is_my_turn: Whether or not it is my turn (True or False).
        :param action: The next action to take.
        :param my_tiles: The tiles belonging to me (list of integers 0-8).
        :param opponent_tiles: The tiles belonging to the opponent (list of integers 0-8).
        :return: GAME_TIE when a tie happened, GAME_WIN when a win happened and GAME_LOSE when you loose the game.

        # Calculate which tiles are still left
        free_tiles = [tile for tile in range(9) if
                      tile not in my_tiles and tile not in opponent_tiles and tile != action]

        # Add the action to either my tiles or the opponent tiles depending on who's turn it is
        if is_my_turn:
            my_tiles += [action]
            opponent_tiles += [action]

        # When there are no tiles left, it is a tie
        if len(free_tiles) == 0:
            return self.GAME_TIE
            # Otherwise, I win when it is my turn and my tiles contain a winning combination
            # If they don't contain a winning combination, the game will continue
            # The same reasoning applies when it is not my turn
            if is_my_turn:
                if Game.check_win(my_tiles):
                    return self.GAME_WIN
                    return self.sample_game(False, free_tiles[0], my_tiles, opponent_tiles)
                if Game.check_win(opponent_tiles):
                    return self.GAME_LOSE
                    return self.sample_game(True, free_tiles[0], my_tiles, opponent_tiles)


The tiles (my_tiles, opponent_tiles, free_tiles) are lists of integers where 0 is the left-top cell and 8 is the bottom-right cell. The make_action method, returns an action (any free tile) based on the described sampling procedure. Every time, 100 samples are generated (that is, 100 different options for


are computed) and then the best tile is picked.


Forward sampling is very effective (and does not require any learning). Can you beat the algorithm? If you have any questions or suggestions, feel free to comment below this post!