Machine Learning Project: Neural Network Learns To Play Tic-Tac-Toe

Marius Borcan

Posted on March 1, 2020


This article was originally published at https://programmerbackpack.com.

A Machine Learning Project about building a Neural Network in Python with Keras and teaching it to play a game of Tic-Tac-Toe.

Motivation

In this article we are going to build a basic Neural Network that tries to learn the simple game of Tic-Tac-Toe. Tic-Tac-Toe is a solved game, so using a Neural Network is a bit of overkill, but because it is such a simple game with an extremely small search space, it is a nice opportunity for us to play with a Neural Network without worrying too much about data gathering and cleanup.

All the code for this project is available on GitHub – I'll only post snippets here – so you can check it out there (if you do, please don't forget to star the repository so I know you enjoyed this project). Also, feel free to follow me on Twitter at @b_dmarius, where I post every new article.

Approach

We are going to build a simple Tic-Tac-Toe game at the command line. I am not going to play the game myself; instead, I will let two computer players play it a few thousand times, each choosing randomly from a list of available moves. We will record every move the players make and the eventual outcome of each game.

After the simulation, we are going to take the recorded data, feed it to a simple neural network and train our model. After training, we are going to let the model play against the computer players, first as Player 1 and then as Player 2, and see the results.

The Game

For the Tic-Tac-Toe game itself I've chosen a very basic implementation in Python. This is not the most interesting part of the project, so I'll just explain the flow and list the code here. You can take it and tweak it as you wish for your own project.

We are using a simple Game class to hold the information and process the flow of the game. A cell is marked with 0 if it hasn't been chosen by any player, -1 for the player with X and 1 for the player with O. On every turn, the game offers a list of all the available moves and the players take turns choosing randomly from this list. The move is processed and the current state of the board is saved in the history we are going to use for training. After the game ends (either in a draw or with one player winning), the game result is mapped to every state the board went through during the game. Here are the notations I've used:

PLAYER_X = 'X'
PLAYER_O = 'O'
EMPTY = ' '
PLAYER_X_VAL = -1
PLAYER_O_VAL = 1
EMPTY_VAL = 0
HORIZONTAL_SEPARATOR = ' | '
VERTICAL_SEPARATOR = '---------------'
GAME_STATE_X = -1
GAME_STATE_O = 1
GAME_STATE_DRAW = 0
GAME_STATE_NOT_ENDED = 2
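The snippets below also rely on a few small helpers of the Game class – resetBoard(), getAvailableMoves() and move() – plus Python's random and copy modules. I won't list the real implementations here (they're in the repository), but a minimal sketch of how they might look is this (the bodies below are my assumptions, not necessarily the repository's exact code):

    def resetBoard(self):
        # Start a new game: every cell is empty and the per-game board history is cleared
        self.board = [[EMPTY_VAL for _ in range(3)] for _ in range(3)]
        self.boardHistory = []

    def getAvailableMoves(self):
        # Every empty cell is a legal move, returned as a (row, column) pair
        return [(i, j) for i in range(3) for j in range(3)
                if self.board[i][j] == EMPTY_VAL]

    def move(self, position, player):
        # Mark the cell for the given player and record the resulting board state
        self.board[position[0]][position[1]] = player
        self.boardHistory.append(copy.deepcopy(self.board))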

Here's how the training history looks after the first move. This is a game that Player X eventually won (hence the -1 in the first item), and the first move in the game was made by Player X – row 3, column 2.

[(-1, [[0, 0, 0], [0, 0, 0], [0, -1, 0]])]

After a whole game the history would look like this:

[(-1, [[0, 0, 0], [0, 0, 0], [0, 1, 0]]), (-1, [[0, 0, 0], [0, -1, 0], [0, 1, 0]]), (-1, [[0, 0, 0], [1, -1, 0], [0, 1, 0]]), (-1, [[0, 0, -1], [1, -1, 0], [0, 1, 0]]), (-1, [[1, 0, -1], [1, -1, 0], [0, 1, 0]]), (-1, [[1, 0, -1], [1, -1, 0], [-1, 1, 0]]), (-1, [[1, 0, -1], [1, -1, 1], [-1, 1, 0]]), (-1, [[1, 0, -1], [1, -1, 1], [-1, 1, -1]]), (-1, [[1, 1, -1], [1, -1, 1], [-1, 1, -1]])]

Imagine how the training data looks after 10000 games! Luckily for us, we have numpy to help us. You'll see later how we are going to process all this data. Here's how we simulate a game:

def simulate(self, playerToMove):
        while (self.getGameResult() == GAME_STATE_NOT_ENDED):
            # Pick a random move from the ones still available
            availableMoves = self.getAvailableMoves()
            selectedMove = availableMoves[random.randrange(0, len(availableMoves))]
            self.move(selectedMove, playerToMove)
            # Switch turns
            if playerToMove == PLAYER_X_VAL:
                playerToMove = PLAYER_O_VAL
            else:
                playerToMove = PLAYER_X_VAL
        # Get the history and build the training set
        for historyItem in self.boardHistory:
            self.trainingHistory.append((self.getGameResult(), copy.deepcopy(historyItem)))

    def simulateManyGames(self, playerToMove, numberOfGames):
        playerXWins = 0
        playerOWins = 0
        draws = 0
        for i in range(numberOfGames):
            self.resetBoard()
            self.simulate(playerToMove)
            if i == 0:
                # Print the recorded history of the very first game, just to see what the data looks like
                print(self.trainingHistory)
            if self.getGameResult() == PLAYER_X_VAL:
                playerXWins = playerXWins + 1
            elif self.getGameResult() == PLAYER_O_VAL:
                playerOWins = playerOWins + 1
            else: draws = draws + 1
        totalWins = playerXWins + playerOWins + draws
        print ('X Wins: ' + str(int(playerXWins * 100/totalWins)) + '%')
        print('O Wins: ' + str(int(playerOWins * 100 / totalWins)) + '%')
        print('Draws: ' + str(int(draws * 100 / totalWins)) + '%')

For evaluating the result of the game, we use a very basic method. Of course this could be optimized quite a lot, but that is not in scope for this article. Here's the code I used:

def getGameResult(self):
        for i in range(len(self.board)):
            for j in range(len(self.board[i])):
                if self.board[i][j] == EMPTY_VAL:
                    return GAME_STATE_NOT_ENDED

        # Rows
        for i in range(len(self.board)):
            candidate = self.board[i][0]
            for j in range(len(self.board[i])):
                if candidate != self.board[i][j]:
                    candidate = 0
            if candidate != 0:
                return candidate

        # Columns
        for i in range(len(self.board)):
            candidate = self.board[0][i]
            for j in range(len(self.board[i])):
                if candidate != self.board[j][i]:
                    candidate = 0
            if candidate != 0:
                return candidate

        # First diagonal
        candidate = self.board[0][0]
        for i in range(len(self.board)):
            if candidate != self.board[i][i]:
                candidate = 0
        if candidate != 0:
            return candidate

        # Second diagonal
        candidate = self.board[0][2]
        for i in range(len(self.board)):
            if candidate != self.board[i][len(self.board[i]) - i - 1]:
                candidate = 0
        if candidate != 0:
            return candidate

        return GAME_STATE_DRAW

Here are the results of the simulation after 10000 games. It seems that being the second player is easier in a game of Tic-Tac-Toe, and we'll see that reflected in our Neural Network simulation as well.

X Wins: 23%
O Wins: 64%
Draws: 11%

The Neural Network

We are going to build a neural network which takes as input a game board state (the value of each cell on the board: -1, 0 or 1) and outputs the probabilities associated with each possible game result (X wins, O wins or it's a draw). From the output we choose the result with the highest probability as our predicted game result. The hope is that the Neural Network will learn to map a given board state to the correct eventual result.

So our network will have 9 inputs and 3 outputs. Between the input and the output layers we have 4 Dense layers. I've chosen the number of layers and their dimensions after experimenting with many different combinations, so feel free to try other combinations – I'm sure you can get even better results! Here's the code to generate the model. You'll need tensorflow and keras installed for this.
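The model code below assumes the usual imports at the top of the file. Something along these lines should work (these are the standard Keras APIs; the tf.keras equivalents work just as well):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical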

def __init__(self, numberOfInputs, numberOfOutputs, epochs, batchSize):
        self.epochs = epochs
        self.batchSize = batchSize
        self.numberOfInputs = numberOfInputs
        self.numberOfOutputs = numberOfOutputs
        self.model = Sequential()
        self.model.add(Dense(64, activation='relu', input_shape=(numberOfInputs, )))
        self.model.add(Dense(128, activation='relu'))
        self.model.add(Dense(128, activation='relu'))
        self.model.add(Dense(128, activation='relu'))
        self.model.add(Dense(numberOfOutputs, activation='softmax'))
        self.model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
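Once the model is trained, getting a prediction out of it works like any other Keras classifier: flatten the 3x3 board into the nine input values, call predict and read the three softmax probabilities. Here's a small illustrative sketch (assuming model is the trained Keras Sequential model defined above; the variable names are just placeholders, not the project's API):

board = [[1, 0, -1],
         [0, -1, 0],
         [0, 0, 0]]
x = np.array(board).reshape(1, 9)            # one sample with nine cell values
probabilities = model.predict(x)[0]          # softmax output: one probability per game result
mostLikelyResult = np.argmax(probabilities)  # index of the result with the highest probability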

Training the model

We'll take the board states and the eventual results and feed them to the neural network as inputs and outputs. We'll split the dataset into 80% for training and 20% for evaluation. The math behind a neural network is well beyond the scope of this article, but let me know if you want me to write an article about that too – I'll gladly do it!

We'll then use the data to train the neural network for 100 epochs in batches of 32. These numbers are also the result of experimentation, so you can always try new combinations.

 def train(self, dataset):
        input = []
        output = []
        for data in dataset:
            input.append(data[1])
            output.append(data[0])

        X = np.array(input).reshape((-1, self.numberOfInputs))
        y = to_categorical(output, num_classes=3)
        # Train and test data split
        boundary = int(0.8 * len(X))
        X_train = X[:boundary]
        X_test = X[boundary:]
        y_train = y[:boundary]
        y_test = y[boundary:]
        self.model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=self.epochs, batch_size=self.batchSize)

Predicting the best move

Predicting the best move at a given point in the game is quite simple. The game already gives us the list of all possible moves, so we can take every possible move, apply it to a copy of the board, feed the resulting board state through the neural network and note the output we are interested in (for example, if the Neural Network plays as Player X, we note the probability that Player X wins).

After all the possible moves have been processed by the network, we choose the one with the highest score and keep it as our next move.

 def predict(self, data, index):
        # Run the board state through the network and return the probability at the requested output index
        return self.model.predict(np.array(data).reshape(-1, self.numberOfInputs))[0][index]

And here's the code for simulating the games with the Neural Network as one player and then as the other.

def simulateNeuralNetwork(self, nnPlayer, model):
        playerToMove = PLAYER_X_VAL
        while (self.getGameResult() == GAME_STATE_NOT_ENDED):
            availableMoves = self.getAvailableMoves()
            if playerToMove == nnPlayer:
                maxValue = 0
                bestMove = availableMoves[0]
                for availableMove in availableMoves:
                    # get a copy of a board
                    boardCopy = copy.deepcopy(self.board)
                    boardCopy[availableMove[0]][availableMove[1]] = nnPlayer
                    if nnPlayer == PLAYER_X_VAL:
                        value = model.predict(boardCopy, 0)
                    else:
                        value = model.predict(boardCopy, 2)
                    if value > maxValue:
                        maxValue = value
                        bestMove = availableMove
                selectedMove = bestMove
            else:
                selectedMove = availableMoves[random.randrange(0, len(availableMoves))]
            self.move(selectedMove, playerToMove)
            if playerToMove == PLAYER_X_VAL:
                playerToMove = PLAYER_O_VAL
            else:
                playerToMove = PLAYER_X_VAL

    def simulateManyNeuralNetworkGames(self, nnPlayer, numberOfGames, model):
        nnPlayerWins = 0
        randomPlayerWins = 0
        draws = 0
        print ("NN player")
        print (nnPlayer)
        for i in range(numberOfGames):
            self.resetBoard()
            self.simulateNeuralNetwork(nnPlayer, model)
            if self.getGameResult() == nnPlayer:
                nnPlayerWins = nnPlayerWins + 1
            elif self.getGameResult() == GAME_STATE_DRAW:
                draws = draws + 1
            else: randomPlayerWins = randomPlayerWins + 1
        totalWins = nnPlayerWins + randomPlayerWins + draws
        # Map the win counters back to X and O, depending on which side the network played
        xWins = nnPlayerWins if nnPlayer == PLAYER_X_VAL else randomPlayerWins
        oWins = randomPlayerWins if nnPlayer == PLAYER_X_VAL else nnPlayerWins
        print('X Wins: ' + str(int(xWins * 100 / totalWins)) + '%')
        print('O Wins: ' + str(int(oWins * 100 / totalWins)) + '%')
        print('Draws: ' + str(int(draws * 100 / totalWins)) + '%')
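To tie everything together, the driver script for the whole experiment looks roughly like this. The Game class and the method names are the ones shown above; the model class name and the constructor arguments are my placeholders, so check the repository for the exact ones:

game = Game()
game.simulateManyGames(PLAYER_X_VAL, 10000)       # random vs. random, fills game.trainingHistory

model = TicTacToeModel(9, 3, 100, 32)             # placeholder name: 9 inputs, 3 outputs, 100 epochs, batch size 32
model.train(game.trainingHistory)

game.simulateManyNeuralNetworkGames(PLAYER_X_VAL, 10000, model)   # Neural Network plays as X
game.simulateManyNeuralNetworkGames(PLAYER_O_VAL, 10000, model)   # Neural Network plays as O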

And here are the results of the simulation for our Neural Network games (10000 games each).

Simulating with Neural Network as X Player:
NN player
-1
X Wins: 57%
O Wins: 1%
Draws: 40%

Simulating with Neural Network as O Player:
NN player
1
X Wins: 3%
O Wins: 89%
Draws: 6%

And the results look very good to me! In both setups the Neural Network won a lot more games than the other player. In the first case, even though the Neural Network won only 57% of the games, the other player won only 1% and the rest were draws, which is still a very good result. In the second case the result was even better: the NN player won an astounding 89% of the games, with the other player winning only 3%. So in both cases the opponent stood no chance!

Conclusions

In this article we tried a simple approach to building a Neural Network that learns to play Tic-Tac-Toe. We only used a basic model and still got some decent results. If you'd like to obtain even better results, I recommend trying a more complex model and more training data. Another good approach would be to use Reinforcement Learning, but that is a topic for another article. The entire code for the project can be found on GitHub. Feel free to star the project if you enjoyed working with it. Thanks for reading!

Interested in more? Follow me on Twitter at @b_dmarius, where I post every new article.
