Teaching Computers to Play Tic Tac Toe: A Reinforcement Learning Approach
by Naomi A.
https://github.com/SQLNoggin/Smart-TicTacToe-RL_NaomiA.git
This Python script implements a reinforcement learning agent to play the game of Tic Tac Toe. The agent is trained through Q-learning, a type of model-free reinforcement learning that uses a Q-table to estimate the value of taking specific actions in particular states. Let's break down the key aspects:
Overview
The script uses the Tkinter library to create a graphical user interface (GUI) where the Tic Tac Toe game can be played. NumPy is used for numerical operations, and the pickle library is used to save and load the Q-table to/from a file.
Constant and Hyperparameter Definitions
- PLAYER_X and PLAYER_O: Constants representing the two players (1 for X and -1 for O).
- EMPTY: Constant representing an empty cell on the Tic Tac Toe board.
- ALPHA: The learning rate used in the Q-learning updates.
- GAMMA: The discount factor used in the Q-learning updates.
- EPSILON_START, EPSILON_END, EPSILON_DECAY: Parameters controlling the epsilon-greedy policy, determining the exploration-exploitation trade-off during training.
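The constants above might look something like the following sketch. The names follow the list above, but the specific numeric values here are assumptions for illustration, not copied from the repository:

```python
# Hypothetical values for the constants described above; the actual
# hyperparameters in the repository may differ.
PLAYER_X, PLAYER_O = 1, -1   # the two players
EMPTY = 0                    # an empty cell on the board

ALPHA = 0.1                  # learning rate for Q-learning updates
GAMMA = 0.9                  # discount factor for future rewards
EPSILON_START = 1.0          # start fully exploratory
EPSILON_END = 0.01           # floor on exploration
EPSILON_DECAY = 0.9995       # multiplicative decay applied per episode
```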
Function Definitions
- board_to_int: Converts the current board state to a unique integer, facilitating the indexing of the Q-table.
- choose_action: Implements the epsilon-greedy policy to choose an action either randomly or based on the current Q-table values.
- check_end: Checks whether the current board state ends the game and, if so, identifies the winner or a draw.
- save_q_table: Saves the current Q-table and epsilon value to a file using pickle.
- load_q_table: Loads the Q-table and epsilon value from a file, or creates a new Q-table if no file is found.
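The two helpers most central to the learning loop can be sketched as follows. The base-3 encoding is one natural way to implement board_to_int (each cell takes one of three values, so a 3x3 board maps to a unique integer in [0, 3^9)); the exact signatures in the repository may differ:

```python
import random
import numpy as np

EMPTY, PLAYER_X, PLAYER_O = 0, 1, -1

def board_to_int(board):
    # Read the 9 cells as digits of a base-3 number:
    # EMPTY -> 0, PLAYER_X -> 1, PLAYER_O -> 2.
    state = 0
    for cell in board.flatten():
        state = state * 3 + (0 if cell == EMPTY else (1 if cell == PLAYER_X else 2))
    return state

def choose_action(q_table, state, available, epsilon):
    # Epsilon-greedy: explore a random legal move with probability epsilon,
    # otherwise exploit the best-known Q-value among legal moves.
    if random.random() < epsilon:
        return random.choice(available)
    return max(available, key=lambda a: q_table[state][a])
```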
Q-learning and Tic Tac Toe GUI
The script defines a TicTacToeGUI class that handles both the GUI and the game logic:
- __init__: Initializes the GUI with a grid of buttons representing the Tic Tac Toe board, plus buttons to trigger training or to play against the agent.
- update_gui: Updates the GUI to reflect the current board state.
- reset_board: Resets the board to the initial state.
- make_move: Handles a move made by the human player or the agent and updates the Q-table using the Q-learning update rule.
- game_over: Displays a message when the game ends and resets the board.
- train_agent: Trains the agent using Q-learning, playing a number of games against itself and updating the Q-table after each move. Utilizes symmetries of the board to enhance learning. Epsilon decays over episodes to reduce exploration over time.
- play_against_agent: Resets the board to start a new game against the trained agent.
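At the heart of make_move and train_agent is the standard tabular Q-learning update. A minimal sketch, reusing the ALPHA and GAMMA names described earlier (the repository's exact bookkeeping may differ):

```python
import numpy as np

ALPHA, GAMMA = 0.1, 0.9  # assumed values, matching the earlier description

def q_update(q_table, state, action, reward, next_state, done):
    # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    # When the game is over there is no next state to bootstrap from.
    target = reward if done else reward + GAMMA * np.max(q_table[next_state])
    q_table[state, action] += ALPHA * (target - q_table[state, action])
```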
Main Block
Creates an instance of the Tkinter root widget, initializes the TicTacToeGUI class, and starts the Tkinter event loop.
Discussion
This script combines reinforcement learning and Tkinter GUI elements to create an interactive Tic Tac Toe game where a human can play against a trained agent. The reinforcement learning agent, based on Q-learning, leverages the following:
- State-Action Representation: The state of the board is represented as an integer using board_to_int, and actions are represented as the index of the cell where a move is made.
- Learning and Exploration: The agent learns by updating the Q-values during self-play training. An epsilon-greedy strategy controls the exploration vs. exploitation trade-off.
- Symmetry Utilization: The script takes advantage of the symmetries in the Tic Tac Toe board to learn more efficiently by augmenting the Q-learning updates with symmetric states and actions.
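The Tic Tac Toe board has 8 symmetries (4 rotations, each optionally mirrored), so one observed (state, action) pair yields up to 8 equivalent training examples. A hedged sketch of how such pairs could be generated; symmetric_pairs is a hypothetical helper, not necessarily the function name used in the script:

```python
import numpy as np

def symmetric_pairs(board, action):
    # Yield the distinct (board, action) pairs under the 8 symmetries of
    # the square: rotations by 0/90/180/270 degrees, each with an
    # optional left-right reflection. The action index is transformed by
    # applying the same symmetry to a one-hot 3x3 mask.
    grid = np.asarray(board).reshape(3, 3)
    mask = np.zeros(9)
    mask[action] = 1
    mask = mask.reshape(3, 3)
    seen = set()
    for k in range(4):
        for flip in (False, True):
            b, a = np.rot90(grid, k), np.rot90(mask, k)
            if flip:
                b, a = np.fliplr(b), np.fliplr(a)
            key = (b.tobytes(), int(np.argmax(a)))
            if key not in seen:
                seen.add(key)
                yield b.flatten(), int(np.argmax(a))
```

Feeding each symmetric pair through the same Q-update lets one self-play move improve up to 8 table entries at once, which is why symmetry augmentation speeds up learning on small boards.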
Conclusion
This script illustrates how to implement a Q-learning agent for Tic Tac Toe with a Tkinter GUI for human interaction. It showcases reinforcement learning concepts like Q-table initialization, epsilon-greedy strategy, and Q-value updates in a practical application.
Thanks for reading! Stay tuned, techies...