Welcome to the first article in the “How to build a poker A.I” series. This series will delve into various aspects involved in building a poker A.I. The first few articles will explore the theory and algorithms behind solving games like poker; namely variants of CFR (Counter Factual Regret minimization) and how we can bring the impractically large game of poker down into an abstracted solvable size, we will then compare the algorithms we have built to see which converge quickest. But before doing all that, we first need to establish some basics! This article is written to give you the basics in game theory to help you understand future articles in this series.
Below I’ve defined some key words and used an example of the simplest game I know, Rock-Paper-Scissors, to help explain these concepts.
Strategy: Strategy is the probability of a player taking each action, for example, in Rock Paper Scissors if I randomly pick an option with equal probability my strategy is [1/3, 1/3, 1/3] for each action respectively.
Utility: Is the average value. In rock paper scissors a win can be assigned a value of 1, a loss can be assigned a value of -1 and a tie can be assigned a value of 0. If on average you win, lose and draw one third of the time, you’re utility is 0.
Regret: Regret for an action is the difference between the utility of that action and the utility of the current strategy. For example in a game of rock paper scissors if my opponents strategy is [1,0,0] (that is they choose rock every time and never chose paper or scissors), and my strategy is [1/3, 1/3, 1/3], Then my utility is 0 (a third of the time I win, a third of the time I lose, a third of the time I tie). However if my strategy was always to choose paper I would win 100% of the time and thus my utility would be 1. This is 1 more than my current strategy, thus this produces a regret of 1 - 0 = 1. The regret for each action would be [0, 1, -1].
Nash Equilibrium: When we talk about solving a game, we’re talking about finding the Nash equilibrium. In a Nash equilibrium each players strategy cannot gain any more utility from using a different strategy - they have reached an equilibrium. In Rock-Paper-Scissors the Nash equilibrium is simply a strategy of [1/3, 1/3, 1/3], changing from this strategy will not be able to yield either player any more utility. Playing this strategy in Rock-Paper-Scissors means our opponents and our own utility will always be zero, so why bother with playing a Nash equilibrium strategy if we don’t win? well Rock-Paper-Scissors is a one shot game, whereas poker is a sequential game. In a sequential game any time our opponent deviates from the Nash equilibrium while we are playing a Nash equilibrium strategy they will lose utility and we will gain utility.
Best Response strategy: This is a strategy that maximises our utility against a know opponent strategy, if our opponent always chooses rock our best response is to always choose paper. This results in an average value of 1 for us. In other words we have exploited our opponent by 1. This metric of exploitability is a important measurement while trying to find a Nash equilibrium strategy; while solving we will try and get our exploitability to as close to zero as possible.
The game tree is a representation of all possible game states within the game, where each game state is linked to its connecting game states. For example in poker the first branches of the tree are the cards dealt to each player, followed by player actions. Chance events can be considered to be actions taken by a “chance player” who’s strategy is the probability of each card (chance action).
An Information set is a set of game states that the active player cannot distinguish. For example, in poker a player may be facing a bet, but he does not know the opponent cards thus he could be at any number of nodes within the game tree, these are the game states of the information Set. As the active player cannot distinguish between the game states they must have a single strategy for all the game states within that information set.
A perfect information game is one where you know exactly what has taken place previously and thus you know exactly where you are in the game tree. This means only one traversal of the game tree is needed to solve the game. Chess is a perfect information game, In order to figure out your optimal move you can make a single traversal of the game tree passing back up utility of each node assuming your opponent is taking their best response action. The strategy is then simply the action with the highest utility.
Poker does not have that luxury, poker is a game of incomplete information, this is because you do not know what the opponents cards are and therefor do not know where you are in the game tree. This provides significant problems when attempting to solve the game that have to be overcome with some interesting algorithms that we will start exploring in the next article.
Regret matching is where a players strategy is updated to be proportional to their positive cumulative regrets. This allows regrets to be minimised and forms the core of CFR algorithms. Positive regrets indicate the level of relative losses one has experienced for not having selected the action in the past. For example if our opponents strategy for Rock-Paper-Scissors is [1, 0, 0] and our strategy is [0, 0, 1], our regret for each action respectively is [1, 2, 0]. With Regret Matching our strategy on the following iteration thus becomes proportional to the positive regret, resulting in a strategy of [1/3, 2/3, 0].
Below is a cheat sheet for some of the symbols that will be used throughout this series, this will allow you to follow along with the mathematics should you wish to do so.