Time to actually build our AI: we are going to implement a CFR algorithm to find a Nash equilibrium strategy for heads-up (two player) No Limit Texas Hold 'em poker in post-flop scenarios. In this article I walk through the code related to the algorithm, run a simple scenario solve and then look at the results.
Please note this code has been written for the purpose of demonstrating the theory and is therefore very inefficient. The code here is in C#, but the language isn't important; it's just a way to show the theory being implemented in a simple way.
public class InformationSet
{
public float[] RegretSum;
public float[] StrategySum;
public InformationSet(int NumberOfActions)
{
        //Initialise float arrays to required length and set elements of array to zero
this.RegretSum = new float[NumberOfActions];
this.StrategySum = new float[NumberOfActions];
}
}
An information set only needs to contain two arrays: the regret sum and the strategy sum. Each array element corresponds to one of the available strategy actions.
public class InformationSetCFRLogic
{
    public float[] GetStrategy(InformationSet infoSet)
    {
        float[] PosRegrets = infoSet.RegretSum.Select(e => Math.Max(0, e)).ToArray();
        return Helpers.Normalise(PosRegrets);
    }
    public void AddToStrategySum(InformationSet infoSet, float[] strategy, float activePlayerReachProbability)
    {
        infoSet.StrategySum = infoSet.StrategySum.Zip(strategy, (x, y) => x + y * activePlayerReachProbability).ToArray();
    }
    public float[] GetFinalStrategy(InformationSet infoSet)
    {
        return Helpers.Normalise(infoSet.StrategySum);
    }
    public void AddToCumulativeRegrets(InformationSet infoSet, GameStateNode gameStateNode, float[] actionUtilities, float nodeUtility)
    {
        if (gameStateNode.ActivePlayer == Player.Player1)
        {
            float counterFactualReachProbability = gameStateNode.ReachProbabiltyP2;
            for (int i = 0; i < actionUtilities.Length; i++)
            {
                infoSet.RegretSum[i] = infoSet.RegretSum[i] + counterFactualReachProbability * (actionUtilities[i] - nodeUtility);
            }
        }
        else if (gameStateNode.ActivePlayer == Player.Player2)
        {
            float counterFactualReachProbability = gameStateNode.ReachProbabiltyP1;
            for (int i = 0; i < actionUtilities.Length; i++)
            {
                infoSet.RegretSum[i] = infoSet.RegretSum[i] + counterFactualReachProbability * (nodeUtility - actionUtilities[i]);
            }
        }
    }
}
These methods are used to update and retrieve regret or strategy values from an information set. This logic contains most of the theory we established in the previous article, so it may be helpful to flip back to link the two.
The AddToCumulativeRegrets method calculates the immediate counterfactual regret of each action as the utility of that action minus the utility of the current strategy (the node utility), multiplied by the counterfactual reach probability. It then adds this regret to the regret sum. The counterfactual probability for player 1 is the reach probability of player 2 multiplied by the reach probability of the chance player; however, the chance player's contribution is the same for every occurrence of the information set, and the regret sum will eventually be normalised, therefore we do not need to include the chance probability. Note this implementation of the algorithm uses utilities from player 1's perspective, therefore at a player 2 node we must negate the regrets.
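For reference, the update can be written compactly in standard CFR notation (my notation here, which may differ slightly from the previous article):

$$R_I[a] \mathrel{+}= \pi_{-i}\,\big(u(a) - u(\sigma)\big)$$

where $\pi_{-i}$ is the opponent's reach probability (the counterfactual reach probability), $u(a)$ is the utility of action $a$ and $u(\sigma)$ is the node utility under the current strategy; the sign of the bracket is flipped at player 2 nodes because utilities are from player 1's perspective.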
The GetStrategy method returns the strategy to be used for the current iteration. As described in the previous article, this is done by normalising the positive counterfactual regrets (regret matching): first any negative regrets are replaced with zero, then the regrets are normalised and returned.
The AddToStrategySum method adds the strategy to the information set's strategy sum, first weighting the strategy by the active player's reach probability.
The GetFinalStrategy method returns the average strategy, which should converge towards a Nash equilibrium. This is achieved by returning the normalised strategy sum.
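The Helpers.Normalise method is not shown in this article. A minimal sketch of what it could look like is below, assuming that when the values sum to zero (for example, when every regret is non-positive) we fall back to a uniform strategy:

using System.Linq;
public static class Helpers
{
    //Assumed implementation - divides each element by the total so the array sums to 1;
    //if the total is zero (e.g. all regrets non-positive), returns a uniform distribution
    public static float[] Normalise(float[] values)
    {
        float total = values.Sum();
        if (total > 0)
        {
            return values.Select(v => v / total).ToArray();
        }
        return values.Select(v => 1f / values.Length).ToArray();
    }
}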
public class GameStateNode
{
    //Node Properties
    public List<string> History { get; set; }
    public List<string> Player1Cards { get; set; }
    public List<string> Player2Cards { get; set; }
    public float ReachProbabiltyP1 { get; set; }
    public float ReachProbabiltyP2 { get; set; }
    public Player ActivePlayer { get; set; }
    public List<string> ActionOptions { get; set; }
    public GameStateNode() { }
    //Create an initial node
    public static GameStateNode GetStartingNode(List<string> history, List<string> player1Cards, List<string> player2Cards)
    {
        GameStateNode newNode = new GameStateNode();
        //Initiate new node properties
        newNode.History = history;
        newNode.Player1Cards = player1Cards;
        newNode.Player2Cards = player2Cards;
        newNode.ReachProbabiltyP1 = 1;
        newNode.ReachProbabiltyP2 = 1;
        newNode.ActivePlayer = PokerRules.GetActivePlayer(newNode.History);
        //Set Action Options
        if (newNode.ActivePlayer == Player.Player1 || newNode.ActivePlayer == Player.Player2)
        {
            newNode.ActionOptions = PokerRules.AvailablePlayerAction(newNode.History);
        }
        else if (newNode.ActivePlayer == Player.ChancePublic)
        {
            newNode.ActionOptions = PokerRules.AvailableChanceAction(newNode.History, newNode.Player1Cards, newNode.Player2Cards);
        }
        return newNode;
    }
    //Constructor for creating child node from parent node
    public GameStateNode(GameStateNode parentGameState, string action, float actionProbability)
    {
        //Initiate node properties
        this.History = new List<string>(parentGameState.History);
        this.History.Add(action);
        this.Player1Cards = new List<string>(parentGameState.Player1Cards);
        this.Player2Cards = new List<string>(parentGameState.Player2Cards);
        this.ActivePlayer = PokerRules.GetActivePlayer(this.History);
        this.ReachProbabiltyP1 = parentGameState.ReachProbabiltyP1;
        this.ReachProbabiltyP2 = parentGameState.ReachProbabiltyP2;
        //Update Reach Probabilities
        if (parentGameState.ActivePlayer == Player.Player1)
        {
            this.ReachProbabiltyP1 = parentGameState.ReachProbabiltyP1 * actionProbability;
        }
        else if (parentGameState.ActivePlayer == Player.Player2)
        {
            this.ReachProbabiltyP2 = parentGameState.ReachProbabiltyP2 * actionProbability;
        }
        //Set Action Options
        if (this.ActivePlayer == Player.Player1 || this.ActivePlayer == Player.Player2)
        {
            this.ActionOptions = PokerRules.AvailablePlayerAction(History);
        }
        else if (this.ActivePlayer == Player.ChancePublic)
        {
            this.ActionOptions = PokerRules.AvailableChanceAction(History, Player1Cards, Player2Cards);
        }
    }
}
The GameStateNode class is used to create node objects; these node objects contain all the information needed for any node within the game tree.
The History property contains the complete action history after the players' hole cards. The history is stored in a list of strings where each string is a two character code referencing either a player action or a chance action. The ActivePlayer property tells us whose decision it is at the current node; this could be player 1, player 2 or the chance player. The GetStartingNode method is used to initialise a starting node to begin a game tree; note the reach probabilities are 1 at the start as there is a 100% probability of reaching this node. A constructor is used to create a child node from a parent node and update the relevant properties; this includes changing the active player and updating the reach probability of the player who just acted by multiplying it by the probability of the action taken.
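The Player enum itself isn't defined in this article. Based on how it is used (including the cast from i % 2 in the Train method later), a plausible definition would be:

//Assumed definition - Player1 and Player2 must map to 0 and 1
//so that the cast (Player)(i % 2) in Train alternates between them
public enum Player
{
    Player1 = 0,
    Player2 = 1,
    ChancePublic,
    GameEnd
}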
public class VanillaCFRTrainer
{
    public Dictionary<string, InformationSet> InfoSetMap { get; set; }
    public BestResponseUtility BestResponseUtility { get; set; }
    public InformationSetCFRLogic InformationSetMethods { get; set; }
    public int Iteration { get; set; }
    public Player UpdatingPlayer { get; set; }
    public VanillaCFRTrainer()
    {
        InfoSetMap = new Dictionary<string, InformationSet>();
    }
    public InformationSet GetInformationSet(GameStateNode gameStateNode)
    {
        List<string> infoSetHistory = new List<string>(gameStateNode.History);
        if (gameStateNode.ActivePlayer == Player.Player1)
        {
            infoSetHistory.InsertRange(0, gameStateNode.Player1Cards);
        }
        else if (gameStateNode.ActivePlayer == Player.Player2)
        {
            infoSetHistory.InsertRange(0, gameStateNode.Player2Cards);
        }
        string key = string.Join("_", infoSetHistory);
        if (InfoSetMap.ContainsKey(key) == false)
        {
            InfoSetMap[key] = new InformationSet(gameStateNode.ActionOptions.Count);
        }
        return InfoSetMap[key];
    }
    public float Train(int numberIterations, List<string> boardArranged, List<string[]> handCombosP1, List<string[]> handCombosP2)
    {
        //Train
    }
    //Returns utility from player 1 perspective
    public float CalculateNodeUtility(GameStateNode gameStateNode)
    {
        //Calculate Node Utility
    }
}
The CFR trainer class uses a dictionary property called InfoSetMap to store all the information sets within the game tree; this is initialised in the constructor.
When an information set needs to be retrieved, the GetInformationSet method is called. This creates a unique key for the information set, consisting of the full history that is available to the active player. For example, "As_Ad_R0" contains all the information needed to identify the information set: the As and Ad (Ace of spades and Ace of diamonds) are player 2's hole cards; player 1's hole cards are not included as they are not known to player 2; the "R0" shows that the only action so far is a single raise from player 1. The Train and CalculateNodeUtility methods are included below.
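To make the key construction concrete, here is a small illustrative snippet (hypothetical values, following the logic of GetInformationSet) for player 2 holding the As and Ad after a single raise from player 1:

//Hypothetical example of the key built by GetInformationSet
List<string> infoSetHistory = new List<string>() { "R0" };        //action history
infoSetHistory.InsertRange(0, new List<string>() { "As", "Ad" }); //active player's hole cards
string key = string.Join("_", infoSetHistory);                    //"As_Ad_R0"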
public float Train(int numberIterations, List<string> boardArranged, List<string[]> handCombosP1, List<string[]> handCombosP2)
{
    InformationSetMethods = new InformationSetCFRLogic();
    BestResponseUtility = new BestResponseUtility(InfoSetMap, InformationSetMethods);
    Iteration = 0;
    float P1Util = 0;
    int utilP1Count = 0;
    List<string> startHistory = new List<string>(boardArranged);
    for (int i = 0; i < numberIterations; i++)
    {
        Iteration = i;
        UpdatingPlayer = (Player)(i % 2);
        P1Util = 0;
        utilP1Count = 0;
        //Iterate through all possible hand combinations
        for (int indexP1 = 0; indexP1 < handCombosP1.Count; indexP1++)
        {
            //Dont include p1 hands that conflict with the board
            if (boardArranged.Contains(handCombosP1[indexP1][0]) || boardArranged.Contains(handCombosP1[indexP1][1]))
            {
                continue;
            }
            for (int indexP2 = 0; indexP2 < handCombosP2.Count; indexP2++)
            {
                //Dont include p2 hands that conflict with current p1 hand
                if (handCombosP2[indexP2].Contains(handCombosP1[indexP1][0]) || handCombosP2[indexP2].Contains(handCombosP1[indexP1][1]))
                {
                    continue;
                }
                //Dont include p2 hands that conflict with the board
                if (boardArranged.Contains(handCombosP2[indexP2][0]) || boardArranged.Contains(handCombosP2[indexP2][1]))
                {
                    continue;
                }
                //Initialise startNode
                GameStateNode startNode = GameStateNode.GetStartingNode(startHistory, handCombosP1[indexP1].ToList(), handCombosP2[indexP2].ToList());
                //Begin the CFR Recursion
                P1Util += CalculateNodeUtility(startNode);
                utilP1Count++;
            }
        }
        Console.WriteLine($"Iteration {i} complete.");
        Console.WriteLine($"Strategy Exploitability Percentage: {BestResponseUtility.TotalDeviation(boardArranged, handCombosP1, handCombosP2)}");
        Console.WriteLine();
    }
    //return player 1 utility of last iteration
    return P1Util / utilP1Count;
}
The Train method first initialises some properties. The BestResponseUtility object is something I will explain in full detail in a future article; for now we just need to know that it is used to calculate the exploitability of player 1 and player 2, and then return the deviation of this from the expected value. A Nash equilibrium strategy will have zero deviation, so we want to see this converge towards zero.
The method then iterates for the specified number of iterations. Each iteration, the method goes through all the possible player 1 and player 2 hole card combinations, "dealing" the cards to each player; this is essentially propagating through the first chance nodes of the game tree. For each combination of hole cards, a start node is created and passed into the CalculateNodeUtility method, which begins recursively propagating through the game tree.
public float CalculateNodeUtility(GameStateNode gameStateNode)
{
    ///// TERMINAL NODE /////
    if (gameStateNode.ActivePlayer == Player.GameEnd)
    {
        var u = PokerRules.CalculatePayoff(gameStateNode.History, gameStateNode.Player1Cards, gameStateNode.Player2Cards);
        return u;
    }
    float[] actionUtilities = new float[gameStateNode.ActionOptions.Count];
    float nodeUtility;
    ///// CHANCE NODE /////
    if (gameStateNode.ActivePlayer == Player.ChancePublic)
    {
        float actionProbability = (float)1 / gameStateNode.ActionOptions.Count;
        //get utility of each action
        for (int i = 0; i < actionUtilities.Length; i++)
        {
            GameStateNode childGameState = new GameStateNode(gameStateNode, gameStateNode.ActionOptions[i], actionProbability);
            actionUtilities[i] = CalculateNodeUtility(childGameState);
        }
        //average utility for node calculated by dot product of action utilities and action probabilities
        nodeUtility = actionUtilities.Select(u => u * actionProbability).Sum();
        return nodeUtility;
    }
    ///// DECISION NODE /////
    else
    {
        float activePlayerReachProbability;
        if (gameStateNode.ActivePlayer == Player.Player1)
        {
            activePlayerReachProbability = gameStateNode.ReachProbabiltyP1;
        }
        else //ActivePlayer == Player2
        {
            activePlayerReachProbability = gameStateNode.ReachProbabiltyP2;
        }
        InformationSet infoSet = GetInformationSet(gameStateNode);
        var strategy = InformationSetMethods.GetStrategy(infoSet);
        //Only update on updating player
        if (gameStateNode.ActivePlayer == UpdatingPlayer)
        {
            InformationSetMethods.AddToStrategySum(infoSet, strategy, activePlayerReachProbability);
        }
        //get utility of each action
        for (int i = 0; i < actionUtilities.Length; i++)
        {
            var actionProbability = strategy[i];
            GameStateNode childGameState = new GameStateNode(gameStateNode, gameStateNode.ActionOptions[i], actionProbability);
            actionUtilities[i] = CalculateNodeUtility(childGameState);
        }
        //average utility for node calculated by dot product of action utilities and strategy probabilities
        nodeUtility = actionUtilities.Zip(strategy, (x, y) => x * y).Sum();
        //Only update on updating player
        if (gameStateNode.ActivePlayer == UpdatingPlayer)
        {
            InformationSetMethods.AddToCumulativeRegrets(infoSet, gameStateNode, actionUtilities, nodeUtility);
        }
        return nodeUtility;
    }
}
This method propagates through the game tree, passing back the node's utility at each node. The node utility returned is from player 1's perspective; player 2's utility would simply be the pot size minus player 1's utility.
At a terminal node, the payoff for player 1 is calculated and returned.
At a chance node, the utility of each action is calculated, and the node utility is then calculated as the dot product of the action utilities and the action probabilities (the actions here being the chance player's actions, i.e. which cards are dealt).
At a decision node, the node utility is likewise the dot product of the action utilities and the strategy's action probabilities. The strategy used is also added to the information set's strategy sum, and the regrets are calculated and added to the cumulative regrets.
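As a toy numeric example (my numbers, purely illustrative): suppose player 1 is at a decision node with two actions, the current strategy is [0.5, 0.5], the action utilities come back as [10, 0] and player 2's reach probability is 0.8. The node utility is 0.5 × 10 + 0.5 × 0 = 5, so the regrets added to the regret sum are 0.8 × (10 − 5) = +4 for the first action and 0.8 × (0 − 5) = −4 for the second; regret matching will then favour the first action on the next visit to this information set.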
Note: I've made the algorithm use "alternating updates", which means only a single player is updated per iteration. This will actually reduce performance; however, it has the effect of making hands with the same strategic ranking converge to the same strategy (e.g. AsAd will have the same strategy as AcAh), which can simplify things when we implement the strategy.
internal class Program
{
static void Main(string[] args)
{
//Game Inputs
int numIterations = 50;
List<string> board = new List<string>() { "2d", "2s", "2c", "2h", "3d"};
List<string> player1Range = new List<string>() { "AA", "KK", "QQ"};
List<string> player2Range = new List<string>() { "AA", "KK", "QQ"};
int startPotSize = 50;
int effectiveStackSize = 100;
List<float> availableBetSizes = new List<float>() { (float)0.5 };
//set pot size, effective stacks and player actions
PokerRules.SetStart(startPotSize, effectiveStackSize);
PlayerAction.SetPlayerActions(availableBetSizes);
//arrange board and hand combos
List<string> boardArranged = Card.ArrangeCards(board);
List<string[]> P1HandCombos = new List<string[]>();
List<string[]> P2HandCombos = new List<string[]>();
foreach (string hand in player1Range)
{
P1HandCombos.AddRange(Card.GetArrangedHandCombos(hand));
}
foreach (string hand in player2Range)
{
P2HandCombos.AddRange(Card.GetArrangedHandCombos(hand));
}
//create CFRTrainer object
VanillaCFRTrainer trainer = new VanillaCFRTrainer();
//Train Trainer
float avgUtil = trainer.Train(numIterations, boardArranged, P1HandCombos, P2HandCombos);
Console.WriteLine($"Player 1 Utility: {avgUtil}");
InfoSetUI.View(trainer, board.Count);
}
}
Now all that’s left is to run the code.
Given the inefficiencies in the code and the algorithm, we will need to simplify the game we are trying to solve if we want it to run in a reasonable time. We will solve a river scenario where the board runout is 2d, 2s, 2c, 2h, 3d and use hand ranges for both player 1 and player 2 of Aces, Kings and Queens. We will set the pot size to 50, stack sizes to 100 and the available raise sizes to a single option of a half pot raise.
As you can see, the exploitability deviation tends towards zero, showing that the solution is tending towards a Nash equilibrium.
The current strategy used at each iteration shoots around wildly and does not converge; this is because the regret sum is heavily influenced by the regret of the last iteration. However, if we look at the average strategy we can see that it does converge, and this is our Nash equilibrium strategy.
The above image shows the CLI output of part of the solved Nash equilibrium strategy. The action options are F0 (Fold), C0 (Check or Call depending on context) and R0 (Raise or Bet). This is the strategy for the first player to act on the river, whose options are Check or Bet. As expected with alternating updates, all card combinations with equal strategic value have the same strategy. We can see that Aces are nearly always bet and Kings are nearly always checked, while Queens employ a mixed strategy, betting around 28% of the time. This makes intuitive sense: if the player only ever bet their Aces they would become exploitable, so bluffing with Queens on occasion balances their betting range.
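For reference, the strategies shown in this output could also be read programmatically from the trainer after Train has finished; a minimal sketch (the info set key here is hypothetical):

//Look up the average (Nash-converging) strategy for one information set
InformationSet infoSet = trainer.InfoSetMap["As_Ad_R0"]; //hypothetical key
float[] avgStrategy = trainer.InformationSetMethods.GetFinalStrategy(infoSet);
for (int i = 0; i < avgStrategy.Length; i++)
{
    Console.WriteLine($"Action {i}: {avgStrategy[i]:P1}");
}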
When the second player to act faces a check, they employ a very similar strategy: always betting Aces, nearly always checking Kings and playing a mixed strategy with Queens.
The image above shows the strategy for player 2 after player 1 has made a bet. If we think about what we should do when facing a bet in this scenario: if we have Queens we can beat nothing, so we should always fold; if we have Aces we cannot lose, so we should always raise; if we have Kings we can sometimes win if the opponent is bluffing, so we should employ a mixed strategy. This is what we see in our algorithm's results, giving us some validation that the model is producing sensible results.