Monte Carlo refers to a class of algorithms that use randomisation to draw samples that accurately represent the full population. The name comes from the Casino de Monte-Carlo, a legendary icon of gambling.
Vanilla CFR requires a full traversal of the game tree, where every possible choice at every node is explored. If we instead sample parts of the game tree, choosing actions according to their probability of occurring, then over many iterations we build up a representative picture of the full tree. The average strategy should then converge just as with Vanilla CFR.
There are multiple approaches to sampling, each of which samples different parts of the game tree in different ways. In this article I will be using external sampling, which is both effective and simple to implement.
External sampling means that all actions made by the opponent or the chance player are sampled. For example, suppose the algorithm is at an opponent decision node where the opponent's strategy is Raise 30%, Call 60% and Fold 10%. The algorithm randomly chooses one of these actions with probability equal to its strategy probability and then only explores the subtree of that action. The same is done at chance nodes, except that there each card has an equal probability of being chosen.
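To do this we need a way to pick a single action with probability equal to its strategy probability. The RandomActionSelection helper called in the code below isn't shown in this article; a minimal sketch, assuming the strategy is an array of action probabilities summing to 1 and reusing the shared rand instance, might look like this:

private int RandomActionSelection(float[] strategy)
{
    //Draw a uniform sample in [0, 1) and walk the cumulative
    //distribution until the sample falls inside an action's band
    double sample = rand.NextDouble();
    double cumulative = 0;
    for (int i = 0; i < strategy.Length; i++)
    {
        cumulative += strategy[i];
        if (sample < cumulative)
        {
            return i;
        }
    }
    //Guard against floating point rounding leaving the sample unmatched
    return strategy.Length - 1;
}

This is the standard cumulative distribution trick: the uniform sample lands in exactly one band, and the width of each band equals that action's probability.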
Implementing external sampling requires only minor changes to the Vanilla CFR implementation: we just have to randomly pick a single option at public chance nodes and opponent nodes. The two methods that change from last week's Vanilla CFR algorithm are shown below.
public float Train(int numberIterations, List<string> boardArranged, List<string[]> handCombosP1, List<string[]> handCombosP2)
{
    InformationSetMethods = new InformationSetCFRLogic();
    BestResponseUtility = new BestResponseUtility(InfoSetMap, InformationSetMethods);
    float P1Util = 0;
    int utilP1Count = 0;
    List<string> startHistory = new List<string>(boardArranged);
    for (int i = 0; i < numberIterations; i++)
    {
        Iteration = i;
        UpdatingPlayer = (Player)(i % 2);
    SampleCards:
        //Randomly choose player 1 and player 2 hands
        int indexP1 = rand.Next(0, handCombosP1.Count);
        int indexP2 = rand.Next(0, handCombosP2.Count);
        //Resample hands that conflict with the board or each other
        if (IsCardConflict(boardArranged, handCombosP1[indexP1], handCombosP2[indexP2]))
        {
            goto SampleCards;
        }
        //Initialise startNode
        GameStateNode startNode = GameStateNode.GetStartingNode(startHistory, handCombosP1[indexP1].ToList(), handCombosP2[indexP2].ToList());
        //Begin the CFR recursion
        P1Util += CalculateNodeUtilityMC(startNode);
        utilP1Count++;
        //Output exploitability every 100 iterations
        if (i % 100 == 0)
        {
            Console.WriteLine($"Iteration MC {i} complete.");
            Console.WriteLine($"Strategy Exploitability Percentage MC: {BestResponseUtility.TotalDeviation(boardArranged, handCombosP1, handCombosP2)}");
        }
    }
    //Return the average player 1 utility
    return P1Util / utilP1Count;
}
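The IsCardConflict helper used above isn't shown in this article either. A minimal sketch, assuming cards are represented as strings and the usual System.Linq import is available, could be:

private bool IsCardConflict(List<string> board, string[] handP1, string[] handP2)
{
    //Track every card seen so far; HashSet.Add returns false on a duplicate
    var seen = new HashSet<string>(board);
    foreach (var card in handP1.Concat(handP2))
    {
        if (!seen.Add(card))
        {
            return true;
        }
    }
    return false;
}

The sampling itself happens in the recursive utility method: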
public float CalculateNodeUtilityMC(GameStateNode gameStateNode)
{
    ///// TERMINAL NODE /////
    if (gameStateNode.ActivePlayer == Player.GameEnd)
    {
        var u = PokerRules.CalculatePayoff(gameStateNode.History, gameStateNode.Player1Cards, gameStateNode.Player2Cards);
        return u;
    }
    float[] actionUtilities = new float[gameStateNode.ActionOptions.Count];
    float nodeUtility;
    ///// CHANCE NODE /////
    if (gameStateNode.ActivePlayer == Player.ChancePublic)
    {
        //Randomly choose a card, each card being equally likely
        int cardChoice = rand.Next(gameStateNode.ActionOptions.Count);
        float actionProbability = (float)1 / gameStateNode.ActionOptions.Count;
        GameStateNode childGameState = new GameStateNode(gameStateNode, gameStateNode.ActionOptions[cardChoice], actionProbability);
        nodeUtility = CalculateNodeUtilityMC(childGameState);
        return nodeUtility;
    }
    InformationSet infoSet = GetInformationSet(gameStateNode);
    var strategy = InformationSetMethods.GetStrategy(infoSet);
    ///// UPDATING PLAYER DECISION NODE /////
    if (gameStateNode.ActivePlayer == UpdatingPlayer)
    {
        float activePlayerReachProbability = gameStateNode.ReachProbabiltyP1;
        if (gameStateNode.ActivePlayer == Player.Player2)
        {
            activePlayerReachProbability = gameStateNode.ReachProbabiltyP2;
        }
        InformationSetMethods.AddToStrategySum(infoSet, strategy, activePlayerReachProbability);
        //Get the utility of each action by exploring every child node
        for (int i = 0; i < actionUtilities.Length; i++)
        {
            var actionProbability = strategy[i];
            GameStateNode childGameState = new GameStateNode(gameStateNode, gameStateNode.ActionOptions[i], actionProbability);
            actionUtilities[i] = CalculateNodeUtilityMC(childGameState);
        }
        //Average node utility is the dot product of action utilities and action probabilities
        nodeUtility = actionUtilities.Zip(strategy, (x, y) => x * y).Sum();
        InformationSetMethods.AddToCumulativeRegrets(infoSet, gameStateNode, actionUtilities, nodeUtility);
    }
    ///// OPPONENT DECISION NODE /////
    else
    {
        //Sample a single action according to the strategy probabilities
        int strategyChoice = RandomActionSelection(strategy);
        GameStateNode childGameState = new GameStateNode(gameStateNode, gameStateNode.ActionOptions[strategyChoice], strategy[strategyChoice]);
        nodeUtility = CalculateNodeUtilityMC(childGameState);
    }
    return nodeUtility;
}
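Putting it together, a call to Train could look like the sketch below; the trainer class name, iteration count and card strings are all illustrative rather than taken from the original code.

//Hypothetical usage: train on a fixed flop with pre-enumerated hand combos
var boardArranged = new List<string> { "As", "Kd", "7h" };
var handCombosP1 = new List<string[]> { new[] { "Qs", "Qc" }, new[] { "Jh", "Th" } };
var handCombosP2 = new List<string[]> { new[] { "9s", "9c" }, new[] { "Ah", "Kh" } };

var trainer = new MonteCarloCFRTrainer(); //illustrative class name
float averageP1Utility = trainer.Train(10000, boardArranged, handCombosP1, handCombosP2);
Console.WriteLine($"Average player 1 utility: {averageP1Utility}");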
Comparing these algorithms poses a small problem: Monte Carlo CFR requires far more iterations, but each iteration takes far less time. In the following charts we therefore use Nodes Touched on the x-axis, which is approximately proportional to computation time and allows a fair comparison of the two algorithms.
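Tracking Nodes Touched isn't shown in the code above; one hypothetical approach is a counter incremented once per node visit in both algorithms:

//Hypothetical Nodes Touched counter (not part of the original code)
private long nodesTouched;

//Called once at the top of every recursive utility call in both the
//Vanilla and Monte Carlo implementations
private void CountNodeTouched()
{
    nodesTouched++;
}

Calling CountNodeTouched() as the first statement of CalculateNodeUtilityMC, and of the Vanilla CFR equivalent, makes the two runs count visits identically.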
As can be seen on the following chart, sampling allows the Monte Carlo CFR algorithm to converge much more quickly than the Vanilla CFR algorithm.