
Vanilla CFR - How to Make a Poker AI Part 2

August 8, 2024

Vanilla CFR (Counterfactual Regret Minimisation) is the simplest form of CFR: it is the bare-bones algorithm that we will build upon in future articles. In this article we will work through the theory needed for this algorithm to give us a Nash equilibrium strategy.

Counterfactual…

Counterfactual regret minimisation uses regret matching as described in part 0; however, it also weights the regrets by a counterfactual probability. When we use the term counterfactual, we mean the probability of reaching an information set contributed by all the opposing players (and chance), but not by the active player.
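To make this precise, if σ is the strategy profile in play, the counterfactual reach probability of an information set I for player i is usually written (following the usual CFR notation) as

$$\pi^{\sigma}_{-i}(I) = \sum_{h \in I} \pi^{\sigma}_{-i}(h)$$

where π^σ_{-i}(h) is the product of the action probabilities of chance and of every player except i along the history h. It is "counterfactual" because player i's own action probabilities are left out, as if they had played to reach I.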

…Regret Minimisation

For our algorithm to be regret minimising, we need the average regret to approach zero as the number of iterations approaches infinity. This gives us an objective for the algorithm:

Get the overall average regret to approach zero
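For reference, the overall average regret of player i after T iterations is the usual external regret quantity,

$$R^T_i = \frac{1}{T} \max_{\sigma^*_i \in \Sigma_i} \sum_{t=1}^{T} \left( u_i(\sigma^*_i, \sigma^t_{-i}) - u_i(\sigma^t) \right),$$

i.e. how much better player i could have done on average by playing the single best fixed strategy against the opponents' actual strategies.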

Regrets

Full counterfactual regret is the maximum regret a player has at an information set I when considering all possible alternate strategies - this includes every possible action decision at every information set that is reachable from I (including I itself).

Full counterfactual regret
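In the usual CFR notation, this can be written for player i after T iterations as

$$R^{T}_{i,\text{full}}(I) = \frac{1}{T} \max_{\sigma' \in \Sigma_i} \sum_{t=1}^{T} \pi^{\sigma^t}_{-i}(I) \left( u_i(\sigma^t|_{D(I) \to \sigma'}, I) - u_i(\sigma^t, I) \right)$$

where u_i(σ, I) is the counterfactual utility of information set I under strategy profile σ.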

D(I) is the set of all information sets reachable from I, so σ|D(I)→σ' is the strategy σ with the alternate strategy σ' substituted in at all of those reachable information sets. In words, this equation is the maximum regret at an information set across alternate strategies, weighted by the counterfactual probability and averaged across the algorithm's iterations.

As you can imagine, it would be very tedious to calculate the utility of every possible alternate strategy when those strategies have to cover the actions at all the reachable information sets. Luckily, it has been proven that the full counterfactual regret is at most the sum of the positive immediate counterfactual regrets across all reachable child information sets. The proof is beyond the scope of this article, but can be read in source [1].

Theorem linking Full and Immediate counterfactual regret
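One way to state the bound from [1] is

$$R^{T}_{i,\text{full}}(I) \le \sum_{I' \in D(I)} R^{T,+}_{i,\text{imm}}(I')$$

where the + superscript means we only keep the positive part, i.e. R^{T,+} = max(R^T, 0).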

Immediate counterfactual regret is the same as full counterfactual regret, except that the alternate strategies considered only change the action taken at the current information set - everywhere else the strategy stays the same. As you can imagine, this makes the alternate strategy utilities much easier to calculate, since only a single action is altered.

Immediate counterfactual regret
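Written out, only the action a taken at I itself is swapped:

$$R^{T}_{i,\text{imm}}(I) = \frac{1}{T} \max_{a \in A(I)} \sum_{t=1}^{T} \pi^{\sigma^t}_{-i}(I) \left( u_i(\sigma^t|_{I \to a}, I) - u_i(\sigma^t, I) \right)$$

where A(I) is the set of actions available at I and σ^t|_{I→a} is σ^t modified to always play a at I.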

Our original goal was to drive the overall average regret towards zero; with this new information, the algorithm's objective can now be considered to be:

Get the average positive immediate counterfactual regret of each information set to approach zero.

Strategies

This objective is achieved by choosing our strategies so that they favour actions with larger positive regrets (regret matching), thus attempting to minimise those positive regrets. The next iteration's strategy is therefore chosen by normalising the positive counterfactual regrets at the information set.

Next iteration strategy
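Writing R^T_i(I, a) for the (counterfactually weighted) regret of not having played action a at I, the regret-matching update is

$$\sigma^{T+1}_i(I)(a) = \begin{cases} \dfrac{R^{T,+}_i(I,a)}{\sum_{a' \in A(I)} R^{T,+}_i(I,a')} & \text{if } \sum_{a' \in A(I)} R^{T,+}_i(I,a') > 0 \\[1.5ex] \dfrac{1}{|A(I)|} & \text{otherwise.} \end{cases}$$

If no action has positive regret, we simply fall back to the uniform random strategy.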

Our algorithm is now successfully regret minimising! And as the average regret approaches zero, the average strategy approaches a Nash equilibrium. The average strategy is the average of the strategies across iterations, weighted by the player's reach probability of the information set; this weighting ensures that strategies which are more likely to actually be played are weighted more heavily.

Average strategy
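With π^σ_i(I) denoting player i's own contribution to the probability of reaching I, the average strategy is

$$\bar{\sigma}^T_i(I)(a) = \frac{\sum_{t=1}^{T} \pi^{\sigma^t}_i(I)\, \sigma^t_i(I)(a)}{\sum_{t=1}^{T} \pi^{\sigma^t}_i(I)}$$

To make the two strategy formulas concrete, here is a minimal Python sketch of how an information set could track its regrets and its average strategy. The names (InfoSet, regret_sum, strategy_sum) are purely illustrative and not the code we will write in the next article.

```python
# Minimal sketch of regret matching and average-strategy accumulation
# for a single information set. All names are illustrative.

class InfoSet:
    def __init__(self, num_actions: int):
        self.num_actions = num_actions
        self.regret_sum = [0.0] * num_actions    # cumulative counterfactual regrets R(I, a)
        self.strategy_sum = [0.0] * num_actions  # reach-weighted sum of strategies

    def current_strategy(self) -> list[float]:
        """Regret matching: normalise the positive regrets,
        or play uniformly if no action has positive regret."""
        positive = [max(r, 0.0) for r in self.regret_sum]
        total = sum(positive)
        if total > 0:
            return [p / total for p in positive]
        return [1.0 / self.num_actions] * self.num_actions

    def accumulate(self, strategy: list[float], my_reach: float) -> None:
        """Add this iteration's strategy, weighted by the player's own
        reach probability, to the running average-strategy numerator."""
        for a in range(self.num_actions):
            self.strategy_sum[a] += my_reach * strategy[a]

    def average_strategy(self) -> list[float]:
        """Normalise the accumulated strategy weights; because each strategy
        sums to one, this matches the reach-weighted average above."""
        total = sum(self.strategy_sum)
        if total > 0:
            return [s / total for s in self.strategy_sum]
        return [1.0 / self.num_actions] * self.num_actions
```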

We now have all the theory needed to build our first CFR algorithm that will give us a Nash equilibrium strategy - which is exactly what we will do in the next article!
