Tournament Poker Bot — Research Notes

Companion Note II · Mid-Stakes Series

Independent Chip Model Considerations in Automated Decision Policy

The Independent Chip Model (ICM), the canonical mapping from tournament chip stacks to expected monetary equity, modifies the objective function that an agent should be optimising at every decision point after the money bubble. The present note examines how this modification interacts with the decision policies observed in the mid-stakes agent cohort, and where the interaction breaks down.

Background

ICM expresses the intuition that the marginal value of a chip is not constant. A chip won is worth less, in expected monetary terms, than a chip lost — a consequence of the bounded, top-heavy payout schedule that tournaments impose. The size of this asymmetry is a function of stack distribution, payout structure, and remaining field. Cash-game decision frameworks, which implicitly assume linear chip-value, do not encode this asymmetry.
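The asymmetry described above can be illustrated with a minimal sketch of the Malmuth-Harville calculation. The function name `icm_equity`, the stacks, and the payout schedule are assumptions chosen for illustration and are not taken from the observed cohort or the reference solver; the recursion enumerates finishing orders directly, which is only practical for small fields.

```python
def icm_equity(stacks, payouts):
    """Malmuth-Harville ICM: expected payout per player for the given stacks."""
    equity = [0.0] * len(stacks)

    def assign(remaining, prob, place):
        # Stop once every paid place has been handed out or nobody is left.
        if place >= len(payouts) or not remaining:
            return
        total = sum(stacks[i] for i in remaining)
        for i in remaining:
            # Chance that player i takes this place, given who finished above.
            p = prob * stacks[i] / total
            equity[i] += p * payouts[place]
            assign(remaining - {i}, p, place + 1)

    # Players with no chips are already out and receive nothing here.
    alive = frozenset(i for i, s in enumerate(stacks) if s > 0)
    assign(alive, 1.0, 0)
    return equity


# Illustrative numbers only: three players left, payouts as % of the prize pool.
payouts = [50, 30, 20]
base = icm_equity([5000, 3000, 2000], payouts)[1]
after_win = icm_equity([4000, 4000, 2000], payouts)[1]   # player 1 wins 1000 from player 0
after_loss = icm_equity([6000, 2000, 2000], payouts)[1]  # player 1 loses 1000 to player 0

gain = after_win - base
loss = base - after_loss
print(f"equity gained by winning 1000 chips: {gain:.2f}")
print(f"equity lost by losing 1000 chips:    {loss:.2f}")
```

In this hypothetical spot the equity gained from winning 1000 chips is smaller than the equity lost from losing the same 1000 chips, which is the non-linearity that a cash-game objective does not capture.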

For an automated agent whose decision policy was derived from cash-equivalent objectives — by far the dominant case in the observed cohort — the absence of ICM compensation produces a specific, predictable distortion: aggression is over-weighted in spots where ICM pressure should compress calling and re-shoving ranges.

Methodology

For each agent-operated hand played after the money bubble, the chosen action was compared against the action recommended by a reference solver carrying ICM-aware payoff substitutions. The reference solver used the actual remaining stack distribution at the time of the hand and the published payout schedule for that event. Deviation was measured as the difference between the agent's observed action frequency and the solver's recommended frequency, in percentage points, aggregated by spot category.
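One way to read this aggregation is sketched below. The record layout (spot category, whether the agent took the action, the solver's recommended frequency for that hand), the function name `deviation_by_category`, and the sample rows are all hypothetical; the note does not specify the actual data format or whether any weighting was applied.

```python
from collections import defaultdict


def deviation_by_category(hands):
    """Return {category: (agent_freq, ref_freq, deviation_pp)}.

    `hands` is an iterable of (category, agent_acted, ref_freq) tuples, where
    agent_acted is a bool and ref_freq is the solver frequency in [0, 1].
    """
    # Per category: [times the agent acted, sum of reference frequencies, hands seen]
    totals = defaultdict(lambda: [0, 0.0, 0])
    for category, agent_acted, ref_freq in hands:
        acc = totals[category]
        acc[0] += int(agent_acted)
        acc[1] += ref_freq
        acc[2] += 1

    result = {}
    for category, (acted, ref_sum, n) in totals.items():
        agent_freq = acted / n
        ref_mean = ref_sum / n
        # Deviation in percentage points: positive means the agent over-acts.
        result[category] = (agent_freq, ref_mean, (agent_freq - ref_mean) * 100)
    return result


# Made-up records for illustration; not drawn from the cohort data.
sample = [
    ("mid_stack_call_bubble", True, 0.20),
    ("mid_stack_call_bubble", False, 0.20),
    ("mid_stack_call_bubble", True, 0.20),
    ("mid_stack_call_bubble", False, 0.20),
    ("short_stack_shove", True, 0.25),
    ("short_stack_shove", False, 0.25),
    ("short_stack_shove", False, 0.25),
    ("short_stack_shove", False, 0.25),
]
report = deviation_by_category(sample)
for name, (agent, ref, dev) in report.items():
    print(f"{name}: agent {agent:.0%}, reference {ref:.0%}, deviation {dev:+.0f}pp")
```

Averaging the per-hand reference frequencies, rather than comparing each hand to a single recommended action, is a design choice of this sketch: it keeps mixed-strategy recommendations from being scored as errors on individual hands.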

Findings

Calling-range over-extension

In medium-stack-versus-medium-stack confrontations near the bubble, the agent cohort called all-in shoves at frequencies materially above the ICM-aware reference: 34% against a reference of 19% when calling a covering shove. The over-extension was large enough to convert several otherwise-positive late-stage spots into expected losses once the chip-to-equity transform was applied.
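The effect of the chip-to-equity transform on a calling decision can be sketched as a break-even calculation: the win probability at which calling and folding have equal value, computed once in chips and once in ICM equity. The spot below (four players left, three paid, no blinds or antes in the pot) and all names are assumptions for illustration; the `icm_equity` helper is a plain Malmuth-Harville enumeration, not the reference solver.

```python
def icm_equity(stacks, payouts):
    """Malmuth-Harville ICM: expected payout per player for the given stacks."""
    equity = [0.0] * len(stacks)

    def assign(remaining, prob, place):
        if place >= len(payouts) or not remaining:
            return
        total = sum(stacks[i] for i in remaining)
        for i in remaining:
            p = prob * stacks[i] / total
            equity[i] += p * payouts[place]
            assign(remaining - {i}, p, place + 1)

    # Busted players (zero chips) are excluded and keep an equity of 0.
    alive = frozenset(i for i, s in enumerate(stacks) if s > 0)
    assign(alive, 1.0, 0)
    return equity


def breakeven_call_equity(v_fold, v_win, v_lose):
    """Win probability p at which p*v_win + (1-p)*v_lose equals v_fold."""
    return (v_fold - v_lose) / (v_win - v_lose)


# Hypothetical bubble spot: player 0 shoves and covers the hero (player 1).
payouts = [50, 30, 20]          # % of prize pool; fourth place is unpaid
HERO = 1
if_fold = [4000, 3000, 2000, 1000]
if_win = [1000, 6000, 2000, 1000]   # hero doubles through the shover
if_lose = [7000, 0, 2000, 1000]     # hero busts on the bubble

# Cash-style objective: value is the hero's chip count.
chip_threshold = breakeven_call_equity(if_fold[HERO], if_win[HERO], if_lose[HERO])

# ICM-aware objective: value is the hero's share of the prize pool.
icm_threshold = breakeven_call_equity(
    *(icm_equity(stacks, payouts)[HERO] for stacks in (if_fold, if_win, if_lose))
)

print(f"break-even win probability, chip EV: {chip_threshold:.3f}")
print(f"break-even win probability, ICM:     {icm_threshold:.3f}")
```

Under these assumptions the chip-EV threshold is 0.5, while the ICM threshold is higher, so a policy calibrated to the chip objective would call with hands that lose money once equity is measured in payouts.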

Short-stack re-shoving

Conversely, short-stack re-shoving frequencies were close to the ICM-aware reference. This is consistent with short-stack play being dominated by Nash-equilibrium push-fold tables that are not strongly ICM-sensitive at the shortest depths and are widely tabulated in published material that pre-training corpora are likely to include.

Spot category                     | Agent freq. | ICM-aware ref. | Deviation
Mid-stack call vs cover, bubble   | 34%         | 19%            | +15pp
Mid-stack 3-bet shove, bubble     | 11%         | 6%             | +5pp
Short-stack open shove (≤10bb)    | 27%         | 26%            | +1pp
Big-stack iso vs short, ITM       | 22%         | 31%            | −9pp

Big-stack underutilisation

Where ICM pressure favours the chip leader applying pressure to middling stacks, the agent cohort under-applied that pressure. Isolation-raise frequencies against short and medium stacks were below the reference. The leverage afforded by chip-leader status, in other words, was not being converted into incremental equity.

Discussion

The pattern is consistent with a policy trained or specified for an environment in which chip-value is linear. In such an environment, the observed agent frequencies would be approximately correct. In the tournament environment, they are systematically miscalibrated in the direction that an ICM-naive policy would predict: too willing to commit stack near the bubble, insufficiently exploitative when ahead in chips.

It is worth noting that adding ICM-aware payoff substitution to an existing decision pipeline is not, on its face, computationally expensive. The persistence of the gap across the observation window suggests that the bottleneck is engineering attention rather than computational tractability.

Observation: The most expensive single category of error in the cohort, measured in monetary equity surrendered, is the bubble-stage mid-stack call against a covering shove. This single spot accounts for an estimated 28% of the late-stage ROI gap.

Related

The companion note on late-stage tournament dynamics treats the broader class of structural events of which the bubble is one instance.

Per-spot deviation tables and the reference-solver configuration are available on request.

Notes

  1. Reference-solver substitution follows the standard Malmuth–Harville formulation; alternative formulations (Roberts, Landrum–Burns) yield qualitatively similar deviation signs across the categories reported.
  2. Frequencies are aggregated across platforms and stake levels within the $5–$100 buy-in band; per-platform breakdowns preserve the directional pattern.