Tournament Poker Bot — Research Notes

Companion Note III · Mid-Stakes Series

Failure Modes in Late-Stage Tournament Play

Late-stage tournament dynamics expose a class of decision errors that are absent or muted in cash-game play. The principal failure modes identified in the present data are: (i) bubble-stage risk aversion, (ii) satellite-payout misclassification, and (iii) post-flop tree mis-pruning under deep stack-to-pot ratios. This note describes each in turn and reports the observed magnitude of equity surrendered in the mid-stakes agent cohort.

Bubble-stage risk aversion — but in the wrong direction

The bubble, the moment immediately preceding the first paid finishing position, is the canonical late-stage structural event. The case for weighting survival equity at the bubble dates from the earliest tournament-theory literature and is supported empirically by human cash-out distributions. A policy aware of the Independent Chip Model (ICM) compresses calling ranges and selectively expands re-shoving ranges against opponents whose stack sizes make them risk-averse.
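The compression of calling ranges can be illustrated with a minimal sketch of the Malmuth-Harville form of ICM. The stacks, payouts, and function names below are illustrative assumptions, not values or code from the cohort data. The sketch ignores blinds and dead money and assumes the caller is covered, so a lost call means elimination.

```python
def icm_equity(stacks, payouts):
    """Malmuth-Harville ICM: expected prize for each stack.

    The chance that a player takes the next place is modelled as that
    player's share of the chips still in contention.
    """
    equities = [0.0] * len(stacks)

    def assign(remaining, place, prob):
        total = sum(stacks[i] for i in remaining)
        if place >= len(payouts) or total <= 0:
            return
        for i in remaining:
            p = prob * stacks[i] / total
            if p == 0.0:
                continue  # an eliminated player cannot take a paid place
            equities[i] += p * payouts[place]
            assign([j for j in remaining if j != i], place + 1, p)

    assign(list(range(len(stacks))), 0, 1.0)
    return equities


def call_breakeven(stacks, payouts, hero, villain):
    """Win probability at which calling an all-in matches folding in prize equity.

    Simplifications (assumptions of this sketch): no blinds or dead money,
    and the villain covers the hero.
    """
    risk = stacks[hero]
    if stacks[villain] < risk:
        raise ValueError("sketch assumes the villain covers the hero")
    win = list(stacks)
    win[hero] += risk
    win[villain] -= risk
    lose = list(stacks)
    lose[hero] = 0
    lose[villain] += risk
    e_fold = icm_equity(stacks, payouts)[hero]
    e_win = icm_equity(win, payouts)[hero]
    e_lose = icm_equity(lose, payouts)[hero]
    return (e_fold - e_lose) / (e_win - e_lose)


# Hypothetical bubble spot: four players left, three places paid.
stacks = [3000, 5000, 1000, 1000]
payouts = [50, 30, 20]
threshold = call_breakeven(stacks, payouts, hero=0, villain=1)
print(f"ICM break-even: {threshold:.3f} (chip-EV break-even without dead money: 0.500)")
```

In chip terms the call breaks even at 50% because the amount risked equals the amount won. Under ICM the threshold in this spot is well above that, which is the sense in which an ICM-aware calling range is compressed.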

In the cohort studied, the agent population did not compress calling ranges. The opposite was observed: calling frequencies in covered confrontations sat above the ICM-aware reference. The colloquial framing, "bots play scared on the bubble", is inverted in this sample: bot policy is too willing to call, not too unwilling. The risk aversion that human field-pros acquire through experience is absent.

Satellite-payout misclassification

Satellite events introduce a discontinuous payout schedule: a fixed number of seats is awarded, and every finishing position at or above the seat-allocation threshold receives identical equity. Once survival to the threshold is approximately assured, the marginal value of an extra chip is approximately zero. A correctly calibrated policy therefore plays a fold-everything strategy in the immediate run-up to the threshold whenever its stack is large enough to reach the threshold without further confrontations.
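The zero-marginal-value argument can be written out as a short derivation. The symbols are introduced here for illustration and are not taken from the source: V is the value of a seat, c is the player's stack, and q_i(c) is the probability that player i finishes at or above the seat threshold with stack c. The second step assumes an all-in call by a covered player, so that losing means elimination.

```latex
% Equity under a flat satellite schedule, and its sensitivity to chips.
\begin{align*}
E_i(c) &= V \, q_i(c), \\
\frac{\partial E_i}{\partial c} &= V \, \frac{\partial q_i}{\partial c}
  \;\longrightarrow\; 0 \quad \text{as } q_i(c) \to 1 .
\end{align*}
% Calling all-in with win probability p; c^{+} is the stack after winning.
\[
p \, V \, q_i(c^{+}) > V \, q_i(c)
  \;\iff\; p > \frac{q_i(c)}{q_i(c^{+})} \;\ge\; q_i(c),
  \qquad \text{since } q_i(c^{+}) \le 1 .
\]
```

As q_i(c) approaches 1, the win probability required to justify a call also approaches 1, which is the fold-everything condition.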

The observed agent cohort, on satellite events flagged for this analysis, did not adopt fold-everything behaviour near the threshold. Pre-flop voluntary-put-in-pot percentages remained within a few points of mid-stage values, and post-flop continuation patterns remained essentially structure-blind. The policy treated the satellite payoff as if it were strictly increasing in finishing position, which it is not.

Mechanism. The misclassification is consistent with a payout-schedule abstraction layer that maps tournament prize structures onto a single strictly increasing function. Satellite schedules, which are flat above the threshold and zero below, do not fit that abstraction and are silently coerced to the closest approximation that does.
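One way to avoid silent coercion of this kind is to classify a schedule explicitly before it reaches the equity model. The sketch below is a hypothetical guard, not the cohort's actual abstraction layer. The function name and the three labels are assumptions made for illustration, and payouts are assumed to be listed from first place downward.

```python
def classify_schedule(payouts):
    """Label a payout schedule by shape instead of coercing it.

    payouts: prize per finishing position, first place first.
    Returns "graded" when every paid place beats the one below it,
    "satellite" when all paid places are equal, and "mixed" otherwise.
    """
    paid = [p for p in payouts if p > 0]
    if not paid:
        raise ValueError("schedule has no paid positions")
    if len(paid) > 1 and len(set(paid)) == 1:
        return "satellite"
    if all(a > b for a, b in zip(paid, paid[1:])):
        return "graded"
    return "mixed"


print(classify_schedule([50, 30, 20, 0]))        # graded
print(classify_schedule([100, 100, 100, 0, 0]))  # satellite
print(classify_schedule([100, 100, 40, 0]))      # mixed
```

A policy layer could then route "satellite" schedules to a seat-probability model and treat "mixed" schedules as a separate case, in line with their exclusion from the present analysis.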

Final-table ICM and pruning depth

At the final table, ICM curvature is at its most severe and stack distributions are at their most heterogeneous. Decisions taken at six- or seven-handed final tables with disparate stacks impose calculation demands that, in principle, require either a deeper solver pass or a richer set of memoised ICM tables than mid-stage spots.

The empirical observation is that the agent cohort exhibits a discontinuity in post-flop tree depth near the final table — pruning becomes more aggressive, evident from increased fold frequencies at decision nodes where a longer line was warranted. The behavioural signature is a population shift toward earlier decisions in the hand and a reduced presence of three-street value-bet sequences.
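The depth signal described above can be sketched as a simple tally over action logs. The log format is hypothetical: each hand is a list of (street, action) pairs for the agent's own post-flop actions. Because value bets cannot be identified from action labels alone, the sketch uses a bet or raise on all three streets as a rough proxy for a three-street value-bet sequence.

```python
from collections import Counter

STREETS = ("flop", "turn", "river")


def postflop_depth(actions):
    """Count the post-flop streets the agent reached, stopping at a fold."""
    depth = 0
    for street in STREETS:
        on_street = [a for s, a in actions if s == street]
        if not on_street:
            break
        depth += 1
        if "fold" in on_street:
            break
    return depth


def is_three_street_line(actions):
    """True if the agent bet or raised on the flop, the turn, and the river."""
    return all(
        any(s == street and a in ("bet", "raise") for s, a in actions)
        for street in STREETS
    )


def depth_profile(hands):
    """Share of hands ending at each post-flop depth (0 to 3 streets)."""
    if not hands:
        raise ValueError("no hands supplied")
    counts = Counter(postflop_depth(h) for h in hands)
    return {d: counts.get(d, 0) / len(hands) for d in range(4)}


# Hypothetical hands, for illustration only.
hands = [
    [("flop", "bet"), ("turn", "bet"), ("river", "bet")],
    [("flop", "check"), ("flop", "fold")],
    [("flop", "call"), ("turn", "fold")],
    [("flop", "bet"), ("turn", "check"), ("river", "call")],
]
profile = depth_profile(hands)
three_street_share = sum(is_three_street_line(h) for h in hands) / len(hands)
print(profile, three_street_share)
```

Comparing such profiles between mid-stage and final-table samples would show the shift toward shallower lines as a larger share of hands at depths 1 and 2 and a smaller three-street share.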

[Figure: payout-equity curve plotted against finishing position, with the bubble, pay-jump, and final table marked.]
Figure 1. Stylised payout-equity curve. Discontinuities at structural events disclose policy mis-calibration most sharply.

Aggregate effect on cohort ROI

When the late-stage ROI gap reported in the primary note is decomposed, the three failure modes described above together account for approximately three quarters of the observed cohort underperformance from the bubble onward. Bubble-stage call-extension is the largest single contributor, followed by final-table pruning depth; satellite misclassification accounts for the remainder of that share.

Implications

The failure modes are not cryptic; each follows from a specific structural feature of the tournament environment that a cash-derived policy does not encode. The persistence of these gaps across the four-year observation window is the more interesting finding. Closing them is a matter of policy engineering rather than algorithmic novelty.

The per-event decomposition and the cohort-level supplementary log are available on request.

Notes

  1. Satellite analysis is restricted to events in which the seat-allocation structure was flat above threshold; mixed satellite-cash schedules were excluded to avoid contaminating the comparison.
  2. Pruning-depth measurements rely on post-flop action-sequence length distributions; sequence truncation at fold nodes is the observable signal.
  3. Companion treatment of the ICM mechanism is given in the note "ICM and bot decision policy".