Regret minimization in games with incomplete information
We introduce the notion of counterfactual regret, which exploits the degree of incomplete information in an extensive game. We show how minimizing counterfactual regret minimizes overall regret, and therefore in self-play can be used to compute a Nash equilibrium.
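To make the terms concrete, the quantities involved can be sketched as follows (notation follows common presentations of CFR; this is a paraphrase, not the paper's verbatim statement). The counterfactual value of player $i$ at information set $I$ under strategy profile $\sigma$ weights each terminal history $z \in Z_I$ reachable through $I$ by the probability that the other players (and chance) reach it:

$$v_i(\sigma, I) = \sum_{z \in Z_I} \pi^{\sigma}_{-i}(z[I]) \, \pi^{\sigma}(z[I], z) \, u_i(z),$$

where $z[I]$ is the prefix of $z$ passing through $I$ and $\pi^{\sigma}$ denotes reach probabilities. The immediate counterfactual regret at $I$ after $T$ iterations compares always deviating to a single action $a$ at $I$ against the strategies actually played:

$$R^{T}_{i,\mathrm{imm}}(I) = \frac{1}{T} \max_{a \in A(I)} \sum_{t=1}^{T} \left( v_i(\sigma^t|_{I \to a}, I) - v_i(\sigma^t, I) \right).$$

Minimizing this quantity independently at every information set (e.g., by regret matching) is what drives the overall regret bound cited later in these notes.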
Quite a bit is known about minimizing different kinds of regret in experts problems, and how these regret types relate to types of equilibria in the multiagent setting of repeated matrix games.
Much less is known about the possible kinds of regret in online convex programming problems (OCPs), or about equilibria in the analogous multiagent setting of repeated convex games. This gap is unfortunate, since convex games are much more expressive than matrix games, and since many important machine learning problems can be expressed as OCPs. In this paper, we work to close this gap: we analyze a spectrum of regret types which lie between external and swap regret, along with their corresponding equilibria, which lie between coarse correlated and correlated equilibrium.
We also analyze algorithms for minimizing these regret types. As examples of our framework, we derive algorithms for learning correlated equilibria in polyhedral convex games and extensive-form correlated equilibria in extensive-form games. The former is exponentially more efficient than previous algorithms, and the latter is the first of its type.
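For reference, the two endpoints of that spectrum are standard and can be stated for repeated matrix games roughly as follows (a sketch of the usual definitions, not this paper's notation). With plays $a^1, \dots, a^T$ and per-round utilities $u^t$:

$$R^{T}_{\mathrm{ext}} = \max_{a \in A} \sum_{t=1}^{T} \left( u^t(a) - u^t(a^t) \right), \qquad R^{T}_{\mathrm{swap}} = \max_{\phi : A \to A} \sum_{t=1}^{T} \left( u^t(\phi(a^t)) - u^t(a^t) \right).$$

External regret compares against the best fixed action in hindsight; swap regret compares against the best remapping of the actions actually played. When every player's external regret vanishes, the empirical play distribution approaches a coarse correlated equilibrium; when swap regret vanishes, it approaches a correlated equilibrium, which is the correspondence the abstract refers to.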
Regret matching is a widely-used algorithm for learning how to act.
We begin by proving that regrets on actions in one setting (game) can be transferred to warm start the regrets for solving a different setting with the same structure but different payoffs that can be written as a function of parameters. We prove how this can be done by carefully discounting the prior regrets. This provides, to our knowledge, the first principled warm-starting method for no-regret learning.
It also extends to warm-starting the widely-adopted counterfactual regret minimization (CFR) algorithm for large incomplete-information games; we show this experimentally as well. We then study optimizing a parameter vector for a player in a two-player zero-sum game.
We propose a custom gradient descent algorithm that provably finds a locally optimal parameter vector while leveraging our warm-start theory to significantly save regret-matching iterations at each step.
It optimizes the parameter vector while simultaneously finding an equilibrium. This amounts to the first action abstraction algorithm (an algorithm for selecting a small number of discrete actions to use from a continuum of actions, a key preprocessing step for solving large games using current equilibrium-finding algorithms) with convergence guarantees for extensive-form games.
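As a rough illustration of the two ingredients named above, regret matching and warm starting by discounting previously accumulated regrets, here is a minimal sketch. The function names and the fixed `discount` constant are illustrative assumptions; the paper derives a principled discount rather than using a hand-picked one.

```python
import numpy as np

def regret_matching_policy(cum_regret: np.ndarray) -> np.ndarray:
    """Play each action in proportion to its positive cumulative regret."""
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    # No positive regret yet: fall back to the uniform policy.
    return np.full(cum_regret.shape, 1.0 / cum_regret.size)

def regret_matching_update(cum_regret: np.ndarray, action_payoffs: np.ndarray) -> np.ndarray:
    """One regret-matching step given this round's payoff for every action."""
    policy = regret_matching_policy(cum_regret)
    expected_payoff = policy @ action_payoffs
    # Instantaneous regret: how much better each action was than what we played.
    return cum_regret + (action_payoffs - expected_payoff)

def warm_start(cum_regret: np.ndarray, discount: float = 0.5) -> np.ndarray:
    """Carry regrets into a structurally identical game with changed payoffs.

    Stand-in for the paper's warm start: prior regrets are scaled down so they
    still guide play without drowning out regrets measured under the new
    payoffs. The constant 0.5 is an assumption, not the paper's formula.
    """
    return discount * cum_regret
```

In the parameter-optimization setting described above, one would re-run such regret updates after each change to the payoff parameters, applying something like `warm_start` to the stored regrets instead of resetting them to zero.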
Recently, poker has emerged as a popular domain for investigating decision problems under conditions of uncertainty. Unlike traditional games such as checkers and chess, poker exhibits imperfect information, varying utilities, and stochastic events.
Because of these complications, decisions at the poker table are more analogous to the decisions faced by humans in everyday life. In this dissertation, we investigate regret minimization in extensive-form games and apply our work in developing champion computer poker agents. Counterfactual Regret Minimization (CFR) is the current state-of-the-art approach to computing capable strategy profiles for large extensive-form games.
Our primary focus is to advance our understanding and application of CFR in domains with ...

Hierarchical abstraction, distributed equilibrium. The leading approach for solving large imperfect-information games is automated abstraction followed by running an equilibrium-finding algorithm.

The CFR paper (Regret Minimization in Games with Incomplete Information) proves (Theorem 3) that

$$R^{T}_i \le \sum_{I \in \mathcal{I}_i} R^{T,+}_{i,\mathrm{imm}}(I),$$

where $\mathcal{I}_i$ is the set of player $i$'s information sets: the overall average regret is bounded by the sum of the positive parts of the immediate counterfactual regrets, so minimizing counterfactual regret at each information set minimizes overall regret. The paper Monte Carlo Sampling for Regret Minimization in Extensive Games shows that we can sample from the game tree and estimate the regrets instead of traversing the full tree.
Then we get the sampled counterfactual value for block $j$:

$$\tilde{v}_i(\sigma, I \mid j) = \sum_{z \in Q_j \cap Z_I} \frac{1}{q(z)} \, \pi^{\sigma}_{-i}(z[I]) \, \pi^{\sigma}(z[I], z) \, u_i(z),$$

where $Q_j \subseteq Z$ is the sampled block of terminal histories and $q(z)$ is the probability that terminal history $z$ is sampled. Therefore we can sample a part of the game tree and calculate the regrets; that is, we calculate an estimate of the regrets.

The game tree is accessed through a small history interface: whether a history is terminal (i.e. the game is over); the utility of player $i$ for a terminal history; the current player, denoted $P(h)$, where $P$ is known as the player function; and whether the next step is a chance step, something like dealing a new card or opening common cards in poker.

We do chance sampling (CS), where all the chance events (nodes) are sampled and all other events (nodes) are explored. We can ignore the term $q(z)$ since it is the same for all terminal histories when doing chance sampling, and it cancels out when calculating the strategy (it is common to the numerator and the denominator).
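A minimal sketch of that interface and of a chance-sampled CFR-style traversal is below, in Python. The class and method names (`History`, `is_terminal`, `terminal_utility`, `player`, `is_chance`, `sample_chance`, `info_set_key`, `actions`, `child`, and the `cfr` walker) are assumptions chosen to mirror the descriptions above; they are not a specific library's API, and average-strategy tracking plus the outer self-play loop are left out.

```python
from abc import ABC, abstractmethod
from collections import defaultdict

class History(ABC):
    """A history h: the sequence of actions (including chance) taken so far."""

    @abstractmethod
    def is_terminal(self) -> bool:
        """Whether it's a terminal history, i.e. the game is over."""

    @abstractmethod
    def terminal_utility(self, i: int) -> float:
        """Utility of player i for a terminal history."""

    @abstractmethod
    def player(self) -> int:
        """Current player P(h), where P is the player function."""

    @abstractmethod
    def is_chance(self) -> bool:
        """Whether the next step is a chance step, e.g. dealing a new card."""

    @abstractmethod
    def sample_chance(self) -> "History":
        """Sample one chance outcome (chance nodes are sampled, not enumerated)."""

    @abstractmethod
    def info_set_key(self) -> str:
        """Key identifying the information set that contains this history."""

    @abstractmethod
    def actions(self) -> list:
        """Legal actions for the current player."""

    @abstractmethod
    def child(self, action) -> "History":
        """History obtained by taking `action` from this history."""


class InfoSet:
    """Cumulative regrets and the regret-matching strategy for one information set."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.cumulative_regret = defaultdict(float)

    def regret_matching_strategy(self) -> dict:
        positive = {a: max(self.cumulative_regret[a], 0.0) for a in self.actions}
        total = sum(positive.values())
        if total > 0.0:
            return {a: r / total for a, r in positive.items()}
        return {a: 1.0 / len(self.actions) for a in self.actions}


def cfr(h: History, i: int, pi_i: float, pi_neg_i: float, info_sets: dict) -> float:
    """Chance-sampled CFR walk returning an estimated counterfactual value for player i.

    pi_i and pi_neg_i are the reach probabilities contributed by player i and by
    everyone else; `info_sets` maps info-set keys to InfoSet objects.
    """
    if h.is_terminal():
        return h.terminal_utility(i)
    if h.is_chance():
        # Chance sampling: follow a single sampled chance outcome.
        return cfr(h.sample_chance(), i, pi_i, pi_neg_i, info_sets)

    key = h.info_set_key()
    if key not in info_sets:
        info_sets[key] = InfoSet(h.actions())
    I = info_sets[key]
    strategy = I.regret_matching_strategy()

    value = 0.0
    action_values = {}
    for a in h.actions():  # all non-chance nodes are fully explored
        p = strategy[a]
        if h.player() == i:
            action_values[a] = cfr(h.child(a), i, pi_i * p, pi_neg_i, info_sets)
        else:
            action_values[a] = cfr(h.child(a), i, pi_i, pi_neg_i * p, info_sets)
        value += p * action_values[a]

    if h.player() == i:
        for a in h.actions():
            # Counterfactual regret is weighted by the others' reach probability.
            I.cumulative_regret[a] += pi_neg_i * (action_values[a] - value)
    return value
```

The point is only where each interface method plugs into the traversal, and where chance sampling replaces enumerating chance outcomes; note that the $q(z)$ term never appears in the update, matching the cancellation argument above.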