R-MAX
R-MAX is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-MAX, the agent always maintains a complete, but possibly inaccurate model of its environment and acts based on the optimal policy derived from this model. The model is initialized in an optimistic fashion: all actions in all states return the maximal possible reward (hence the name). During execution, it is updated based on the agent’s observations. R-MAX improves upon several previous algorithms: (1) It is simpler and more general than Kearns and Singh’s E^3 algorithm, covering zero-sum stochastic games. (2) It has a built-in mechanism for resolving the exploration vs. exploitation dilemma. (3) It formally justifies the “optimism under uncertainty” bias used in many RL algorithms. (4) It is simpler, more general, and more efficient than Brafman and Tennenholtz’s LSG algorithm for learning in single controller stochastic games. (5) It generalizes the algorithm by Monderer and Tennenholtz for learning in repeated games. (6) It is the only algorithm for learning in repeated games, to date, which is provably efficient, considerably improving and simplifying previous algorithms by Banos and by Megiddo.
Keywords for this software
References in zbMATH (referenced in 32 articles , 1 standard article )
Showing results 1 to 20 of 32.
Sorted by year (- Albrecht, Stefano V.; Crandall, Jacob W.; Ramamoorthy, Subramanian: Belief and truth in hypothesised behaviours (2016)
- Chen, Xin; Chen, Gang; Cao, Weihua; Wu, Min: Cooperative learning with joint state value approximation for multi-agent systems (2013)
- Crandall, Jacob W.; Goodrich, Michael A.: Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning (2011)
- Farahmand, Amir-massoud; Szepesvári, Csaba: Model selection in reinforcement learning (2011)
- Li, Lihong; Littman, Michael L.; Walsh, Thomas J.; Strehl, Alexander L.: Knows what it knows: a framework for self-aware learning (2011)
- Veness, J.; Ng, K. S.; Hutter, M.; Uther, W.; Silver, David: A Monte-Carlo AIXI approximation (2011)
- Hans, Alexander; Udluft, Steffen: Uncertainty propagation for efficient exploration in reinforcement learning (2010)
- Jaksch, Thomas; Ortner, Ronald; Auer, Peter: Near-optimal regret bounds for reinforcement learning (2010)
- Li, Lihong; Littman, Michael L.: Reducing reinforcement learning to KWIK online regression (2010)
- Ortega, Pedro A.; Braun, Daniel A.: A minimum relative entropy principle for learning and acting (2010)
- Szepesvári, Csaba: Algorithms for reinforcement learning. (2010)
- Whiteson, Shimon: Adaptive representations for reinforcement learning. (2010)
- Brunskill, Emma; Leffler, Bethany R.; Li, Lihong; Littman, Michael L.; Roy, Nicholas: Provably efficient learning with typed parametric models (2009)
- Strehl, Alexander L.; Li, Lihong; Littman, Michael L.: Reinforcement learning in finite MDPs: PAC analysis (2009)
- Yu, Jia Yuan; Mannor, Shie; Shimkin, Nahum: Markov decision processes with arbitrary reward processes (2009)
- Bab, Avraham; Brafman, Ronen I.: Multi-agent reinforcement learning in common interest and fixed sum stochastic games: an experimental study (2008)
- Ryabko, Daniil; Hutter, Marcus: On the possibility of learning in reactive environments with arbitrary dependence (2008)
- Strehl, Alexander L.; Littman, Michael L.: An analysis of model-based interval estimation for Markov decision processes (2008)
- Främling, Kary: Guiding exploration by pre-existing knowledge without modifying reward (2007)
- Sandholm, Tuomas: Perspectives on multiagent learning (2007)