

Author: Pollack J.B.
Publisher: Springer Publishing Company
ISSN: 0885-6125
Source: Machine Learning, Vol. 32, Iss. 3, 1998-09, pp. 225-240
Abstract
Following Tesauro's work on TD-Gammon, we used a 4,000-parameter feedforward neural network to develop a competitive backgammon evaluation function. Play proceeds by a roll of the dice, application of the network to all legal moves, and selection of the position with the highest evaluation. However, no backpropagation, reinforcement, or temporal-difference learning methods were employed. Instead, we applied simple hill-climbing in a relative fitness environment. We started with an initial champion of all-zero weights and proceeded simply by playing the current champion network against a slightly mutated challenger, changing the weights if the challenger won. Surprisingly, this worked rather well. We investigate how the peculiar dynamics of this domain enabled a previously discarded weak method to succeed by preventing suboptimal equilibria in a “meta-game” of self-learning.
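The champion-versus-challenger procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the names `hillclimb`, `mutate`, and `toy_match` are hypothetical, Gaussian noise is assumed for the mutation, a winning challenger is assumed to replace the champion outright, and `toy_match` is a made-up noisy contest standing in for a real backgammon game between two networks.

```python
import math
import random
from typing import Callable, List

Weights = List[float]
# Hypothetical interface: returns True when the challenger beats the champion.
Match = Callable[[Weights, Weights, random.Random], bool]


def mutate(weights: Weights, sigma: float, rng: random.Random) -> Weights:
    """Return a slightly perturbed copy of the weights (Gaussian noise is an assumption)."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


def hillclimb(n_weights: int, generations: int, challenger_wins: Match,
              sigma: float = 0.05, seed: int = 0) -> Weights:
    """Relative-fitness hill-climbing: the only feedback is who won the match."""
    rng = random.Random(seed)
    champion = [0.0] * n_weights  # initial champion: all-zero weights
    for _ in range(generations):
        challenger = mutate(champion, sigma, rng)
        if challenger_wins(champion, challenger, rng):
            # Assumption: adopt the challenger outright when it wins.
            champion = challenger
    return champion


def toy_match(champion: Weights, challenger: Weights, rng: random.Random) -> bool:
    """Stand-in for a backgammon game: a noisy contest, NOT real backgammon.

    Each player's hidden strength is its closeness to an arbitrary target
    (all weights equal to 1.0); the stronger player wins more often, but
    chance can still hand the game to the weaker one, loosely like dice.
    """
    def strength(w: Weights) -> float:
        return -sum((x - 1.0) ** 2 for x in w)

    z = 10.0 * (strength(challenger) - strength(champion))
    z = max(-50.0, min(50.0, z))  # clamp to keep math.exp well-behaved
    return rng.random() < 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    final = hillclimb(n_weights=8, generations=2000, challenger_wins=toy_match)
    print(final)
```

In the setting the abstract describes, `challenger_wins` would instead play backgammon, with each network scoring every legal move after a dice roll and choosing the position with the highest evaluation. That part is deliberately left out of the sketch.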
Related content

Comments on “Co-Evolution in the Successful Learning of Backgammon Strategy”
By Tesauro G.
Machine Learning, Vol. 32, Iss. 3, 1998-09

Understanding component co-evolution with a study on Linux
By Yu Liguo
Empirical Software Engineering, Vol. 12, Iss. 2, 2007-04