

Author: Baxter J.
Publisher: Springer Publishing Company
ISSN: 0885-6125
Source: Machine Learning, Vol. 40, Iss. 3, 2000-09, pp. 243-263
Abstract
In this paper we present TDLEAF(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program “KnightCap” used TDLEAF(λ) to learn its evaluation function while playing on Internet chess servers. The main success we report is that KnightCap improved from a 1650 rating to a 2150 rating in just 308 games and 3 days of play. As a reference, a rating of 1650 corresponds to about B-level human play (on a scale from E (1000) to A (1800)), while 2150 is human master level. We discuss some of the reasons for this success, principal among them being the use of on-line play, rather than self-play. We also investigate whether TDLEAF(λ) can yield better results in the domain of backgammon, where TD(λ) has previously yielded striking success.
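The abstract names the algorithm without stating its update rule. The following is a minimal sketch of a TD(λ)-style weight update applied to the leaf positions of the principal variation found by search, which is the idea the abstract describes. It is an illustration under stated assumptions, not the authors' implementation: the linear evaluation squashed by tanh, the feature vectors, the function names `evaluate` and `td_leaf_update`, and the default values of `lam` and `alpha` are all hypothetical, and the search that produces the leaf positions is assumed to have run already.

```python
import math


def evaluate(weights, features):
    # Assumed evaluation: a linear score squashed into (-1, 1) by tanh.
    return math.tanh(sum(w * f for w, f in zip(weights, features)))


def td_leaf_update(weights, leaf_features, outcome, lam=0.7, alpha=1.0):
    """Return new weights after one game.

    leaf_features[t] is the (hypothetical) feature vector of the
    principal-variation leaf reached by searching from position t.
    outcome is the final result of the game, e.g. 1, 0 or -1.
    """
    # Evaluations of each leaf, followed by the true game outcome.
    values = [evaluate(weights, f) for f in leaf_features] + [outcome]
    # Temporal differences between successive evaluations.
    diffs = [values[t + 1] - values[t] for t in range(len(leaf_features))]

    new_weights = list(weights)
    discounted = 0.0  # sum over j >= t of lam**(j - t) * diffs[j]
    for t in reversed(range(len(leaf_features))):
        discounted = diffs[t] + lam * discounted
        # Derivative of tanh(s) with respect to s is 1 - tanh(s)**2.
        grad_scale = 1.0 - values[t] ** 2
        for i, f in enumerate(leaf_features[t]):
            new_weights[i] += alpha * grad_scale * f * discounted
    return new_weights
```

With `lam=0` each leaf evaluation is moved only toward the next evaluation; with `lam=1` every leaf evaluation is moved toward the final outcome of the game.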