Gold standard datasets for evaluating word sense disambiguation programs

Author: Kilgarriff, A.

Publisher: Academic Press

ISSN: 0885-2308

Source: Computer Speech & Language, Vol. 12, Iss. 4, 1998-10, pp. 453-472

Abstract

There are now many computer programs for automatically determining the sense in which a word is being used. One would like to be able to say which are better, which worse, and also which words, or varieties of language, present particular problems to which algorithms. An evaluation exercise is required, and such an exercise requires a “gold standard” dataset of correct answers. Producing this proves to be a difficult and challenging task. In this paper I discuss the background, challenges and strategies, and present a detailed methodology for ensuring that the gold standard is not fool's gold.