The Wenzhou Spoken Corpus

Author: Newman John; Lin Jingxia; Butler Terry; Zhang Eric

ISSN: 1755-1676

Source: Corpora, Vol. 2, Iss. 1, 2007-05, pp. 97-109

Abstract

The creation of the Wenzhou Spoken Corpus, an online searchable corpus of a modern Chinese dialect, presents a number of challenges of interest to the corpus linguistics community. We review issues involved in the collection of spoken data, its transcription and markup, and the functionality of the search tools. The transcription uses Chinese characters, together with IPA symbols for Wenzhou colloquial forms that are not conventionally represented by characters. XML was adopted as the basic file format, and searches over the files are expressed as XPath queries. The search tools provide the usual options for restricting searches by speaker attributes such as age and gender, and they yield concordances and tables of collocates. Although the collection of data for the corpus was ‘opportunistic’ in some ways, and so not ideally balanced or representative, the corpus is nevertheless proving to be a valuable tool for corpus-based research on Wenzhou.
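
To make the XML-plus-XPath design concrete, the following is a minimal sketch of a search restricted by speaker gender that yields a concordance and a collocate count. It is illustrative only: the element name `u`, the attributes `gender` and `age`, the placeholder tokens, and the helper names `concordance` and `collocates` are all assumptions, not the corpus's actual schema or tools. It uses the limited XPath subset supported by Python's standard library.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup: element/attribute names are assumptions, and the
# single-letter tokens are placeholders standing in for characters or IPA.
SAMPLE = """<corpus>
  <u speaker="A" gender="female" age="25">a k b c</u>
  <u speaker="B" gender="male" age="60">d k e</u>
  <u speaker="A" gender="female" age="25">a b k</u>
</corpus>"""


def concordance(root, keyword, gender=None, width=2):
    """Return (left context, keyword, right context) rows for each hit."""
    # ElementTree supports attribute predicates of the form [@attr='value'].
    path = ".//u" if gender is None else f".//u[@gender='{gender}']"
    rows = []
    for u in root.findall(path):
        tokens = (u.text or "").split()
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = tokens[max(0, i - width):i]
                right = tokens[i + 1:i + 1 + width]
                rows.append((" ".join(left), tok, " ".join(right)))
    return rows


def collocates(rows):
    """Count the tokens that appear in the context window of the hits."""
    counts = Counter()
    for left, _, right in rows:
        counts.update(left.split())
        counts.update(right.split())
    return counts


root = ET.fromstring(SAMPLE)
rows = concordance(root, "k", gender="female")
for left, kw, right in rows:
    print(f"{left:>8} [{kw}] {right}")
print(collocates(rows).most_common())
```

Expressing the restriction as an XPath predicate keeps the speaker filter separate from the token matching, so further attributes such as age could be added to the path without touching the concordance logic.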
