We present “Transcriber”, a tool for assisting in the creation of speech corpora, and describe some aspects of its development and use. Transcriber was designed for the manual segmentation and transcription of long-duration broadcast news recordings, including annotation of speech turns, topics, and acoustic conditions. It is highly portable, relying on the scripting language Tcl/Tk with extensions such as Snack for advanced audio functions and tcLex for lexical analysis, and has been tested on various Unix systems and Windows. The data format follows the XML standard, with Unicode support for multilingual transcriptions. Distributed as free software in order to encourage the production of corpora, ease their sharing, increase user feedback, and motivate software contributions, Transcriber has been in use for over a year in several countries. This collective experience has given rise to new requirements: support for additional data formats, video control, and better management of conversational speech. With the recently formalized annotation graphs framework, adapting the tool to new tasks and supporting different data formats will become easier.

References in zbMATH (referenced in 4 articles, 1 standard article)

  1. Devillers, Laurence; Vidrascu, Laurence; Lamel, Lori: Challenges in real-life emotion annotation and machine learning based detection (2005)
  2. Lee, Chang Ha; Kanungo, Tapas: The architecture of TrueViz: A groundTRUth/metadata editing and VIsualiZing ToolKit (2003)
  3. Barras, Claude; Geoffrois, Edouard; Wu, Zhibiao; Liberman, Mark: Transcriber: Development and use of a tool for assisting speech corpora production (2001)
  4. Jacobson, Michel; Michailovsky, Boyd; Lowe, John B.: Linguistic documents synchronizing sound and text (2001)