Switchboard Telephone Speech Corpus


The Switchboard Telephone Speech Corpus is a corpus of spoken English consisting of almost 260 hours of speech. It was created in 1990 by Texas Instruments under a DARPA grant and released in 1992 by NIST. The corpus contains 2,400 telephone conversations among 543 US speakers (302 male, 241 female).[1][2][3] Participants did not know each other, and conversations were held on topics from a predetermined list.[4]

Switchboard-2 Phase II was collected in 1999 and includes "4,472 five-minute telephone conversations involving 679 participants".[5]

The corpus has been used to develop speech recognition algorithms.[6]

Text example:[7]

A: All right um well [laughter-uh] let's see i'm twenty
B: How old are you Lisa. Okay that i'm older
A: Yeah how old are you. Older [laughter]
B: Older than you [laughter-are]
A: [laughter-okay]
B: Okay we are supposed to talk about places we like to go so i'm gonna and where are you from where are you calling from?
A: I'm calling from uh Provo Utah but I'm from Plano Texas
B: Oh you are from Plano my sister lives in Plano yes her husband is the new Director of Admissions at uh University of Texas at Dallas
A: Oh really. Oh wow my dad used to work at UTD also
B: Yeah so I [vocalized-noise]. Anyway so where's your favorite place to go?
A: Um. Generally we just go on family vacations to Arizona my grandparents live there that's generally our usual summer vacation
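
The excerpt above follows a simple line layout: a speaker label, a colon, and the utterance, with bracketed tokens such as [laughter] marking non-lexical events. A minimal Python sketch of reading that layout is shown here. It assumes every bracketed token is an annotation that can be dropped, and the names `parse_turns` and `ANNOTATION` are hypothetical helpers, not part of any official Switchboard tooling.

```python
import re
from collections import Counter

# Sample turns copied from the excerpt above.
SAMPLE = """\
A: All right um well [laughter-uh] let's see i'm twenty
B: How old are you Lisa. Okay that i'm older
A: Yeah how old are you. Older [laughter]
B: Older than you [laughter-are]
A: [laughter-okay]
"""

# Assumption: any bracketed token is an annotation, not a plain word.
ANNOTATION = re.compile(r"\[[^\]]*\]")


def parse_turns(text):
    """Return (speaker, cleaned_utterance) pairs for lines shaped like 'A: ...'."""
    turns = []
    for line in text.splitlines():
        speaker, sep, utterance = line.partition(": ")
        if not sep:
            continue  # skip lines without a speaker label
        cleaned = ANNOTATION.sub(" ", utterance)
        cleaned = " ".join(cleaned.split())  # collapse repeated whitespace
        turns.append((speaker.strip(), cleaned))
    return turns


turns = parse_turns(SAMPLE)
turn_counts = Counter(speaker for speaker, _ in turns)
```

Dropping the bracketed tokens entirely is a simplification: a token like [laughter-uh] appears to pair a word with laughter, so a more careful reader might keep the word part instead of discarding it.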

  1. "Switchboard-1 Release 2 - Linguistic Data Consortium". catalog.ldc.upenn.edu. Retrieved 26 January 2024.
  2. "Papers with Code - Switchboard-1 Corpus Dataset". paperswithcode.com. Retrieved 26 January 2024.
  3. Godfrey, John J.; Holliman, Edward C.; McDaniel, Jane (23 March 1992). "SWITCHBOARD: Telephone speech corpus for research and development". [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing. IEEE Computer Society. pp. 517–520. doi:10.1109/ICASSP.1992.225858. ISBN 0-7803-0532-9. S2CID 61412708. Retrieved 26 January 2024.
  4. "NXT Swbd Overview". groups.inf.ed.ac.uk. Retrieved 26 January 2024.
  5. "Switchboard-2 Phase II - Linguistic Data Consortium". catalog.ldc.upenn.edu. Retrieved 26 January 2024.
  6. "Switchboard Transcription System". www1.icsi.berkeley.edu. Retrieved 26 January 2024.
  7. Soni, Mayank; Spillane, Brendan; Gilmartin, Emer; Saam, Christian; Cowan, Benjamin R.; Wade, Vincent (2021). "An Empirical Study of Topic Transition in Dialogue". arXiv:2111.14188 [cs.CL].
