Switchboard Telephone Speech Corpus

The Switchboard Telephone Speech Corpus is a corpus of spoken English consisting of almost 260 hours of speech. It was created in 1990 by Texas Instruments under a DARPA grant and released in 1992 by NIST. The corpus contains 2,400 telephone conversations among 543 US speakers (302 male, 241 female).^[1]^[2]^[3] Participants did not know each other, and conversations were held on topics from a predetermined list.^[4]

Switchboard-2 Phase II was collected in 1999 and includes "4,472 five-minute telephone conversations involving 679 participants".^[5]

The corpus has been used in the development of speech recognition algorithms.^[6]

Text example:^[7]

A: All right um well [laughter-uh] let's see i'm twenty
B: How old are you Lisa. Okay that i'm older
A: Yeah how old are you. Older [laughter]
B: Older than you [laughter-are]
A: [laughter-okay]
B: Okay we are supposed to talk about places we like to go so i'm gonna and where are you from where are you calling from?
A: I'm calling from uh Provo Utah but I'm from Plano Texas
B: Oh you are from Plano my sister lives in Plano yes her husband is the new Director of Admissions at uh University of Texas at Dallas
A: Oh really. Oh wow my dad used to work at UTD also
B: Yeah so I [vocalized-noise]. Anyway so where's your favorite place to go?
A: Um. Generally we just go on family vacations to Arizona my grandparents live there that's generally our usual summer vacation
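
The excerpt above follows a simple line-per-turn layout: a speaker label, a colon, and the utterance, with non-speech events in square brackets. As an illustration only, the following Python sketch splits such lines into speaker, cleaned text, and bracketed annotations. It assumes the layout shown in this excerpt, not the official distribution format of the corpus, and the names `parse_turn` and `ANNOTATION` are hypothetical. It strips every bracketed token; whether the word inside a tag such as [laughter-uh] should be kept as spoken text is left open here.

```python
import re
from collections import Counter

# Assumption: annotations look like the ones in the excerpt above,
# e.g. "[laughter]", "[laughter-uh]", "[vocalized-noise]".
ANNOTATION = re.compile(r"\[([a-z]+(?:-[a-z]+)*)\]")


def parse_turn(line):
    """Split 'A: text' into (speaker, cleaned_text, annotations)."""
    speaker, _, text = line.partition(": ")
    annotations = ANNOTATION.findall(text)
    # Remove the bracketed tokens and collapse the leftover whitespace.
    cleaned = " ".join(ANNOTATION.sub(" ", text).split())
    # Re-attach punctuation that was separated by a removed token.
    cleaned = re.sub(r"\s+([.,?!])", r"\1", cleaned)
    return speaker, cleaned, annotations


excerpt = [
    "A: All right um well [laughter-uh] let's see i'm twenty",
    "B: Older than you [laughter-are]",
    "A: [laughter-okay]",
    "B: Yeah so I [vocalized-noise]. Anyway so where's your favorite place to go?",
]

turns = [parse_turn(line) for line in excerpt]

# Count bracketed annotations per speaker.
annotations_per_speaker = Counter()
for speaker, _, annotations in turns:
    annotations_per_speaker[speaker] += len(annotations)

for speaker, cleaned, annotations in turns:
    print(speaker, repr(cleaned), annotations)
```

A turn that consists only of an annotation, such as "A: [laughter-okay]", yields an empty cleaned string, so downstream code should be prepared for empty utterances.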

[1] "Switchboard-1 Release 2 - Linguistic Data Consortium". catalog.ldc.upenn.edu. Retrieved 26 January 2024.

[2] "Papers with Code - Switchboard-1 Corpus Dataset". paperswithcode.com. Retrieved 26 January 2024.

[3] Godfrey, John J.; Holliman, Edward C.; McDaniel, Jane (23 March 1992). "SWITCHBOARD: Telephone speech corpus for research and development". [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing. IEEE Computer Society. pp. 517–520. doi:10.1109/ICASSP.1992.225858. ISBN 0-7803-0532-9. S2CID 61412708. Retrieved 26 January 2024.

[4] "NXT Swbd Overview". groups.inf.ed.ac.uk. Retrieved 26 January 2024.

[5] "Switchboard-2 Phase II - Linguistic Data Consortium". catalog.ldc.upenn.edu. Retrieved 26 January 2024.

[6] "Switchboard Transcription System". www1.icsi.berkeley.edu. Retrieved 26 January 2024.

[7] Soni, Mayank; Spillane, Brendan; Gilmartin, Emer; Saam, Christian; Cowan, Benjamin R.; Wade, Vincent (2021). "An Empirical Study of Topic Transition in Dialogue". arXiv:2111.14188 [cs.CL].
