Wellington Corpus of Spoken New Zealand English
The Wellington Corpus of Spoken New Zealand English is a one-million-word corpus of transcribed spoken English. It was compiled from material collected between 1988 and 1994 and consists of excerpts from a range of speakers who had lived in New Zealand since before the age of 10. The corpus was collected under the direction of linguist Janet Holmes and includes broadcast transcripts as well as informal conversations, telephone conversations, lectures, and oral history interviews.[1]
The corpus, which was distributed as part of the 1999 ICAME CD-ROM, has been used in a number of academic studies, including studies of morphology,[2] pronoun use[3] and language contact, such as the influence of Māori on New Zealand English.[4][5]
^Holmes, Janet; Vine, Bernadette; Johnson, Gary (1998). "Wellington Corpus". Retrieved May 28, 2015.
^Hundt, Marianne (1998). New Zealand English Grammar: Fact or Fiction. John Benjamins.
^Holmes, Janet (1998). "Generic pronouns in the Wellington Corpus of Spoken New Zealand English". Kōtare: New Zealand Notes & Queries.
^Macalister, John (2006). "The Maori presence in the New Zealand English lexicon, 1850–2000: Evidence from a corpus-based study". English World-Wide.
^Macalister, John (1999). "Trends in New Zealand English: Some Observations on the Presence of Maori Words in the Lexicon". New Zealand English Journal.