Branch of linguistics that studies language through examples contained in real texts
Corpus linguistics is an empirical method for the study of language by way of a text corpus (plural corpora).[1] Corpora are balanced, often stratified collections of authentic, "real-world" texts of speech or writing that aim to represent a given linguistic variety.[1] Today, corpora are generally machine-readable data collections.
Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field, the natural context ("realia") of that language, with minimal experimental interference. Large collections of text (though corpora may also be small in terms of running words) allow linguists to run quantitative analyses on linguistic concepts that may be difficult to test in a qualitative manner.[2]
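As a rough illustration of what such a quantitative analysis can look like, the following Python sketch counts word frequencies in a tiny, invented corpus and normalises them per million running words so that corpora of different sizes can be compared. The sample sentences, the crude tokenisation rule, and the function names (`tokenize`, `frequency_profile`, `per_million`) are assumptions made for this example only; they are not taken from any particular corpus tool.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude rule)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_profile(texts):
    """Return (number of running words, Counter of word frequencies) for a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return sum(counts.values()), counts


def per_million(word, total, counts):
    """Normalised frequency of a word per million running words."""
    return counts[word] / total * 1_000_000


# Hypothetical three-sentence "corpus"; a real corpus would be far larger
# and would be sampled to represent a given linguistic variety.
toy_corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "A cat is not a dog.",
]

total, counts = frequency_profile(toy_corpus)
print("running words:", total)
print("most frequent:", counts.most_common(2))
print("'cat' per million words:", round(per_million("cat", total, counts)))
```

Normalised frequencies of this kind are only as meaningful as the sampling behind the corpus, which is why the balance and representativeness described above matter.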
The text-corpus method uses the body of texts in any natural language to derive the set of abstract rules which govern that language. Those results can be used to explore the relationships between that subject language and other languages which have undergone a similar analysis. The first such corpora were manually derived from source texts, but now that work is automated.
Corpora have been used not only for linguistic research; since 1969 they have also been increasingly used to compile dictionaries (starting with The American Heritage Dictionary of the English Language in 1969) and reference grammars, with A Comprehensive Grammar of the English Language, published in 1985, as a first.
Experts in the field have differing views about the annotation of a corpus. These views range from John McHardy Sinclair, who advocates minimal annotation so texts speak for themselves,[3] to the Survey of English Usage team (University College, London), who advocate annotation as allowing greater linguistic understanding through rigorous recording.[4]
[1] Meyer, Charles F. (2023). English Corpus Linguistics (2nd ed.). Cambridge: Cambridge University Press. p. 4.
[2] Hunston, S. (1 January 2006), "Corpus Linguistics", in Brown, Keith (ed.), Encyclopedia of Language & Linguistics (Second Edition), Oxford: Elsevier, pp. 234–248, doi:10.1016/b0-08-044854-2/00944-5, ISBN 978-0-08-044854-1, retrieved 31 October 2023
[3] Sinclair, J. 'The automatic analysis of corpora', in Svartvik, J. (ed.) Directions in Corpus Linguistics (Proceedings of Nobel Symposium 82). Berlin: Mouton de Gruyter. 1992.
[4] Wallis, S. 'Annotation, Retrieval and Experimentation', in Meurman-Solin, A. & Nurmi, A.A. (eds.) Annotating Variation and Change. Helsinki: Varieng, [University of Helsinki]. 2007. e-Published