
Corpus language


A corpus language is a language that has no living speakers but for which numerous records produced by its native speakers survive.[1] Examples of corpus languages are Ancient Greek, Latin, the Egyptian language, Old English and Elamite.

Some corpus languages, such as Ancient Greek and Latin, left a very large corpus and can therefore be fully reconstructed, even though some details of their pronunciation may remain unclear. Such languages can be used even today, as is the case with Sanskrit and Latin. Others have such a limited corpus that some important words, e.g. some pronouns, are not attested in it. Examples of this are Ugaritic and Gothic. Languages that are attested only by a few words, often names, and a few phrases (called Trümmersprachen in German linguistics, literally "rubble languages") can be reconstructed only in a very limited way, and their genetic relationship to other languages often remains unclear. Examples are the Lombardic language and Dadanitic, a Semitic language that may be close to Classical Arabic.

Corpus languages are studied using the methods of corpus linguistics, but corpus linguistics can also be used, and commonly is, to study the writings and other records of living languages.

Not all extinct languages are "corpus languages," since there are many extinct languages for which few or no writings or other records survive.

  1. Langslow, D.R. (2002). "Approaching bilingualism in corpus languages". In James Noel Adams, Mark Janse, Simon Swain (eds.), Bilingualism in Ancient Society: Language Contact and the Written Text. Oxford: OUP.
