Bijankhan Corpus


The Bijankhan corpus (Persian: پیکرهٔ بی‌جن‌خان) is a tagged corpus suitable for natural language processing (NLP) research on the Persian language. The collection was gathered from daily news and common texts. All of its documents are categorized by subject (political, cultural, and so on), across about 4300 subject categories. The corpus contains about 2.6 million manually tagged words, with a tag set of 550 Persian part-of-speech tags.

The Bijankhan corpus was created by the Database Research Group at the University of Tehran.[1] The corpus is non-free: it is not free for commercial use, although these restrictions vary by country. It is named after Mahmood Bijankhan, professor of linguistics at the University of Tehran, for his contributions in this area.

  1. "Database Research Group". Archived from the original on 2017-05-15. Retrieved 2016-12-25.

Related articles for: Bijankhan Corpus

Enron Corpus

The Enron Corpus is a database of over 600,000 emails generated by 158 employees of the Enron Corporation in the years leading up to the company's collapse...

Word Count : 712

Brown Corpus

The Brown University Standard Corpus of Present-Day American English, better known as simply the Brown Corpus, is an electronic collection of text samples...

Word Count : 1056

Mahmood Bijankhan

University of Tehran. He is the creator of Bijankhan Corpus and a winner of Khwarizmi International Award. Bijankhan received his BSc in applied mathematics...

Word Count : 296

British National Corpus

British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. The corpus covers British...

Word Count : 3894

Cambridge English Corpus

The Cambridge International Corpus (CIC) is a collection of over 800 million words of real spoken and written English. The texts are stored in a database...

Word Count : 1016

Thesaurus Linguae Graecae

lemmatization of the Greek corpus (2006) – a substantial undertaking, given the highly inflected nature of Greek and the complexity of the corpus, covering more than...

Word Count : 596

COBUILD

have been the creation and analysis of an electronic corpus of contemporary text, the Collins Corpus, later leading to the development of the Bank of English...

Word Count : 175

Quranic Arabic Corpus

The Quranic Arabic Corpus (Arabic: المدونة القرآنية العربية, romanized: al-modwana al-Qurʾāni al-ʿArabiyya) is an annotated linguistic resource consisting...

Word Count : 599

Corpus of Contemporary American English

The Corpus of Contemporary American English (COCA) is a one-billion-word corpus of contemporary American English. It was created by Mark Davies, retired...

Word Count : 1135

Oxford English Corpus

The Oxford English Corpus (OEC) is a text corpus of 21st-century English, used by the makers of the Oxford English Dictionary and by Oxford University...

Word Count : 345

Europarl Corpus

The Europarl Corpus is a corpus (set of documents) that consists of the proceedings of the European Parliament from 1996 to 2012. In its first release...

Word Count : 800

Tatoeba

September 2007, about 150,000 English-Japanese sentence pairs from the Tanaka Corpus — a public-domain compilation released in 2001 by Hyogo University professor...

Word Count : 2056

Sketch Engine

Sketch Engine is a corpus manager and text analysis software developed by Lexical Computing since 2003. Its purpose is to enable people studying language...

Word Count : 1419

Arabic Speech Corpus

The Arabic Speech Corpus is a Modern Standard Arabic (MSA) speech corpus for speech synthesis. The corpus contains phonetic and orthographic transcriptions...

Word Count : 388

Spoken English Corpus

Spoken English Corpus (SEC) is a speech corpus collection of recordings of spoken British English compiled during 1984–1987. The corpus manual can be found...

Word Count : 1278

TIMIT

TIMIT is a corpus of phonemically and lexically transcribed speech of American English speakers of different sexes and dialects. Each transcribed element...

Word Count : 561

Bank of English

English (BoE) is a representative subset of the 4.5 billion words COBUILD corpus, a collection of English texts. These are mainly British in origin, but...

Word Count : 147

Slovenian National Corpus

Slovenian National Corpus FidaPLUS is a 621-million-word (token) corpus of the Slovenian language, gathered from selected texts written in Slovenian...

Word Count : 168

PropBank

is a corpus that is annotated with verbal propositions and their arguments—a "proposition bank". Although "PropBank" refers to a specific corpus produced...

Word Count : 377

Tehran Monolingual Corpus

of Tehran. The corpus is free for research use, after obtaining permission from the corpus aggregator. Bijankhan Corpus Hamshahri Corpus TMC description...

Word Count : 128

Russian National Corpus

The Russian National Corpus (Russian: Национальный корпус русского языка, lit. 'National Corpus of the Russian language') is a corpus of the Russian language...

Word Count : 379

Czech National Corpus

The Czech National Corpus (CNC) (Czech: Český národní korpus) is a large electronic corpus of written and spoken Czech language, developed by the Institute...

Word Count : 476

German Reference Corpus

The German Reference Corpus (original: Deutsches Referenzkorpus; short: DeReKo) is an electronic archive of text corpora of contemporary written German...

Word Count : 537

Buckeye Corpus

The Buckeye Corpus of conversational speech is a speech corpus created by a team of linguists and psychologists at Ohio State University led by Prof. Mark...

Word Count : 315

Scottish Corpus of Texts and Speech

The Scottish Corpus of Texts & Speech (SCOTS) is an ongoing project to build a corpus of modern-day (post-1940) written and spoken texts in Scottish English...

Word Count : 349

Croatian Language Corpus

The Croatian Language Corpus (CLC) (Croatian: Hrvatski jezični korpus, HJK) is a corpus of Croatian compiled at the Institute of Croatian Language and...

Word Count : 481

Bergen Corpus of London Teenage Language

The Bergen Corpus of London Teenage Language (COLT) is a data set of samples of spoken English that was compiled in 1993 from tape recorded and transcribed...

Word Count : 361

Hamshahri Corpus

tasks). The corpus is available for download in XML format. Bijankhan Corpus Persian Today Corpus Tehran Monolingual Corpus Text corpus Information retrieval...

Word Count : 333

Wellington Corpus of Spoken New Zealand English

The Wellington Corpus of Spoken New Zealand English is a one-million-word corpus of transcribed English compiled from materials collected between 1988...

Word Count : 234
