Bijankhan Corpus information

The Bijankhan corpus (Persian: پیکرهٔ بی‌جن‌خان) is a tagged corpus suitable for natural language processing (NLP) research on the Persian language. The collection was gathered from daily news and common texts. All of its documents are categorized by subject, such as political or cultural, into about 4300 subject categories. The corpus contains about 2.6 million manually tagged words, with a tag set of 550 Persian part-of-speech tags.

The Bijankhan corpus was created by the Database Research Group at the University of Tehran.[1] The corpus is non-free in that it may not be used commercially without restriction, although these restrictions vary by country. It is named after Mahmood Bijankhan, professor of linguistics at the University of Tehran, in recognition of his contributions to this area.

[1] "Database Research Group". Archived from the original on 2017-05-15. Retrieved 2016-12-25.
