Use of corpora in translation studies
Centre for Translation Studies
The Russian corpus is based on articles from Izvestia, a national broadsheet newspaper, and covers the period from 2000 to 2001. The corpus was POS-tagged and lemmatised using mystem.
The language of Russian newspapers can be compared against the first version of the Russian Reference Corpus, which consists of about 50 million words and represents a variety of genres in Russian. The Russian Reference Corpus was also used as the basis for developing the frequency dictionary of modern Russian; its description and download information are available on a separate page.
The sizes of the corpora are summarised in the following table:
| Corpus | Size (in words) |
| --- | --- |
| Reuters subset | 83,491,119 |
| Izvestia | 14,564,884 |
| Russian Reference Corpus | 50,512,584 |
The interface allows you to compare word uses between English and Russian as well as across two registers in Russian (the language of newspapers vs. the language of fiction). Because the corpora differ in size, raw counts are not directly comparable. The first line of the output therefore shows the relative frequency of your search term in the selected corpus, expressed as the number of occurrences per million words.
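The per-million normalisation can be reproduced by hand. The following is a minimal sketch, not the interface's own code: the function name `per_million` and the raw count of 1,500 are hypothetical and chosen only for illustration, while the corpus sizes are those given in the table above.

```python
# Corpus sizes in words, taken from the table above.
CORPUS_SIZES = {
    "Reuters subset": 83_491_119,
    "Izvestia": 14_564_884,
    "Russian Reference Corpus": 50_512_584,
}


def per_million(raw_count: int, corpus_size: int) -> float:
    """Return the number of occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size


# Hypothetical raw count, for illustration only (not a real query result).
raw_count = 1_500

for name, size in CORPUS_SIZES.items():
    print(f"{name}: {per_million(raw_count, size):.2f} per million words")
```

The same raw count gives a higher relative frequency in a smaller corpus, which is why the normalised figure, rather than the raw count, should be used when comparing corpora.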
Use of the corpus is restricted to research purposes. Because of the nature of our agreement with Reuters, we have to monitor the users of their subcorpus, so interested users need to register. Registration is free.
Click here to fill in the registration form. If you experience problems with the form, contact Serge Sharoff, s.sharoff leeds.ac.uk.
Click here to enter the corpus. Please send your comments, suggestions and criticisms to Serge Sharoff, s.sharoff leeds.ac.uk.