Frequency lists for language learners
The Kelly project set out to develop principled language learning resources for many different language combinations using corpus information. We took the most frequent words from representative corpora and then refined the selection using pedagogically relevant principles following the Common European Framework of Reference for Languages (CEFR).
The University of Leeds worked on the lists for three languages (Arabic, Chinese and Russian), while other partners worked on English, Greek, Italian, Norwegian, Polish and Swedish.
The corpora for our languages were collected as a large snapshot of texts available for these languages on the Web, using technologies discussed in:
Serge Sharoff. Creating general-purpose corpora using automated search engine queries. In Marco Baroni and Silvia Bernardini, editors, WaCky! Working papers on the Web as Corpus, Gedit, Bologna, 2006. http://wackybook.sslmit.unibo.it
Frequency lists for Arabic, Chinese and Russian
All lists are in MS Excel format. Each entry gives the word's frequency in ipm (instances per million words, a normalised measure that makes frequencies comparable regardless of the size of the individual corpus used), as well as the CEFR level associated with the word (ranging from A1 for absolute beginners to C2 for advanced students).
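As a rough illustration of the ipm measure, the sketch below normalises a raw count by the corpus size. The function name and the example counts are hypothetical and are not taken from the Kelly lists; it only assumes that ipm is the raw count scaled to a corpus of one million words.

```python
def ipm(count: int, corpus_size: int) -> float:
    """Return instances per million words for a raw corpus count.

    count        -- how many times the word occurs in the corpus
    corpus_size  -- total number of words in that corpus
    """
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    if count < 0:
        raise ValueError("count must be non-negative")
    # Scale the raw count to a notional corpus of one million words.
    return count * 1_000_000 / corpus_size


# Hypothetical counts: the same relative frequency in corpora of different sizes.
small = ipm(500, 2_000_000)         # 500 occurrences in a 2-million-word corpus
large = ipm(25_000, 100_000_000)    # 25,000 occurrences in a 100-million-word corpus
print(small, large)
```

Both hypothetical words come out at the same ipm value, which is why the measure can be compared across corpora of different sizes.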
An updated version of the Russian list is available as a dictionary from Routledge:
Sharoff, S., Umanskaya, E., Wilson, J. A Frequency Dictionary of Russian: core vocabulary for learners, Routledge, 2013.
In addition to the frequency list, it also includes illustrative examples, their translations into English, as well as a list of the most common multiword expressions.
There is also a database interface, which can be used to explore the links between words selected for each of these languages. It gives you an idea of how many basic meanings a word has in each language, and how the meanings vary between languages. For example:
The word tie is ambiguous in English, but the words γραβάτα, krawat and галстук in Greek, Polish and Russian respectively have only one basic meaning (translations shown in red are symmetrical to the source word shown in bold).
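One way to picture a symmetrical translation is as a link that holds in both directions. The sketch below assumes that reading of "symmetrical"; the data structure, the function name and the toy entries are invented for illustration and do not reflect how the Kelly database is actually stored or queried.

```python
# Toy translation links, invented for illustration only
# (not taken from the Kelly database).
# Each key is a (language, word) pair mapped to the set of its translations.
links = {
    ("en", "tie"): {("ru", "галстук"), ("ru", "ничья")},
    ("ru", "галстук"): {("en", "tie")},
    ("ru", "ничья"): {("en", "draw")},
}


def is_symmetrical(source, target, links):
    """Return True if target translates source AND source translates target."""
    forward = target in links.get(source, set())
    backward = source in links.get(target, set())
    return forward and backward


def basic_meanings(entry, links):
    """Count the distinct translations listed for an entry in the toy data."""
    return len(links.get(entry, set()))


print(is_symmetrical(("en", "tie"), ("ru", "галстук"), links))  # link in both directions
print(is_symmetrical(("en", "tie"), ("ru", "ничья"), links))    # link in one direction only
print(basic_meanings(("en", "tie"), links))
```

In this toy data the English word has two listed translations while each Russian word has one, which mirrors the kind of ambiguity the example describes.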
It was important to know how well our intuition about the relevance of the most frequent words in each language matches the experience of language learners. We conducted an extensive set of evaluations using online surveys; the statistics are given below.
- Arabic questionnaire responses
- Chinese questionnaire responses
- Russian questionnaire responses
(The summaries are available for the language directions in which we got a sufficient number of responses.)