This script lists the words that are distributionally most similar to the query word. Words are said to be distributionally similar if they share a significant number of collocates in the corpus; for instance, tea is in this sense similar to coffee (with a similarity score of 0.756), drink (0.496), lunch (0.478), etc.

   Corpus: BNC, German, RRC (with Latin transliteration)
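
To illustrate the idea of distributional similarity described above, here is a minimal sketch that ranks words by how much their collocates overlap. It assumes cosine similarity over raw collocate counts, which is only one common choice and not necessarily the measure this script uses. The function names (`cosine_similarity`, `most_similar`) and the counts in `TOY_COLLOCATES` are made up for illustration and are not taken from any of the corpora.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two collocate-count vectors (dict-like)."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, collocates, top_n=10):
    """Return up to top_n (word, score) pairs, most similar first."""
    scores = [
        (other, cosine_similarity(collocates[query], vector))
        for other, vector in collocates.items()
        if other != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical collocate counts, invented purely for this example.
TOY_COLLOCATES = {
    "tea": Counter({"cup": 5, "drink": 4, "hot": 3, "milk": 2}),
    "coffee": Counter({"cup": 4, "drink": 3, "hot": 2, "black": 2}),
    "car": Counter({"drive": 5, "road": 3, "fast": 2}),
}

if __name__ == "__main__":
    for word, score in most_similar("tea", TOY_COLLOCATES):
        print(f"{word}\t{score:.3f}")
```

In this toy data "coffee" shares three collocates with "tea" and therefore ranks first, while "car" shares none and scores zero.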

The complete Russian thesaurus is available as a single file (in UTF-8 encoding).

The calculation of semantic classes uses Singular Value Decomposition (it is based on a database computation procedure developed by Reinhard Rapp) and finds a cluster, i.e. words whose semantic classes also contain words similar to the original word. You can control how large the intersection between the semantic classes of two words must be for them to be considered as belonging to the same cluster.
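
The intersection criterion can be sketched as follows. This is a simplified illustration rather than the actual procedure: it assumes each word's semantic class is already available as a list of similar words, and it leaves out the Singular Value Decomposition step entirely. The names `find_cluster` and `min_overlap` and the contents of `TOY_CLASSES` are hypothetical.

```python
def find_cluster(query, semantic_classes, min_overlap=2):
    """Return words whose semantic class shares at least min_overlap
    members with the semantic class of the query word."""
    query_class = set(semantic_classes[query])
    cluster = []
    for word, members in semantic_classes.items():
        if word == query:
            continue
        overlap = query_class & set(members)
        if len(overlap) >= min_overlap:
            cluster.append(word)
    return sorted(cluster)


# Hypothetical semantic classes, invented purely for this example.
TOY_CLASSES = {
    "tea": ["coffee", "drink", "lunch", "milk"],
    "coffee": ["tea", "drink", "milk", "espresso"],
    "lunch": ["dinner", "breakfast", "drink", "meal"],
    "car": ["vehicle", "truck", "bus"],
}

if __name__ == "__main__":
    print(find_cluster("tea", TOY_CLASSES, min_overlap=2))
    print(find_cluster("tea", TOY_CLASSES, min_overlap=1))
```

Raising `min_overlap` makes the cluster stricter: with a threshold of 2 only "coffee" joins "tea" in this toy data, whereas a threshold of 1 also admits "lunch".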

The interface was developed by Serge Sharoff; contact me if you have further queries.