Access to large Internet corpora

The Scuola Superiore di Lingue Moderne per Interpreti e Traduttori (Università di Bologna) and the Centre for Translation Studies at the University of Leeds have developed techniques for harvesting large corpora from the Internet and automatically annotating them with part-of-speech tags. As a result, corpora of 100 to 120 million words in all project languages except Catalan are now freely available online through a single interface.

Several of these corpora have already been used to create training materials for the Corpus Linguistics for Translators course, an important deliverable of the MeLLANGE project.

These monolingual and multilingual data sets, each comprising millions of words, are publicly available at http://corpus.leeds.ac.uk/list.html. They allow translation students to check how source- and target-language expressions behave, using both concordances and collocation statistics.

Copyright MeLLANGE 2007
For more information about the MeLLANGE project, visit the project website.