A collection of English corpora

BNC, Reuters, British News, Internet, New York Times, Wikipedia, Brown Corpus, ukWac, Tweet-misinfo, Telegram-misinfo
Centre for Translation Studies



The corpora listed above:
  1. BNC (British National Corpus), a classic 100-million-word corpus.
  2. British News, a collection of news stories from 2004 drawn from the four major British newspapers (Guardian/Observer, Independent, Telegraph and Times), 200 million words.
  3. I-EN, a 150-million-word Internet corpus collected by Serge Sharoff using random queries to Google; see http://wackybook.sslmit.unibo.it
  4. Reuters, a collection of Reuters newswires covering one year, from 1996-08-20 to 1997-08-19, 90 million words.
  5. UK-WAC, a 2-billion-word corpus of English-language UK web pages collected by Marco Baroni and his colleagues. It is very large, so handle this corpus with care.
  6. BASE (British Academic Spoken English), collected by Hilary Nesi and colleagues at Coventry University.

The interface was designed by Serge Sharoff, University of Leeds. The source code is available under the GPL open source license at http://csar.sourceforge.net