A collection of English corpora
(Select English tags)
BNC
Reuters
British News
Internet
Brown Corpus
CQP syntax only (Examples)
Getting help on the query interface
Centre for Translation Studies
Set parameters of your query
Corpora
The corpora listed above:
BNC, a classic 100MW corpus,
A corpus of British News, a collection of news stories from 2004 from each of the four major British newspapers: Guardian/Observer, Independent, Telegraph and Times, 200 million words.
I-EN, a 150MW Internet corpus collected by Serge Sharoff using random queries to Google, see http://wackybook.sslmit.unibo.it
the Reuters corpus, a collection of newswires from Reuters for one year from 1996-08-20 to 1997-08-19, 90 million words.
UK-WAC, a 2GW corpus of English UK webpages collected by Marco Baroni and his colleagues (it's huge; handle this corpus with care),
BASE, British Academic Spoken English, collected by Hilary Nesi and colleagues at Coventry University