Querying Arabic Corpora

Centre for Translation Studies

I have also created lists of the most frequent word forms in the Internet, Al Hayat (LDC), Wikipedia and CCA corpora, as well as in the legal corpus.
Following lemmatisation by Majdi Sawalha, there are also frequency lists of lemmas and roots in the Arabic Internet corpus.

The corpora are:

  1. The Internet corpus: compiled using the procedure described in my paper in the WaCky book.
  2. The Al Hayat corpus: from Al Hayat data (1999-2001) compiled by the LDC.
  3. The Wikipedia corpus: from the public Wikipedia data retrieved on July 28, 2008.
  4. The Corpus of Contemporary Arabic (CCA): from Latifa Al-Sulaiti.
  5. The Arabic Legal Corpus: from keywords collected by Hanem El-Farahaty, a Leeds PhD student.
  6. The Computer Science corpus of Arabic: from keywords collected by Latifa Al-Sulaiti.

The interface was developed by Serge Sharoff. If you have further queries, contact me at s.sharoff@leeds.ac.uk.