| corpus title | size | time | source | language |
|---|---|---|---|---|
| British National Corpus (BNC) | 100 million tokens | mid 1970s - early 1990s | Oxford | British English |
| The Brown Corpus | 1 million tokens | 1961 | ICAME | American English |
| The Lancaster/Oslo-Bergen Corpus (LOB) | 1 million tokens | 1961 | ICAME | British English |
| International Corpus of English (ICE) | xxxxxx | varieties of world Englishes | International Corpus of English (ICE) at Zurich, CH | world English |
| Mark Davies' English Corpora | xxxxxx | diverse set of corpora | Mark Davies | American English, British English, international English |
| Text corpora in the DWDS | various | various | https://www.dwds.de/r | German |
| DWDS Kernkorpus | | 1900-1999 | Berlin-Brandenburgische Akademie der Wissenschaften: https://www.dwds.de/d/korpora/kern | German |
| DWDS Kernkorpus 21 | | 2000-2010 | Berlin-Brandenburgische Akademie der Wissenschaften: https://www.dwds.de/d/korpora/korpus21 | German |
| Hamburg Dependency Treebank | | articles from the German news site heise.de, published between 1996 and 2001 | http://hdl.handle.net/11022/0000-0000-7FC7-2 | German |
| IDS-Corpora | | | http://www.ids-mannheim.de/kt/corpora.html | German |
| LIMAS-Korpus | 1 million words, 500 texts/fragments | 1970s | http://www.korpora.org/Limas/ | German |
| Arabic News Texts Corpus (AntCorpus) | | | https://antcorpus.github.io/ | Arabic |
| Wortschatz Leipzig | various sample sizes | | https://wortschatz.uni-leipzig.de/de/download | Arabic, English, French, German, Russian, and others |
| Språkbanken Text | | | https://spraakbanken.gu.se/en/resources | Swedish |