
[ Collection: Introduction to CQP ]

7. Available Corpora

Once you've set up your access to CQP on the university server (the INLET corpus system), you'll have a range of corpora at your disposal. This list introduces some that may be of interest to you. If you don't have access to CQP yet, see the INLET site for how to set up the system on your account.

For more information on the INLET system, visit this site.

For more detailed information on any of these corpora, activate the corpus in CQP, type info and press ENTER.
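As a minimal sketch of the step above, a CQP session for looking up corpus information might look like the following. It assumes the corpus is registered under the name BNC, as listed below. The trailing semicolons are CQP's usual command terminator; some interactive setups add them automatically, so treat them as optional here.

```
# list the corpora available on this installation
show corpora;
# activate the corpus by typing its name (BNC is used as an example)
BNC;
# display the information stored for the active corpus
info;
```

To look up a different corpus, replace BNC with the name shown in the headings on this page, for example CLMET or ICLE.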

BNC

BRITISH NATIONAL CORPUS

Size: 112,156,361 tokens

Text publication dates: 1960-1993 (split up into 3 periods)

Tagset: CLAWS-5

Cite as: BNC Consortium. 2007. The British National Corpus, version 3 (BNC XML Edition). Oxford: Bodleian Libraries, University of Oxford. URL: http://www.natcorp.ox.ac.uk/

Corpus documentation: http://www.natcorp.ox.ac.uk/

BNC-BABY

BRITISH NATIONAL CORPUS (a smaller version)

Size: 4,644,834 tokens

Tagset: CLAWS-5

Corpus documentation: http://www.natcorp.ox.ac.uk/corpus/baby/manual.pdf

CLMET

CORPUS OF LATE MODERN ENGLISH TEXTS

Size: 40,340,760 tokens

Text publication dates: 1710-1920 (split up into 3 periods)

Tagset: PENN Corpora

Corpus documentation: https://perswww.kuleuven.be/~u0044428/clmet3_0.htm

Cite as: De Smet, Hendrik, Susanne Flach, Jukka Tyrkkö & Hans-Jürgen Diller. 2015. The Corpus of Late Modern English (CLMET), version 3.1: Improved tokenization and linguistic annotation. KU Leuven, FU Berlin, U Tampere, RU Bochum.

BROWN-LEGACY

The Standard Corpus of Present-Day Edited American English

Size: 1,137,466 tokens (approx. 1m words)

Text publication dates: 1961

Corpus documentation: https://varieng.helsinki.fi/CoRD/corpora/BROWN/index.html

Cite as: A Standard Corpus of Present-Day Edited American English, for use with Digital Computers (Brown). 1964, 1971, 1979. Compiled by W. N. Francis and H. Kučera. Brown University. Providence, Rhode Island.

FROWN-LEGACY

The Freiburg-Brown Corpus of American English

Size: 1,180,152 tokens (approx. 1m words)

Text publication dates: 1992

Corpus documentation: https://varieng.helsinki.fi/CoRD/corpora/FROWN/index.html

Cite as: The Freiburg-Brown Corpus (‘Frown’) (POS-tagged version), compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg, and Geoffrey Leech, University of Lancaster.

LOB-LEGACY

The Lancaster-Oslo/Bergen Corpus

Size: 1,157,496 tokens (approx. 1m words)

Text publication dates: 1961

Corpus documentation: https://varieng.helsinki.fi/CoRD/corpora/LOB/index.html

Cite as: The LOB Corpus, POS-tagged version (1981–1986), compiled by Geoffrey Leech, Lancaster University, Stig Johansson, University of Oslo (project leaders), Roger Garside, Lancaster University, and Knut Hofland, University of Bergen (heads of computing).

FLOB-LEGACY

The Freiburg–LOB Corpus of British English

Size: 1,165,747 tokens (approx. 1m words)

Text publication dates: 1991

Corpus documentation: https://varieng.helsinki.fi/CoRD/corpora/FLOB/index.html

Cite as: The Freiburg-LOB Corpus (‘F-LOB’) (POS-tagged version), compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg, and Geoffrey Leech, University of Lancaster.

ICLE

INTERNATIONAL CORPUS OF LEARNER ENGLISH

Size: 2,518,276 tokens

Authors' first languages: Bulgarian, Czech, Dutch (Netherlands), Dutch (Belgium), French, German, Italian, Polish, Russian, etc.

Corpus documentation: https://uclouvain.be/en/research-institutes/ilc/cecl/icle.html

Cite as: Granger, Sylviane, Estelle Dagneaux & Fanny Meunier. 2002. International Corpus of Learner English (ICLE). Louvain: Presses Universitaires de Louvain.

COCA-S

CORPUS OF CONTEMPORARY AMERICAN ENGLISH (COCA)

Size: 542,341,719 tokens (440m words)

Text publication dates: 1990-2012

Tagset: CLAWS-7

Corpus documentation: http://corpus.byu.edu/coca

Cite as: Davies, Mark. 2008. The Corpus of Contemporary American English: 450 Million Words, 1990-2012. http://corpus.byu.edu/coca.

COHA-S

CORPUS OF HISTORICAL AMERICAN ENGLISH (COHA)

Size: 471,427,380 tokens (400m words)

Tagset: CLAWS-7

Corpus documentation: http://corpus.byu.edu/coha/

Cite as: Davies, Mark. 2010. The Corpus of Historical American English: 400 million words, 1810-2009. http://corpus.byu.edu/coha/.

PPCME2

PENN-HELSINKI PARSED CORPUS OF MIDDLE ENGLISH (Version 2)

Size: 1,354,926 tokens

Text publication dates: 1150-1500 (split up into 9 periods)

Tagset: PENN Corpora

Corpus documentation: http://www.ling.upenn.edu/hist-corpora/PPCME2-RELEASE-3/

Cite as: Anthony Kroch and Ann Taylor. 2000. The Penn-Helsinki Parsed Corpus of Middle English (PPCME2). Department of Linguistics, University of Pennsylvania. CD-ROM, second edition, (http://www.ling.upenn.edu/hist-corpora/).

PPCEME

PENN-HELSINKI PARSED CORPUS OF EARLY MODERN ENGLISH

Size: 1,968,483 tokens

Text publication dates: 1500-1710

Tagset: PENN Corpora

Corpus documentation: http://www.ling.upenn.edu/hist-corpora/PPCEME-RELEASE-2/

Cite as: Anthony Kroch, Beatrice Santorini, and Lauren Delfs. 2004. The Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME). Department of Linguistics, University of Pennsylvania. CD-ROM, first edition, (http://www.ling.upenn.edu/hist-corpora/).

PPCMBE

PENN-HELSINKI PARSED CORPUS OF MODERN BRITISH ENGLISH

Size: 1,095,044 tokens

Text publication dates: 1700-1914

Tagset: PENN Corpora

Corpus documentation: http://www.ling.upenn.edu/hist-corpora/PPCMBE-RELEASE-1/

Cite as: Anthony Kroch, Beatrice Santorini, and Ariel Diertani. 2010. The Penn-Helsinki Parsed Corpus of Modern British English (PPCMBE). Department of Linguistics, University of Pennsylvania. CD-ROM, first edition, (http://www.ling.upenn.edu/hist-corpora/).

PPCEEC

PARSED CORPUS OF EARLY ENGLISH CORRESPONDENCE (PCEEC)

Size: 2,371,920 tokens

Text publication dates: 1350-1710 (split up into 5 periods)

Tagset: PENN Corpora

Corpus documentation: http://www-users.york.ac.uk/~lang22/PCEEC-manual/corpus_description/index.htm

Cite as: Parsed Corpus of Early English Correspondence, tagged version. 2006. Annotated by Arja Nurmi, Ann Taylor, Anthony Warner, Susan Pintzuk, and Terttu Nevalainen. Compiled by the CEEC Project Team. York: University of York and Helsinki: University of Helsinki. Distributed through the Oxford Text Archive.

cqp/list-of-coprora.txt · Last modified: 2024/10/29 17:52 by aamoakuh
