This section introduces simple queries involving one token. It presupposes that you have read Section 1. As before, we use the British National Corpus (BNC) for the examples.
In CQP, every token is represented by an attribute-value pair of the general form [attribute="value"]. The attribute corresponds to one of the columns of the vrt file – e.g., word, hw (headword), lemma, or pos (part of speech). Remember that different corpora may use different labels for these columns. The value corresponds to the content of these columns: it specifies what we are searching for (or querying) in the column named by the attribute.
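To make the relationship between columns and attributes concrete, here is a minimal Python sketch that reads a tiny vrt-style fragment into one dictionary per token. It is only an illustration, not part of CQP: the fragment, the column order (word, pos, hw) and the function name parse_vrt are assumptions made for this example, and your corpus may use different columns.

```python
# Toy vrt fragment: one token per line, columns separated by tabs,
# structural tags such as <s> on lines of their own.
# The column order (word, pos, hw) is an assumption made for this sketch.
VRT = "<s>\nShe\tPNP\tshe\nloves\tVVZ\tlove\nhim\tPNP\the\n</s>\n"
COLUMNS = ("word", "pos", "hw")


def parse_vrt(text):
    """Return one dict per token line, mapping attribute name to value."""
    tokens = []
    for line in text.splitlines():
        # Skip empty lines and structural tags; only token lines carry columns.
        if not line or line.startswith("<"):
            continue
        tokens.append(dict(zip(COLUMNS, line.split("\t"))))
    return tokens


tokens = parse_vrt(VRT)
print(tokens[1])  # {'word': 'loves', 'pos': 'VVZ', 'hw': 'love'}
```

Each dictionary key plays the role of an attribute, and each dictionary value is what a query on that attribute is compared against.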
For example, if we are searching for love – the word, not the feeling – the attribute-value pair will look like this:
[word="love"]
Enter this at the prompt, and hit return. The result will be a so-called “KWIC (Key Word In Context) concordance”, with each line showing one example of the string you searched for in the middle and a fixed number of characters of the context in which the string occurs.
To move forward through the concordance, hit the SPACE key; to move backwards, hit the b key. To quit the concordance and conduct a new query, hit the q key.
At the top of a concordance, there is some general information about the concordance – when it was created, what corpus it comes from, how many matches the query returned, and, most importantly, what the exact query was. This matters in larger research projects, where you may create and save a number of concordances and always need to know exactly what you searched for in order to create a particular concordance (Section 3f will discuss concordances in more detail, including the question of how to save and export them).
For now, look at the concordance you have created with the query [word="love"]: you will see that the command has returned exact matches of the string love – some of them are verbs (but only in the uninflected form), some are nouns (but only in the singular), and all of them are exclusively in lower case. In other words, the query returns exactly what we tell it to return – no more, no less.
If we want our query to return all forms of the word love – including loves, loving, and loved – we have to include all these forms in the value part of the token (Section 3a explains how this is done), unless, of course, our corpus has a h(ead)w(ord) or lemma column. The BNC has a hw column, so we can query this column instead of the word column:
[hw="love"]
Again, enter this at the prompt and hit return. You will see that the query now returns all word forms of the lemma (still including both verbs and nouns – Section 3b will explain how to restrict a query for a particular lemma to a particular part of speech).
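The difference between querying the word column and the hw column can be mimicked with a few lines of Python. This is a toy model under stated assumptions, not CQP itself: the token list is invented, the function name query is hypothetical, and the value is compared as a literal string.

```python
# Invented tokens; each dict holds the values of two columns.
TOKENS = [
    {"word": "love", "hw": "love"},
    {"word": "Love", "hw": "love"},
    {"word": "loves", "hw": "love"},
    {"word": "loved", "hw": "love"},
    {"word": "lovely", "hw": "lovely"},
]


def query(tokens, attribute, value):
    """Return the word forms of tokens whose attribute equals value exactly."""
    return [t["word"] for t in tokens if t[attribute] == value]


by_word = query(TOKENS, "word", "love")  # models [word="love"]
by_hw = query(TOKENS, "hw", "love")      # models [hw="love"]
print(by_word)  # ['love']
print(by_hw)    # ['love', 'Love', 'loves', 'loved']
```

The word query finds only the exact lower-case string, while the hw query finds every form that shares the headword; lovely is excluded because it has a headword of its own.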
Speaking of parts of speech – if our corpus contains a pos column, this can also be queried. The following attribute-value pair will retrieve all words tagged as NN1, which, in the BNC, stands for “singular noun” (remember, different corpora may use different sets of tags – some widely used tagsets are described here):
[pos="NN1"]
Enter this command and see what happens.
Note that all corpora must have at least one column – the word column containing the word forms. There is a shorthand way of querying this column: simply type the string you want to retrieve in double quotation marks and hit return – "love" will return the same result as [word="love"]. We will never use this shorthand in this tutorial, but you may see it in other materials.
Sometimes, you may want to exclude a particular value from a query. For example, you may be through with love (as in Destiny's Child's famous song), and you may want to search for all words in the corpus that are not one of the word forms of the word love. This is done by adding an exclamation mark before the equals sign in the attribute-value pair (in programming languages, the exclamation mark often stands for “not”, so != means “does not equal”):
[hw!="love"]
Try it. Of course, you can also use this convention with other attributes, such as word or pos. It is not usually useful by itself, but it is very useful when you combine attributes, as will be discussed in Section 3b.
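The effect of != can be sketched in Python as a filter that keeps every token whose headword differs from the given value. The (word, hw) pairs and the function name query_hw are invented for this illustration, and the sketch only models the behaviour described above, not how CQP evaluates queries internally.

```python
# Invented (word, hw) pairs standing in for corpus tokens.
TOKENS = [
    ("I", "i"),
    ("loved", "love"),
    ("the", "the"),
    ("loving", "love"),
    ("dog", "dog"),
]


def query_hw(tokens, value, negate=False):
    """Return word forms whose hw equals value, or differs from it if negate is set."""
    return [word for word, hw in tokens if (hw == value) != negate]


print(query_hw(TOKENS, "love"))               # models [hw="love"]:  ['loved', 'loving']
print(query_hw(TOKENS, "love", negate=True))  # models [hw!="love"]: ['I', 'the', 'dog']
```

The two result lists are complementary: every token ends up in exactly one of them, which is why a negated query on its own tends to return almost the whole corpus.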
As noted above, all queries in CQP are case sensitive – if you specify a string in lower case, you will only get results in lower case, and if you are (to quote a song by Fleetwood Mac) “lookin' out for love, big, big love”, you would have to specify the string in upper case. Try the following three queries and note the difference:
[word="love"]
[word="Love"]
[word="LOVE"]
Typically, you will not care about upper and lower case and you will want CQP to ignore this distinction. In this case, you simply attach the characters %c (for “case-insensitive”) to the value (following the closing quotation mark):
[word="love"%c]
Enter this query and look at the results. You will see that the results include the string love in all kinds of combinations of upper and lower case.
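As a rough Python analogy for what %c does, the sketch below compares strings either exactly or after case folding. The word list and the function name matches are made up for this example, and case folding is only an approximation of CQP's case-insensitive matching.

```python
def matches(token_word, value, ignore_case=False):
    """Compare a word form with a query value, optionally ignoring case."""
    if ignore_case:
        # casefold() normalises case before comparing.
        return token_word.casefold() == value.casefold()
    return token_word == value


# Invented word forms for illustration.
WORDS = ["love", "Love", "LOVE", "loVe", "glove"]

exact = [w for w in WORDS if matches(w, "love")]
folded = [w for w in WORDS if matches(w, "love", ignore_case=True)]
print(exact)   # models [word="love"]:   ['love']
print(folded)  # models [word="love"%c]: ['love', 'Love', 'LOVE', 'loVe']
```

Note that glove is not matched in either case: ignoring case changes which spellings count as the same string, but the query still has to match the whole token.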