--- cqp:complex-queries [2020/04/21 14:59] – external edit 127.0.0.1
+++ cqp:complex-queries [2024/06/20 13:53] (current) – external edit 127.0.0.1
@@ Line 1: / Line 1: @@
+**[ [[cqp:introduction|Collection: Introduction to CQP]] ]**
+====== 3c. Complex Queries ======
+//This section introduces complex queries, i.e., queries for sequences of tokens rather than just a single token. It presupposes that you have read [[cqp:corpus-structure|Section 1]] and [[cqp:simple-queries|Section 2]].//
+===== Sequences of attribute-value pairs =====
+So far, we have only looked at queries involving a single token -- we looked for //love//, for nouns, etc. Corpus linguistic investigations often start from individual words, so such queries are very typical. However, many research questions involve multi-word expressions, i.e., sequences of tokens. For such cases, CQP allows us to construct complex queries that consist of a sequence of attribute-value pairs, each enclosed in its own set of square brackets: ''[attribute="value"] [attribute="value"]''.
+For example, we might want to search for //true love// only. The query for this would look as follows (note the ''%c'' to make the query case-insensitive):
+	[word="true"%c] [word="love"%c]
+Enter this query at the prompt and hit ''RETURN''. You will see that now your concordance consists only of cases of the sequence //true love//.
+Or, we might feel nostalgic and want to search for //lost love//. The query would look like this:
+	[word="lost"%c] [word="love"%c]
+Of course, we can construct sequences where we are using different attributes at different positions. For example, we might be interested in all the different types of love in the corpus, i.e., all sequences of an adjective and the word //love//. The query would look like this (''AJ0'' is the tag the BNC uses for uninflected adjectives):
+	[pos="AJ0"] [hw="love"]
+Run this query and look at the result. Your concordance will now contain all such sequences, for example, //imaginative love//, //sexual love//, //Greek love//, //modern love// and //free love//. It would be nice if there was a simple way of creating a list of all adjectives preceding the word //love//, and in fact, there is such a way, which will be described in [[cqp:counting|Section 4a]].
+===== Excluding elements from sequences =====
+Recall from [[cqp:simple-queries|Section 2]] that we can also search for tokens that do //not// have a particular property (for example, we searched for all words other than //love// using the query ''[hw!="love"]''). It was pointed out that this possibility is particularly useful in complex queries. Note that line 16 of the last concordance we created contains the match //long love//, where //love// is actually the first element of the compound //love affair//. If we are interested in adjectives modifying the word //love//, such cases would confound the results, so we could specify that the word //love// should not be followed by a noun:
+	[pos="AJ0"] [hw="love"] [pos!="NN1"]
+(Strictly speaking, this only excludes nouns in the singular; you can use the notation with the parentheses and the pipe symbol described in [[cqp:extending-queries-combinations|Section 3a]] to exclude both singular and plural nouns.)
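+As a rough sketch of what that could look like, assuming that ''NN2'' is the BNC tag for plural common nouns (a tag not introduced above) and that the parentheses-and-pipe notation can be used inside the value as Section 3a describes:
+	[pos="AJ0"] [hw="love"] [pos!="(NN1|NN2)"]
+This version is meant to discard hits such as //long love affair// and //long love affairs// alike. Other noun tags, for example those for proper nouns, would still have to be added to the list if they matter for your research question.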
+===== Sequences with gaps =====
+Of course, linguistic structures are often more complex than a simple sequence of tokens -- they may contain optional, variable positions. For example, a noun phrase consists of an optional determiner, optionally followed by one or more adjectives, followed by a noun (or, in the case of compounds, a sequence of nouns). CQP allows us to construct queries that take this optionality into account.
+Let us use a simpler example than that of a noun phrase. Let us say that we are interested in //falling in love// -- the fixed expression, not the event denoted by it. We could construct the following query:
+	[hw="fall"] [word="in"%c] [word="love"%c]
+However, this will only find cases where the three words occur in an uninterrupted sequence (try it). It is possible that an adverb occurs between //fall// and //in// -- as in the line //I fell so hard in love with you// from the song //Just One Look// by the great Doris Troy. We can adjust our query to take this into account by simply inserting an empty pair of square brackets in the appropriate position:
+	[hw="fall"] [] [word="in"%c] [word="love"%c]
+This empty pair of brackets means “any token”. If you run this query, you will find sequences like //fall completely in love//, //fall passionately in love//, //fall hopelessly in love//, and so on.
+However, this does not solve our problem yet, because now the query only finds cases where something occurs between //fall// and //in// -- it does not take into account that this is //optional//. To do this, we can attach a pair of curly braces to the token, containing two numbers separated by a comma, with the first number specifying the minimum number of times that the element must occur and the second specifying the maximum: ''{min,max}''. For example, if we want to specify that zero or one token may occur between //fall// and //in//, the query would look like this:
+	[hw="fall"] []{0,1} [word="in"%c] [word="love"%c]
+Try the query; you will see that it now includes sequences with and without an adverb (you will have to scroll down a little bit before you see the first case of an adverb).
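+The empty brackets allow //any// token in the gap, not just adverbs. If we only want adverbs there, the curly braces can be attached to a token with a restriction instead. A minimal sketch, assuming that ''AV0'' is the BNC tag for general adverbs (a tag not used elsewhere in this section):
+	[hw="fall"] [pos="AV0"]{0,1} [word="in"%c] [word="love"%c]
+This should return //fall in love// with or without a single intervening adverb, while skipping hits where some other kind of word fills the gap.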
+In the line from the song I just cited, there are actually two elements between //fall// and //in//. We could take this into account by adjusting the relevant part of the query to ''[]{0,2}'', but what if we want to find //only// those cases where there are two elements between //fall// and //in//? In this case, we simply put a single number between the curly braces -- this then means “exactly this number of times”:
+	[hw="fall"] []{2} [word="in"%c] [word="love"%c]
+You will find that there are a few such cases in the BNC.
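+The same notation lets us approximate the noun phrase pattern mentioned at the beginning of this part. The following is only a sketch under several assumptions: ''AT0'' is taken to be the BNC tag for articles (standing in for determiners in general), the number of adjectives is capped at three, and only singular nouns are covered:
+	[pos="AT0"]{0,1} [pos="AJ0"]{0,3} [pos="NN1"]{1,2}
+Expect a very large concordance, since every token tagged ''NN1'' satisfies the pattern on its own. In practice, you would probably combine such a pattern with a specific word, for example by replacing the last position with ''[hw="love"]''.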
+===== Summary and outlook =====
+This section has shown you how to create concordances using complex queries. Building on this, you can look at the following sections in any order:
+  * [[cqp:extending-queries-combinations|Section 3a]]: Extending simple queries: Alternative attributes and values
+  * [[cqp:extending-queries-alternatives|Section 3b]]: Extending simple queries: Combinations of attributes and values
+  * [[cqp:metadata|Section 3d]]: Metadata
+  * [[cqp:regular-expressions-basics|Section 3e]]: Regular expressions (basics)
+  * [[cqp:concordances|Section 3f]]: Working with concordances
+  * [[cqp:sorting-sampling|Section 3g]]: Sorting and sampling
+**[ Introduction to CQP: [[cqp:corpus-structure|Section 1]] -- [[cqp:simple-queries|Section 2]] -- [[cqp:advanced-querying|Section 3]] -- [[cqp:beyond-queries|Section 4]] -- [[cqp:expert-tricks|Section 5]] -- [[cqp:exercises|Section 6]] ]**