Correlated Subqueries

Correlated Subqueries

The simplest subqueries can be evaluated once and for all, and the result used in a higher-level query. A more complicated use of nested subqueries requires the subquery to be evaluated many times: once for each assignment of a value to some term in the subquery that comes from a tuple variable outside the subquery. A subqucry of this type is called a correlated subquery. Let us begin our study with an example.

Example 1 : We shall find the titles that have been used for two or more movies. We start with an outer query that looks at all tuples in the relation

Movie(title, year, length, inColor, studioName, producer#)

For each such tuple, we ask in a subquery whether there is a movie with the same title and a greater year. The entire query is shown in Figure 1.

As with other nested queries, let us begin at the innermost subquery, lines (4) through (6). If Old.title in line (6) were replaced by a constant string such as King Kong , we would understand it quite easily as a query asking for the year or years in which  movies titled King Kong were made. The present subquery differs little. The only problem is that we don't know what value Old.title has. On the other hand, as we range over Movie tuples of the outer query of lines (1) through (3), each tuple provides a value of Old.title. We then execute the query of lines (4) through (6) with this value for Old.title to decide the truth of the WHERE clause that extends from lines (3) through (6).

Finding movie titles that appear more than once

The condition of line (3) is true if any movie with the same title as Old.title has a later year than the movie in the tuple that is the current value of tuple variable Old. This condition is true unless the year in the tuple Old is the last year in which a movie of that title was made. As a result, lines (1) through (3) produce a title one fewer times than there are movies with that title. A movie made twice will be listed once, a movie made three times will be listed twice, and so on.

When writing a correlated query it is important that we be aware of the scoping rules for names. Generally, an attribute in a subquery belongs to one of the tuple variables in that subquery's FROM clause if some tuple variable's relation has that attribute in its schema. If not, we look at the immediately surrounding subquery, then to the one surrounding that, and so on. In this way, year on line (4) and title on line (6) of Figure 1 refer to the attributes of the tuple variable that ranges over all the tuples of the copy of relation Movie introduced on line (5) - that is, the copy of the Movie relation addressed by the subquery of lines (4) through (6).

Though, we can arrange for an attribute to belong to another tuple variable if we prefix it by that tuple variable and a dot. That is why we introduced the alias Old for the Movie relation of the outer query, and why we refer to Old.title in line (6). Note that if the two relations in the FROM clauses of lines (2) and (5) were different, we would not need an alias. Rather, in the subquery we could refer directly to attributes of a relation mentioned in line (2).