Interpreting Multirelation Queries

Interpreting Multirelation Queries

There are many ways to define the meaning of the select-from-where expressions that we have just studied. All are equivalent, in the sense that they each give the same answer for each query applied to the same relation instances. We shall examine each in turn.

Nested Loops

The semantics that we have implicitly used in examples so far is that of tuple variables. Remember that a tuple variable ranges over all tuples of the corresponding relation. A relation name that is not aliased is also a tuple variable ranging over the relation itself, as we mentioned in the box on "Tuple Variables and Relation Names". If there are many tuple variables, we may imagine nested loops, one for each tuple variable, in which the variables each range over the tuples of their respective relations. For each assignment of tuples to the tuple variables, we decide whether the WHERE clause is true. If so, we produce a tuple consisting of the values of the expressions following SELECT; note that each term is given a value by the current assignment of tuples to tuple variables. This query-answering algorithm is suggested by Figure 1.

Parallel Assignment

There is an equivalent definition in which we do not explicitly create nested loops ranging over the tuple variables. Rather, we consider in arbitrary order or in parallel, all possible assignments of tuples from the appropriate relations to the tuple variables. For each such assignment, we consider whether the WHERE clause becomes true. Each assignment that produces a true WHERE clause contributes a tuple to the answer; that tuple is constructed from the attributes of the SELECT clause, evaluated according to that assignment.

Answering a simple SQL query

Conversion to Relational Algebra

A third approach is to relate the SQL query to relational algebra. We start with the tuple variables in the FROM clause and take the Cartesian product of their relations. If two tuple variables refer to the same relation, then this relation appears twice in the product, and we rename its attributes so all attributes have unique names. Likewise, attributes of the same name from different relations are renamed to avoid uncertainty.

Having created the product, we apply a selection operator to it by converting the WHERE clause to a selection condition in the obvious way. That is, each attribute reference in the WHERE clause is replaced by the attribute of the product to which it corresponds. Eventually, we create from the SELECT clause a list of expressions for a final (extended) projection operation. As we did for the WHERE clause, we interpret each attribute reference in the SELECT clause as the corresponding attribute in the product of relations.

Example 1 : Let us convert the query of "Tuple Variables" Example 1 to relational algebra. First, there are two tuple variables in the FROM clause, both referring to relation MovieStar. In this way, our expression (without the necessary renaming) begins:

MovieStar x MovieStar

The resulting relation has eight attributes, the first four correspond to attributes name, address, gender, and birthdate from the first copy of relation MovieStar, and the second four correspond to the same attributes from the other copy of MovieStar. We could create names for these attributes with a dot and the aliasing tuple variable e.g., Star1.gender - but for succinctness, let us invent new symbols and call the attributes simply A1, A2, . . . , A8. Thus, A1 corresponds to, A5 corresponds to, and so on.

An Unintuitive Consequence of SQL Semantics

Under this naming strategy for attributes, the selection condition obtained from the WHERE clause is A2 = A6 and A1 < A5. The projection list is A1, A5. Thus,

renders the entire query in relational algebra.