Natural Joins / Theta-Joins

Natural Joins / Theta-Joins

Natural Joins

More frequently than we want to take the product of two relations, we find a need to join them by pairing only those tuples that match in some way. The simplest kind of match is the natural join of two relations R and S, denoted R ⋈ S, in which we pair only those tuples from R and S that agree in whatever attributes are common to the schemas of R and S. More specifically, let A1, A2, . . . , An, be all the attributes that are in both the schema of R and the schema of S. Then a tuple r from R and a tuple s from S are successfully paired if and only if r and s agree on each of the attributes A1, A2, . . . , An..

If the tuples r and s are successfully paired in the join R ⋈ S, then the result of the pairing is a tuple, called the joined tuple, with one component for each of the attributes in the union of the schemas of R and S. The joined tuple agrees with tuple r in each attribute in the schema of R, and it agrees with s in each attribute in the schema of S. Since r and s are successfully paired, the joined tuple is able to agree with both these tuples on the attributes they have in common. The construction of the joined tuple is suggested by Figure (a).

Joining tuples

Note also that this join operation is the same one that we used in "Recovering Information from a Decomposition" to recombine relations that had been projected onto two subsets of their attributes. There the motivation was to explain why BCNF decomposition made sense. In "Combining Operations to Form Queries" we shall see another use for the natural join: combining two relations so that we can write a query that relates attributes of each.

Example (a) : The natural join of the relations R and S from "Selection / Cartesian Product" Figure (a) is


The only attribute common to R and S is B. Thus, to pair successfully, tuples need only to agree in their B components. If so, the resulting tuple has components for attributes A (from R), B (from either R or S), C (from S), and D (from S).

In this instance, the first tuple of R successfully pairs with only the first tuple of S; they share the value 2 on their common attribute B. This pairing yields the first tuple of the result: (1,2,5,6). The second tuple of R pairs successfully only with the second tuple of S, and the pairing yields (3,4,7,8). Note that the third tuple of S does not pair with any tuple of R and thus has no effect on the result of R ⋈ S. A tuple that fails to pair with any tuple of the other relation in a join is said to be a dangling tuple.

Example (b) : The previous example does not show all the possibilities inherent in the natural join operator. For instance, no tuple paired successfully with more than one tuple, and there was only one attribute in common to the two relation schemas. In Figure (b) we see two other relations, U and V, that share two attributes between their schemas: B and C. We also show an instance in which one tuple joins with several tuples.

For tuples to pair successfully, they must agree in both the B and C components. Thus, the first tuple of U joins with the first two tuples of V, while the second and third tuples of U join with the third tuple of V. The result of these four pairings is shown in Figure (b).

Theta-Joins

The natural join forces us to pair tuples using one particular condition. While this way, equating shared attributes, is the most common basis on which relations are joined, it is sometimes desirable to pair tuples from two relations on some other basis. For that reason, we have a related notation called the theta-join. Historically, the "theta" refers to an arbitrary condition, which we shall represent by C rather than 0.

The notation for a theta-join of relations R and S based on condition C is . The result of this operation is constructed as follows:

1. Take the product of R and S.
2. Select from the product only those tuples that satisfy the condition C.

As with the product operation, the schema for the result is the union of the schemas of R and S, with "R" or "S" prefixed to attributes if necessary to indicate from which schema the attribute came.

Natural join of relations

Example (c) : Look at the operation , where U and V are the relations from Figure (b). We must take into account all nine pairs of tuples, one from each relation, and see whether the A component from the U-tuple is less than the D component of the V-tuple. The first tuple of U, with an A component of 1, successfully pairs with each of the tuples from V. On the other hand, the second and third tuples from U, with A components of 6 and 9, respectively, pair successfully with only the last tuple of V. In this way, the result has only five tuples, constructed from the five successful pairings. This relation is shown in Figure (c).

Notice that the schema for the result in Figure (c) consists of all six attributes with U and V prefixed to their respective occurrences of attributes B and C to distinguish them. Therefore, the theta-join contrasts with natural join, since in the latter common attributes are merged into one copy. Certainly it makes sense to do so in the case of the natural join, since tuples don't pair unless they agree in their common attributes. In the case of a theta-join, there is no guarantee that compared attributes will agree in the result, since they may not be compared with =.



Example (d) : Here is a theta-join on the same relations U and V that has a more complicated condition:


That is, we require for successful pairing not only that the A component of the U-tuple be less than the D component of the V-tuple, but that the two tuples disagree on their respective B components. The tuple


is the only one to satisfy both conditions, so this relation is the result of the theta-join above.


Tags