Extending the Projection Operator

Extending the Projection Operator

Well now reexamine the projection operator πL(R) introduced in "Set Operations on Relations" under "projection". In the classical relational algebra, L is a list of (some of the) attributes of R. We extend the projection operator to allow it to compute with components of tuples as well as choose components. In extended projection, also denoted πL(R), projection lists can have the following types of elements:

1. A single attribute of R.

2. An expression x → y, where x and y are names for attributes. The element x → y in the list L asks that we take the attribute x of R and rename it  y; i.e., the name of this attribute in the schema of the result relation is y.

3. An expression E → z, where E is an expression involving attributes of R, constants, arithmetic operators, and string operators, and z is a new name for the attribute that results from the calculation implied by E. For instance, a + b → x as a list element represents the sum of the attributes a and b, renamed x. Element c| |d → e means concatenate the (presumably string-valued) attributes c and d and call the result e.

The result of the projection is computed by considering each tuple of R in turn. We evaluate the list L by substituting the tuple's components for the corresponding attributes mentioned in L and applying any operators indicated by L to these values. The result is a relation whose schema is the names of the attributes on list L, with whatever renaming the list specifies. Each tuple of R yields one tuple of the result. Duplicate tuples in R surely yield duplicate tuples in the result, but the result can have duplicates even if R does not.

Example 1 : Let R be the relation

Then the result of  is

The result's schema has two attributes. One is A, the first attribute of R, not renamed. The second is the sum of the second and third attributes of R, with the name X.

For another instance,  is

Notice that the calculation required by this projection list happens to turn different tuples (0,1,2) and (3,4,5) into the same tuple (1,1). In this way, the latter tuple appears three times in the result.

The Sorting Operator

There are several contexts in which we want to sort the tuples of a relation by one or more of its attributes. Often, when querying data, one wants the result relation to be sorted. For example, in a query about all the movies in which Sean Connery appeared, we might wish to have the list sorted by title, so we could more easily find whether a certain movie was on the list. We shall also see in "Two-Pass Algorithms Based on Sorting" how execution of queries by the DBMS is sometimes made more efficient if we sort the relations first.

The expression τL(R), where R is a relation and L a list of some of R's attributes, is the relation R, but with the tuples of R sorted in the order indicated by L. If L is the list  A1, A2, . . ., An, then the tuples of R are sorted first by their value of attribute A1. Ties are broken according to the value of A2; tuples that agree on both A1 arid A2 are ordered according to their value of A3, and so on. Ties that remain after attribute An is considered may be ordered randomly.

Example 2 : If R is a relation with schema R(A,B,C), then τC,B(R) orders the tuples of R by their value of C, and tuples with the same C-    value are ordered by their B value. Tuples that agree on both B and C may be ordered randomly.

The operator τ is irregular, in that it is the only operator in our relational algebra whose result is a list of tuples, rather than a set. Therefore, in terms of expressing queries, it only makes sense to talk about τ as the final operator in an algebraic expression. If another operator of relational algebra is applied after τ, the result of the τ is treated as a set or bag, and no ordering of the tuples is implied.