Relational Algebra

Relational Algebra

This section begins a study of database programming, that is, how the user can ask queries of the database and can change the contents of the database.  Our focus is on the relational model, and especially on a notation for describing queries about the content of relations called "relational algebra".

While ODL uses methods that, in principle, can perform any operation on data, and the E/R model does not embrace a particular way of controlling data, the relational model has a concrete set of "standard" operations on data. Surprisingly, these operations are not "Turing complete" the way ordinary programming languages are. Therefore, there are operations we cannot express in relational algebra that could be expressed, for example, in ODL methods written in C++. This situation is not a defect of the relational model or relational algebra, because the advantage of limiting the scope of operations is that it becomes possible to optimize queries written in a very high level language such as SQL, which we introduce in "The Database Language SQL".

We begin by introducing the operations of relational algebra. This algebra formally applies to sets of tuples, i.e., relations. However, commercial DBMS's use a slightly different model of relations, which are bags, not sets. That is, relations in practice may contain duplicate tuples. While it is often useful to think of relational algebra as a set algebra, we also need to be conscious of the effects of duplicates on the results of the operations in relational algebra. In the final part of this section, we look at the matter of how constraints on relations can be expressed.

Later sections let us see the languages and features that today's commercial DBMS's offer the user. The operations of relational algebra are all implemented by the SQL query language, which we study beginning in "The Database Language SQL". These algebraic operations also appear in the OQL language, an object-oriented query language based on the ODL data model and introduced in "Object-Orientation in Query Languages". 

An Example Database Schema

As we begin our focus on database programming in the relational model, it is useful to have a particular schema on which to base our examples of queries. Our chosen database schema draws upon the running example of movies, stars, and studios, and it uses normalized relations similar to the ones that we developed in "From E/R Diagrams to Relational Designs". However, it includes some attributes that we have not used previously in examples, and it contains one relation - MovieExec - that has not appeared before. The purpose of these changes is to give us some opportunities to study different data types and different ways of representing information. Figure (a) shows the schema.

Example database schema about movies

Our schema has five relations. The attributes of each relation are listed, along with the intended domain for that attribute. The key attributes for a relation are shown in capitals in Figure (a), although when we refer to them in text, they will be lower-case as they have been so far. For example, all three attributes together form the key for relation StarsIn. Relation Movie has six attributes; title and year together constitute the key for Movie, as they have previously. Attribute title is a string, and year is an integer.

The major modifications to the schema compared with what we have seen so far are:

●  There is a notion of a certificate number for movie executives studio presidents and movie producers. This certificate is a unique integer that we imagine is maintained by some external authority, perhaps a registry of executives or a "union".

●  We use certificate numbers as the key for movie executives, although movie stars do not always have certificates and we shall continue to use name as the key for stars. That decision is probably unrealistic, since two stars could have the same name, but we take this road in order to illustrate some different options.

●  We introduced the producer as another property of movies. This information is represented by a new attribute, producerC#, of relation Movie. This attribute is intended to be the certificate number of the producer. Producers are expected to be movie executives, as are studio presidents. There may also be other executives in the MovieExec relation.

●  Attribute filmType of Movie has been changed from an enumerated type to a boolean-valued  attribute called inColor: true if the movie is in color and false if it is in black and white.

●  The attribute gender has been added for movie stars. Its type is "character", either M for male or F for female. Attribute birthdate, of type "date" (a special type supported by many commercial database systems or just a character string if we prefer) has also been added.

●  All addresses have been made strings, rather than pairs consisting of a street and city. The purpose is to make addresses in different relations comparable easily and to simplify operations on addresses.