Union, Intersection, and Difference of Queries

Union, Intersection, and Difference of Queries

Sometimes we wish to combine relations using the set operations of relational algebra: union, intersection and difference. SQL provides corresponding operators that apply to the results of queries, provided those queries produce relations with the same list of attributes and attribute types. The keywords used are UNION, INTERSECT and EXCEPT for U, ∩, and -, respectively. Words like UNION are used between two queries, and those queries must be parenthesized.

Example 1: Assume we wanted the names and addresses of all female movie stars who are also movie executives with a net worth over $10,000,000. Using the following two relations:

MovieStar(name, address, gender, birthdate)
MovieExec(name, address, cert#, netWorth)

we can write the query as in Figure 1. Lines (1) through (3) produce a relation whose schema is (name, address) and whose tuples are the names and addresses of all female movie stars.

Intersecting female movie stars with rich executives

Likewise, lines (5) through (7) produce the set of "rich" executives, those with net worth over $10,000,000.This query also yields a relation whose schema has the attributes name and address only. Since the two schemas are the same, we can intersect them, and we do so with the operator of line (4).

Example 2 : In a similar vein, we could take the difference of two sets of persons, each selected from a relation. The query

(SELECT name, address FROM MovieStar)
(SELECT name, address FROM MovieExec);

gives the names and addresses of movie stars who are not also movie executives, regardless of gender or net worth.

In the two examples above, the attributes of the relations whose intersection or difference we took were conveniently the same.  On the other hand, if necessary to get a common set of attributes, we can rename attributes as in "Projection in SQL" Example 2.

Example 3 : Assume we wanted all the titles and years of movies that appeared in either the Movie or StarsIn relation of our running example:

Movie(title, year, length, inColor, studioName, producerC#)
StarsIn(movieTitle, movieyear, starName)

Readable SQL Queries

Ideally, these sets of movies would be the same, but in practice it is common for relations to diverge; for instance we might have movies with no listed stars or a StarsIn tuple that mentions a movie not found in the Movie relation. Thus, we might write

(SELECT title, year FROM Movie)
(SELECT movieTitle AS title, movieYear AS year FROM StarsIn);

The result would be all movies mentioned in either relation, with title and year as the attributes of the resulting relation.