*Full-Relation Operations*

In this section we shall look at some operations that act on relations as a whole, rather than on tuples individually or in small numbers (as do joins of various relations, for example). First, we deal with the fact that SQL uses relations that are bags rather than sets, and a tuple can appear more than once in a relation. We shall see how to force the result of an operation to be a set in "Eliminating Duplicates", and in "Duplicates in Unions, Intersections, and Differences" we shall see that it is also possible to prevent the elimination of duplicates in circumstances where SQL systems would usually eliminate them.

Then, we discuss how SQL supports the grouping and aggregation operator that we introduced in "Grouping". SQL has aggregation operators and a GROUP-BY clause. There is also a "HAVING" clause that allows selection of certain groups in a way that depends on the group as a whole, rather than on individual tuples.

### Eliminating Duplicates

As mentioned in "Correlated Subqueries", SQL's notion of relations differs from the abstract notion of relations presented in "The Relational Data Model". A relation, being a set, cannot have more than one copy of any given tuple. When an SQL query creates a new relation, the SQL system does not ordinarily eliminate duplicates. Thus, the SQL response to a query may list the same tuple many times.Recall from "Interpreting Multirelation Queries" that one of many equivalent definitions of the meaning of an SQL select-from-where query is that we begin with the Cartesian product of the relations referred to in the FROM clause. Each tuple of the product is tested by the condition in the WHERE clause, and the ones that pass the test are given to the output for projection according to the SELECT clause. This projection may cause the same tuple to result from different tuples of the product, and if so, each copy of the resulting tuple is printed in its turn. In addition, since there is nothing wrong with an SQL relation having duplicates, the relations from which the Cartesian product is formed may have duplicates, and each identical copy is paired with the tuples from the other relations, yielding a proliferation of duplicates in the product.

If we do not wish duplicates in the result, then we may follow the keyword SELECT by the keyword DISTINCT. That word tells SQL to produce only one copy of any tuple and is the SQL analog of applying the operator of "Extended Operators of Relational Algebra" to the result of the query.

**Example 1 :**Let us reconsider the query of "Conditions Involving Tuples" Figure 3, where we asked for the producers of Harrison Ford's movies using no subqueries. As written, George Lucas will appear several times in the output. If we want only to see each producer once, we may change line (1) of the query to

**1) SELECT DISTINCT name**

Then, the list of producers will have duplicate occurrences of names eliminated before printing.

Incidentally, the query of "Conditions Involving Tuples" Figure 1, where we used subqueries, does not necessarily suffer from the problem of duplicate answers. True, the subquery at line (4) of "Conditions Involving Tuples" Figure 1 will produce the certificate number of George Lucas several times. However, in the "main" query of line (1), we look at each tuple of MovieExec once. Most probably, there is only one tuple for George Lucas in that relation, and if so, it is only this tuple that satisfies the WHERE clause of line (3). In this way, George Lucas is printed only once.

### Duplicates in Unions, Intersections, and Differences

Unlike the SELECT statement, which preserves duplicates as a default and only eliminates them when instructed to by the DISTINCT keyword, the union. inter-section, and difference operations, which we introduced in "Union, Intersection, and Difference of Queries", usually eliminate duplicates. That is, bags are converted to sets, and the set version of the operation is applied. In order to prevent the elimination of duplicates, we must follow the operator UNION, INTERSECT, or EXCEPT by the keyword ALL. If we do, then we get the bag semantics of these operators as was discussed in "Union, Intersection, and Difference of Bags".**Example 2 :**Consider again the union expression from "Union, Intersection, and Difference of Queries" Example 3, but now add the keyword ALL, as:

**(SELECT title, year FROM Movie)**

UNION ALL

(SELECT movieTitle AS title, movieYear AS year FROM StarsIn);

UNION ALL

(SELECT movieTitle AS title, movieYear AS year FROM StarsIn);

Now, a title and year will appear as many times in the result as it appears in each of the relations Movie and StarsIn put together. For example, if a movie appeared once in the Movie relation and there were three stars for that movie listed in StarsIn (so the movie appeared in three different tuples of StarsIn), then that movie's title and year would appear four times in the result of the union.

As for union, the operators INTERSECT ALL and EXCEPT ALL are intersection and difference of bags. Thus, if R and S are relations, then the result of expression.

**R INTERSECT ALL S**

is the relation in which the number of times a tuple t appears is the minimum of the number of times it appears in R and the number of times it appears in S.

The result of expression

**R EXCEPT ALL S**

has tuple t as many times as the difference of the number of times it appears in R minus the number of times it appears in S, provided the difference is positive. Each of these definitions is what we discussed for bags in "Union, Intersection, and Difference of Bags".

### Grouping and Aggregation in SQL

In "Grouping", we introduced the grouping-and-aggregation operator y for our extended relational algebra. Remember that this operator allows us to partition the tuples of a relation into "groups", based on the values of tuples in one or more attributes, as discussed in "Grouping". We are then able to aggregate certain other columns of the relation by applying "aggregation" operators to those columns. If there are groups, then the aggregation is done separately for each group. SQL provides all the capability of the operator through the use of aggregation operators in SELECT clauses and a special GROUP BY clause.### Aggregation Operators

SQL uses the five aggregation operators SUM, AVG, MIN, MAX and COUNT that we met in "Extended Operators of Relational Algebra". These operators are used by applying them to a scalar-valued expression, usually a column name, in a SELECT clause. One exception is the expression COUNT(*), which counts all the tuples in the relation that is constructed from the FROM clause and WHERE clause of the query.Further, we have the option of eliminating duplicates from the column before applying the aggregation operator by using the keyword DISTINCT. That is, an expression such as COUNT(DISTINCT x) counts the number of distinct values in column x. We could use any of the other operators in place of COUNT here, but expressions such as SUM(DISTINCT x) rarely make sense, since it asks us to sum the different values in column x.

**Example 3 :**The following query finds the average net worth of all movie executives:

**SELECT AVG(netWorth)**

FROM MovieExec;

FROM MovieExec;

Note that there is no WHERE clause at all, so the keyword WHERE is properly omitted. This query examines the netWorth column of the relation

**MovieExec(name, address, cert#, netWorth)**

sums the values found there, one value for each tuple (even if the tuple is a duplicate of some other tuple), and divides the sum by the number of tuples. If there are no duplicate tuples, then this query gives the average net worth as we expect. If there were duplicate tuples, then a movie executive whose tuple appeared n times would have his or her net worth counted n times in the average.

**Example 4 :**The following query:

**SELECT COUNT (*)**

FROM StarsIn;

FROM StarsIn;

counts the number of tuples in the StarsIn relation. The similar query:

**SELECT COUNT(starName)**

FROM StarsIn;

FROM StarsIn;

counts the number of values in the starName column of the relation. Since, duplicate values are not eliminated when we project onto the starName column in SQL, this count should be the same as the count produced by the query with COUNT (*).

If we want to be certain that we do not count duplicate values more than once, we can use the keyword DISTINCT before the aggregated attribute, as:

**SELECT COUNT(DISTINCT starName)**

FROM StarsIn;

FROM StarsIn;

Now each star is counted once, no matter in how many movies they appeared.

### Tags

- tuples
- relational algebra
- attributes
- aggregation operators
- eliminating duplicates
- Scrolling Cursors
- Protecting Against Concurrent Updates
- Modifications by Cursor
- Cursors
- Instead-Of Triggers
- Triggers in SQL
- Schema-Level Constraints and Triggers
- Tuple-Based CHECK Constraints
- Constraints on Attributes and Tuples
- Declaring Foreign-Key Constraints
- Keys Declared With UNIQUE
- Constraints and Triggers
- Interpreting Queries Involving Views
- Modifying Views
- View Definitions
- Introduction to Selection of Indexes
- Default Values / Indexes
- Simple Table Declarations
- Defining a Relation Schema in SQL
- Deletion / Updates
- Database Modifications
- Grouping / HAVING Clauses
- Natural Joins / Outerjoins
- Subqueries in FROM Clauses
- Conditions Involving Tuples
- Subqueries
- Union, Intersection, and Difference of Queries
- Interpreting Multirelation Queries
- Tuple Variables
- Disambiguating Attributes
- Queries Involving More Than One Relation
- Null Values and Comparisons Involving NULL
- Selection in SQL
- Projection in SQL
- The Database Language SQL
- Additional Constraint Examples
- Referential Integrity Constraints
- Constraints on Relations
- Extending the Projection Operator
- Grouping
- Extended Operators of Relational Algebra
- Selection on Bags / Product of Bags / Joins of Bags
- Union, Intersection, and Difference of Bags
- Relational Operations on Bags
- A Linear Notation for Algebraic Expressions
- Dependent and Independent Operations
- Renaming
- Combining Operations to Form Queries
- Selection / Cartesian Product
- Set Operations on Relations
- An Algebra of Relational Operations
- Relational Algebra
- Attribute Lists
- Information Integration Via Semistructured Data
- Semistructured Data Representation
- Object-Oriented Versus Object-Relational
- Nested Relations
- The Object-Relational Model
- What If There Is No Key
- Representing ODL Relationships
- Representing Other Type Constructors
- Representing Set-Valued Attributes
- Nonatomic Attributes in Classes
- Declaring Keys in ODL
- Subclasses in ODL / Multiple Inheritance in ODL
- Types in ODL
- Methods in ODL
- Multiplicity of Relationships
- Relationships in ODL / Inverse Relationships
- Attributes in ODL
- Introduction to ODL
- The Type System
- Decomposition into Fourth Normal Form
- Reasoning About Multivalued Dependencies
- Definition of Multivalued Dependencies
- Multivalued Dependencies
- Third Normal Form
- Boyce-Codd Normal Form
- Decomposing Relations
- Projecting Functional Dependencies
- Closing Sets of Functional Dependencies
- The Transitive Rule
- Why the Closure Algorithm Works
- Computing the Closure of Attributes
- Trivial Functional Dependencies
- The Splitting/Combining Rule
- Rules About Functional Dependencies
- Keys of Relations
- Using Null Values to Combine Relations - Comparison of Approaches
- An Object-Oriented Approach
- Converting Subclass Structures to Relations
- Handling Weak Entity Sets
- Combining Relations
- From E/R Relationships to Relations
- From Entity Sets to Relations
- From E/R Diagrams to Relational Designs
- Relation Instances
- Equivalent Representations of a Relation
- Tuples / Domains
- Attributes / Schemas
- The Relational Data Model
- Summary of The Entity-Relationship Data Model
- Weak Entity Set Notation
- Requirements for Weak Entity Sets
- Representing Keys in the E/R Model
- Keys in the E/R Model
- The Modeling of Constraints
- Picking the Right Kind of Element
- Design Principles
- Subclasses in the E/R Model
- Converting Multiway Relationships to Binary
- Attributes on Relationships
- Multiway Relationships
- Elements of the E/R Model
- Database System Implementation
- Database Programming
- Database Design
- The Query Processor
- Multimedia Data
- Relational Database Systems