Relational Operations on Bags

Relational Operations on Bags

While a set of tuples (i.e., a relation) is a simple, natural model of data as it might appear in a database, commercial database systems rarely, if ever, are based entirely on sets. In some situations, relations as they appear in database systems are allowed to have duplicate tuples. Remember that if a "set" is permitted to have multiple occurrences of a member, then that set is called a bag or multiset. In this section, we shall examine relations that are bags rather than sets; that is, we shall allow the same tuple to appear more than once in a relation. When we refer to a "set", we mean a relation without duplicate tuples; a "bag" means a relation that may (or may not) have duplicate tuples.

Example (a) : The relation in Figure (a) is a bag of tuples. In it, the tuple (1,2) appears three times and the tuple (3,4) appears once. If Figure (a) were a set-valued relation, we would have to eliminate two occurrences of the tuple (1,2). In a bag-valued relation, we do allow multiple occurrences of the same tuple, but like sets, the order of tuples does not matter.

A bag

Why Bags?

When we think about implementing relations efficiently, we can see a number of ways that allowing relations to be bags rather than sets can speed up operations on relations. We mentioned at the beginning of "An Algebra of Relational Operations" how allowing the result to be a bag could speed up the union of two relations. For another example, when we do a projection, allowing the resulting relation to be a bag (even when the original relation is a set) lets us work with each tuple separately. If we want a set as the result, we need to compare each projected tuple with all the other projected tuples, to make sure that each projection appears only once. Nevertheless, if we can accept a bag as the result, then we simply project each tuple and add it to the result; no comparison with other projected tuples is required.

Example (b) : The bag of Figure (a) could be the result of projecting the relation shown in Figure (b) onto attributes A and B, on the condition we allow  the result to be a bag and do not eliminate the duplicate occurrences of (1,2).
Bag for Example (b)

Had we used the ordinary projection operator of relational algebra, and therefore eliminated duplicates, the result would be only:

Note that the bag result, although larger, can be calculated more quickly, since there is no need to compare each tuple (1,2) or (3,4) with previously generated tuples.

In addition, if we are projecting a relation in order to take an aggregate (discussed in "Extended Operators of Relational Algebra"), such as "Find the average value of A in Figure (b)", we could not use the set model to think of the relation projected onto attribute A. As a set, the average value of A is 2, because there are only two values of A - 1 and 3 - in Figure (b), and their average is 2. On the other hand, if we treat the A-column in Figure (b) as a bag {1,3,1,1}, we get the  correct average of A, which is 1.5, among the four tuples of Figure (b).