Decomposition into BCNF

Decomposition into BCNF

By frequently selecting appropriate decompositions, we can break any relation schema into a collection of subsets of its attributes with the following important properties:

1.  These subsets are the schemas of relations in BCNF.

2.  The data in the original relation is represented faithfully by the data in the relations that are the result of the decomposition, in a sense to be made exact in "Recovering Information from a Decomposition". Approximately, we need to be able to reconstruct the original relation instance precisely from the decomposed relation instances.

Example (c) of "Boyce-Codd Normal Form" suggests that maybe all we have to do is break a relation schema into two-attribute subsets, and the result is definitely in BCNF. However such an arbitrary decomposition will not satisfy condition (2), as we shall see in "Recovering Information from a Decomposition". Actually, we must be more careful and use the violating FD's to guide our decomposition.

The decomposition strategy we shall follow is to look for a nontrivial FD A1A2 . . . An → B1B2 . . . Bm that violates BCNF; i.e., {A1, A2, . . . , An} is not a superkey. As a heuristic, we shall usually add to the right side as many attributes as are functionally determined by {A1,A2, . . .  ,An}. Figure (a) demonstrates how the attributes are broken into two overlapping relation schemas. One is all the attributes involved in the violating FD, and the other is the left side of the FD plus all the attributes not involved in the FD, i.e., all the attributes except those B's that are not A's.

Relation schema decomposition based on a BCNF violation

Example (a) :  Consider our running example, the Movies relation of Figure 1 of "Design of Relational Database Schemas / Anomalies". We saw in Example (a) of "Boyce-Codd Normal Form" that

    title year  length filmType studioName

is a BCNF violation. In this case, the right side already contains all the attributes functionally determined by title and year, so we shall use this BCNF violation to decompose Movies into:

1. The schema with all the attributes of the FD, that is:

    {title, year, length, filmType, studioName}

2. The schema with all attributes of Movies except the three that appear on the right of the FD. Therefore, we remove length, filmType, and studioName, leaving the second schema:

    {title, year, starName}

Notice that these schemas are the ones selected for relations Movies1 and Movies2 in example of "Decomposing Relations". We examined that these are each in BCNF in Example (b) of "Boyce-Codd Normal Form".

The relation MovieStudio

Example (b) :  Let us think about the relation that was introduced in "The Transitive Rule" example. This relation, which we shall call MovieStudio, stores information about movies, their owning studios, and the addresses of those studios. The schema and some typical tuples for this relation are shown in Figure (b) above.

Note that MovieStudio includes redundant information. Because we added to our customary sample data a second movie owned by Paramount, the address of Paramount is stated twice. However, the source of this problem is not the same as in Example (a). In the latter example, the problem was that a many-many relationship (the stars of a given movie) was being stored with other information about the movie. Here, everything is single-valued: the attributes length and filmType for a movie, the relationship Owns that relates a movie to its unique owning studio, and the attribute studioAddr for studios.

In this case, the problem is that there is a "transitive dependency". That is, as mentioned in "The Transitive Rule" example, relation MovieStudio has the FD's:

    title year studioName
    studioName studioAddr

We may apply the transitive rule to these to get a new FD:

    title year studioAddr

That is, a title and year (i.e., the key for movies) functionally determine a studio address - the address of the studio that owns the movie. Since

    title year length filmType

is another clear functional dependency, we conclude that {title, year} is a key for MovieStudio: in reality it is the only key.

On the other hand, FD:

    studioNarne studioAddr

which is one of those used in the application of the transitive rule above, is non-trivial but its left side is not a superkey. This study tells us MovieStudio is not in BCNF. We can fix the problem by following the decomposition rule, using the above FD. The first schema of the decomposition is the attributes of the FD itself, that is: {studioName, studioAddr}. The second schema is all the attributes of MovieStudio except for studioAddr, because the latter attribute is on the right of the FD used in the decomposition. Therefore, the other schema is:

    {title, year, length, filmType, studioName}

The projection of Figure (b) onto these schemas gives us the two relations MovieStudio1 and MovieStudio2 shown in Figure (c) and Figure (d). Each of these is in BCNF. Recall from "Projecting Functional Dependencies" that for each of the relations in the decomposition, we need to compute its FD's by computing the closure of each subset of its attributes, using the full set of given FD's. Generally, the process is exponential in the number of attributes of the decomposed relations, but we also saw in "Projecting Functional Dependencies" that there were some simplifications possible.

In our case, it is easy to decide that a basis for the FD's of MovieStudio1 is

    title year length filmType studioName

and for MovieStudio2 the only nontrivial FD is

    studioName studioAddr

Thus, the sole key for MovieStudio1 is {title, year}, and the sole key for MovieStudio2 is {studioName}. In each case, there are no nontrivial FD's that do not contain these keys on the left.

The relation MovieStudio1

The relation MovieStudio2

In each of the previous examples, one judicious application of the decomposition rule is enough to create a collection of relations that are in BCNF. Generally, that is not the case.

Example (c) :  We could generalize Example (b) to have a chain of FDs longer than two. Consider a relation with schema

    {title, year, studioName, president, presAddr}

That is, each tuple of this relation tells about a movie, its studio, the president of the studio, and the address of the president of the studio. Three FDs that we would suppose in this relation are

    title year studioName

The sole key for this relation is {title, year}. In this way the last two FD's above violate BCNF. Assume we decide to decompose starting with

    studioName president

First, we should add to the right side of this functional dependency any other attributes in the closure of studioName. By the transitive rule applied to studioName → president and president → presAddr, we know

    studioName presAddr

Combining the two FD's with studioName on the left, we get:

    studioName president presAddr

This FD has a maximally expanded right side, so we shall now decompose into the following two relation schemas.

    {title, year, studioName}
    {studioName, president, presAddr}

If we follow the projection algorithm of "Projecting Functional Dependencies", we decide that the FD's for the first relation has a basis:

    title year studioName

while the second has

    studioName president

Therefore, the sole key for the first relation is {title, year}, and it is therefore in BCNF. However, the second has {studioName} for its only key but also has the FD:

    president presAddr

which is a BCNF violation. Therefore, we must decompose again, this time using the above FD. The resulting three relation schemas, all in BCNF, are:

    {title, year, studioName}
    {studioName, president}
    {president, presAddr}

Generally, we must keep applying the decomposition rule as many times as required, until all our relations are in BCNF. We can be sure of eventual success, because every time we apply the decomposition rule to a relation R, the two resulting schemas each have fewer attributes than that of R. As we saw in Example (c) of "Boyce-Codd Normal Form", when we get down to two attributes, the relation is sure to be in BCNF; sometimes relations with larger sets of attributes are also in BCNF.