*Decomposition into BCNF*

By frequently selecting appropriate decompositions, we can break any relation schema into a collection of subsets of its attributes with the following important properties:

1. These subsets are the schemas of relations in BCNF.

2. The data in the original relation is represented faithfully by the data in the relations that are the result of the decomposition, in a sense to be made exact in "Recovering Information from a Decomposition". Approximately, we need to be able to reconstruct the original relation instance precisely from the decomposed relation instances.

Example (c) of "Boyce-Codd Normal Form" suggests that maybe all we have to do is break a relation schema into two-attribute subsets, and the result is definitely in BCNF. However such an arbitrary decomposition will not satisfy condition (2), as we shall see in "Recovering Information from a Decomposition". Actually, we must be more careful and use the violating FD's to guide our decomposition.

The decomposition strategy we shall follow is to look for a nontrivial FD A_{1}A_{2} . . . A_{n} → B_{1}B_{2} . . . B_{m} that violates BCNF; i.e., {A_{1}, A_{2}, . . . , A_{n}} is not a superkey. As a heuristic, we shall usually add to the right side as many attributes as are functionally determined by {A_{1},A_{2}, . . . ,A_{n}}. Figure (a) demonstrates how the attributes are broken into two overlapping relation schemas. One is all the attributes involved in the violating FD, and the other is the left side of the FD plus all the attributes not involved in the FD, i.e., all the attributes except those B's that are not A's.

**Example (a) :**Consider our running example, the Movies relation of Figure 1 of "Design of Relational Database Schemas / Anomalies". We saw in Example (a) of "Boyce-Codd Normal Form" that

**title year length filmType studioName**

is a BCNF violation. In this case, the right side already contains all the attributes functionally determined by title and year, so we shall use this BCNF violation to decompose Movies into:

1. The schema with all the attributes of the FD, that is:

**{title, year, length, filmType, studioName}**

2. The schema with all attributes of Movies except the three that appear on the right of the FD. Therefore, we remove length, filmType, and studioName, leaving the second schema:

**{title, year, starName}**

Notice that these schemas are the ones selected for relations Movies1 and Movies2 in example of "Decomposing Relations". We examined that these are each in BCNF in Example (b) of "Boyce-Codd Normal Form".

**Example (b) :**Let us think about the relation that was introduced in "The Transitive Rule" example. This relation, which we shall call MovieStudio, stores information about movies, their owning studios, and the addresses of those studios. The schema and some typical tuples for this relation are shown in Figure (b) above.

Note that MovieStudio includes redundant information. Because we added to our customary sample data a second movie owned by Paramount, the address of Paramount is stated twice. However, the source of this problem is not the same as in Example (a). In the latter example, the problem was that a many-many relationship (the stars of a given movie) was being stored with other information about the movie. Here, everything is single-valued: the attributes length and filmType for a movie, the relationship Owns that relates a movie to its unique owning studio, and the attribute studioAddr for studios.

In this case, the problem is that there is a "transitive dependency". That is, as mentioned in "The Transitive Rule" example, relation MovieStudio has the FD's:

**title year**

**studioName**

**studioName**

**studioAddr**

We may apply the transitive rule to these to get a new FD:

**title year**

**studioAddr**

That is, a title and year (i.e., the key for movies) functionally determine a studio address - the address of the studio that owns the movie. Since

**title year**

**length filmType**

is another clear functional dependency, we conclude that {title, year} is a key for MovieStudio: in reality it is the only key.

On the other hand, FD:

**studioNarne**

**studioAddr**

which is one of those used in the application of the transitive rule above, is non-trivial but its left side is not a superkey. This study tells us MovieStudio is not in BCNF. We can fix the problem by following the decomposition rule, using the above FD. The first schema of the decomposition is the attributes of the FD itself, that is: {studioName, studioAddr}. The second schema is all the attributes of MovieStudio except for studioAddr, because the latter attribute is on the right of the FD used in the decomposition. Therefore, the other schema is:

**{title, year, length, filmType, studioName}**

The projection of Figure (b) onto these schemas gives us the two relations MovieStudio1 and MovieStudio2 shown in Figure (c) and Figure (d). Each of these is in BCNF. Recall from "Projecting Functional Dependencies" that for each of the relations in the decomposition, we need to compute its FD's by computing the closure of each subset of its attributes, using the full set of given FD's. Generally, the process is exponential in the number of attributes of the decomposed relations, but we also saw in "Projecting Functional Dependencies" that there were some simplifications possible.

In our case, it is easy to decide that a basis for the FD's of MovieStudio1 is

**title year**

**length filmType studioName**

and for MovieStudio2 the only nontrivial FD is

**studioName**

**studioAddr**

Thus, the sole key for MovieStudio1 is {title, year}, and the sole key for MovieStudio2 is {studioName}. In each case, there are no nontrivial FD's that do not contain these keys on the left.

In each of the previous examples, one judicious application of the decomposition rule is enough to create a collection of relations that are in BCNF. Generally, that is not the case.

**Example (c) :**We could generalize Example (b) to have a chain of FDs longer than two. Consider a relation with schema

**{title, year, studioName, president, presAddr}**

That is, each tuple of this relation tells about a movie, its studio, the president of the studio, and the address of the president of the studio. Three FDs that we would suppose in this relation are

**title year**

**studioName**

studioName

studioName

**president**

president

president

**presAddr**

The sole key for this relation is {title, year}. In this way the last two FD's above violate BCNF. Assume we decide to decompose starting with

**studioName**

**president**

First, we should add to the right side of this functional dependency any other attributes in the closure of studioName. By the transitive rule applied to studioName → president and president → presAddr, we know

**studioName**

**presAddr**

Combining the two FD's with studioName on the left, we get:

**studioName**

**president presAddr**

This FD has a maximally expanded right side, so we shall now decompose into the following two relation schemas.

**{title, year, studioName}**

{studioName, president, presAddr}

{studioName, president, presAddr}

If we follow the projection algorithm of "Projecting Functional Dependencies", we decide that the FD's for the first relation has a basis:

**title year**

**studioName**

while the second has

**studioName**

**president**

president

president

**presAddr**

Therefore, the sole key for the first relation is {title, year}, and it is therefore in BCNF. However, the second has {studioName} for its only key but also has the FD:

**president**

**presAddr**

which is a BCNF violation. Therefore, we must decompose again, this time using the above FD. The resulting three relation schemas, all in BCNF, are:

**{title, year, studioName}**

{studioName, president}

{president, presAddr}

{studioName, president}

{president, presAddr}

Generally, we must keep applying the decomposition rule as many times as required, until all our relations are in BCNF. We can be sure of eventual success, because every time we apply the decomposition rule to a relation R, the two resulting schemas each have fewer attributes than that of R. As we saw in Example (c) of "Boyce-Codd Normal Form", when we get down to two attributes, the relation is sure to be in BCNF; sometimes relations with larger sets of attributes are also in BCNF.