Information Integration

As information becomes ever more important in our work and play, we find that existing information resources are being used in many new ways. For instance, consider a company that wants to provide on-line catalogs for all its products, so that people can use the World Wide Web to browse its products and place on-line orders. A large company has many divisions. Each division may have built its own database of products independently of the other divisions. These divisions may use different DBMS's, different structures for information, perhaps even different terms to mean the same thing or the same term to mean different things.

Example: Imagine a company with various divisions that manufacture disks. One division's catalog might represent rotation rate in revolutions per second, another in revolutions per minute. Another might have neglected to represent rotation speed at all. A division manufacturing floppy disks might refer to them as "disks", while a division manufacturing hard disks might call them "disks" as well. The number of tracks on a disk might be referred to as "tracks" in one division, but "cylinders" in another.
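The mismatches in this example can be made concrete with a small translation routine. The following is only a sketch under stated assumptions: the division labels "A", "B", "C" and the field names `rot_rps`, `rot_rpm`, `tracks`, `cylinders`, and `model` are hypothetical, chosen to mirror the example rather than taken from any real catalog.

```python
# Hypothetical division labels and field names; the text does not specify
# how the real catalogs are laid out.
def to_unified(record, division):
    """Translate one division's catalog record into a shared format."""
    if division == "A":
        # Division A stores rotation rate in revolutions per second.
        rpm = record["rot_rps"] * 60
        tracks = record["tracks"]
    elif division == "B":
        # Division B already uses revolutions per minute, but says "cylinders".
        rpm = record["rot_rpm"]
        tracks = record["cylinders"]
    elif division == "C":
        # Division C never recorded rotation speed, so the value stays unknown.
        rpm = None
        tracks = record["tracks"]
    else:
        raise ValueError(f"unknown division: {division}")
    return {"model": record["model"], "rotation_rpm": rpm, "tracks": tracks}
```

Note that the routine cannot invent the rotation speed that one division omitted, and it does nothing about the ambiguous word "disks"; resolving that ambiguity would need extra information, such as which division the record came from.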

Central control is not always the answer. Divisions may have invested large amounts of money in their databases long before information integration across divisions was recognized as a problem. A division may have been an independent company, recently acquired. For these or other reasons, these so-called legacy databases cannot be changed easily. Thus, the company must build some structure on top of the legacy databases to present to customers a unified view of products across the company.

One popular approach is the creation of data warehouses, where information from many legacy databases is copied, with the appropriate translation, to a central database. As the legacy databases change, the warehouse is updated, but not necessarily immediately. A common scheme is for the warehouse to be reconstructed each night, when the legacy databases are likely to be less busy.
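A periodic rebuild of this kind might look roughly like the sketch below, which uses Python's standard `sqlite3` module with in-memory databases standing in for the legacy systems. All table names, column names, and sample rows are assumptions made for illustration; a real warehouse would involve far more than a single table and a full reload.

```python
import sqlite3

def rebuild_warehouse(warehouse, legacy_sources):
    """Rebuild the warehouse table from scratch from every legacy source."""
    warehouse.execute("DROP TABLE IF EXISTS products")
    warehouse.execute(
        "CREATE TABLE products (division TEXT, model TEXT, rotation_rpm REAL)"
    )
    for division, (conn, query) in legacy_sources.items():
        # Each query performs that source's translation into (model, rpm).
        for model, rpm in conn.execute(query):
            warehouse.execute(
                "INSERT INTO products VALUES (?, ?, ?)", (division, model, rpm)
            )
    warehouse.commit()

# Two toy legacy databases with different column names and units (hypothetical).
div_a = sqlite3.connect(":memory:")
div_a.execute("CREATE TABLE disks (model TEXT, rot_rps REAL)")
div_a.execute("INSERT INTO disks VALUES ('A-100', 90)")

div_b = sqlite3.connect(":memory:")
div_b.execute("CREATE TABLE catalog (name TEXT, rot_rpm REAL)")
div_b.execute("INSERT INTO catalog VALUES ('B-7', 7200)")

sources = {
    "A": (div_a, "SELECT model, rot_rps * 60 FROM disks"),
    "B": (div_b, "SELECT name, rot_rpm FROM catalog"),
}

warehouse = sqlite3.connect(":memory:")
rebuild_warehouse(warehouse, sources)

# A change made to a legacy database after the rebuild is not visible yet...
div_a.execute("INSERT INTO disks VALUES ('A-200', 120)")
stale = warehouse.execute("SELECT COUNT(*) FROM products").fetchone()[0]

# ...until the next scheduled rebuild runs.
rebuild_warehouse(warehouse, sources)
fresh = warehouse.execute("SELECT COUNT(*) FROM products").fetchone()[0]
```

Between the two rebuilds the warehouse holds fewer rows than the legacy databases do, which is exactly the staleness the nightly scheme accepts in exchange for leaving the legacy systems undisturbed during the day.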

The legacy databases are thus able to continue serving the purposes for which they were created. New functions, such as providing an on-line catalog service through the Web, are done at the data warehouse. We also see data warehouses serving needs for planning and analysis. For example, company analysts may run queries against the warehouse looking for sales trends, in order to better plan inventory and production. Data mining, the search for interesting and unusual patterns in data, has also been enabled by the construction of data warehouses, and there are claims of enhanced sales through exploitation of patterns discovered in this way. These and other issues of information integration are discussed in "Information Integration".