Parallel Computing

The ability to store huge amounts of data is important, but it would be of little use if we could not access large amounts of that data quickly. Thus, very large databases also require ways to speed up access. One important speedup comes from index structures, which we shall mention in "Overview of Query Processing" and cover comprehensively in "Index Structures". Another way to process more data in a given time is to use parallelism, which manifests itself in several ways.

For example, since the rate at which data can be read from a single disk is fairly low compared with the rate at which a processor can consume it, we can speed up processing by using many disks and reading them in parallel. (Even if the data originates on tertiary storage, it is "cached" on disks before the DBMS accesses it.) These disks may be part of a single parallel machine, or they may be components of a distributed system, in which many machines, each responsible for part of the database, communicate over a high-speed network when needed.
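As a rough illustration of reading many disks in parallel, here is a minimal sketch in which each "disk" is simply an in-memory Python list. The function names (`partition_round_robin`, `scan_disk`, `parallel_select`) and the round-robin placement are assumptions for illustration, not a specific DBMS's design. Threads stand in for independent I/O streams. In CPython they give real overlap only when each scan waits on actual disk I/O; on these in-memory lists they merely show the structure.

```python
from concurrent.futures import ThreadPoolExecutor

def partition_round_robin(rows, n_disks):
    """Spread rows across n_disks simulated disks (hypothetical placement scheme)."""
    disks = [[] for _ in range(n_disks)]
    for i, row in enumerate(rows):
        disks[i % n_disks].append(row)
    return disks

def scan_disk(disk, predicate):
    """Sequentially scan one simulated disk, keeping rows that satisfy predicate."""
    return [row for row in disk if predicate(row)]

def parallel_select(disks, predicate):
    """Scan every disk concurrently, then union the per-disk results."""
    result = []
    with ThreadPoolExecutor(max_workers=max(1, len(disks))) as pool:
        for part in pool.map(lambda d: scan_disk(d, predicate), disks):
            result.extend(part)
    return result

if __name__ == "__main__":
    rows = [{"id": i, "balance": i * 10} for i in range(20)]
    disks = partition_round_robin(rows, n_disks=4)
    rich = parallel_select(disks, lambda r: r["balance"] >= 150)
    print(sorted(r["id"] for r in rich))  # [15, 16, 17, 18, 19]
```

Each disk is scanned independently, so with n disks of equal speed the scan time shrinks roughly by a factor of n. The union step at the end is cheap because a selection needs no coordination between disks.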

Of course, the ability to move data rapidly, like the ability to store huge amounts of data, does not by itself guarantee that queries can be answered quickly. We still need algorithms that break queries up in ways that let parallel computers or networks of distributed computers make efficient use of all their resources. Thus, parallel and distributed management of very large databases remains an active area of research and development; we consider some of its important ideas in "Parallel Algorithms for Relational Operations".
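One small example of why a query must be broken up with care: computing an average over data split across several machines. The sketch below is an assumed, simplified two-phase scheme, not the method of any particular system. Each partition returns a partial (sum, count), and a final step combines them. Simply averaging the per-partition averages would give the wrong answer whenever partitions differ in size. The names `partial_sum_count` and `parallel_avg` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum_count(partition):
    """Local phase: each node summarizes only its own partition."""
    return (sum(partition), len(partition))

def parallel_avg(partitions):
    """Global phase: combine partial (sum, count) pairs into one average."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partials = list(pool.map(partial_sum_count, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None  # None when there is no data at all

if __name__ == "__main__":
    partitions = [[10, 20, 30], [40], [50, 60]]
    print(parallel_avg(partitions))  # 35.0
    # Averaging the per-partition averages (20, 40, 55) would give about 38.3 instead.
```

The design choice here is to ship small summaries rather than raw rows between machines. Each node sends only two numbers, so communication stays constant no matter how large its partition is.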



