Overview of Query Processing


The great majority of interactions with the DBMS follow the path on the left side of the following figure. A user or an application program initiates some action that does not affect the schema of the database, but may affect the content of the database (if the action is a modification command) or will extract data from the database (if the action is a query). Recall from "The Evolution of Database Systems" that the language in which these commands are expressed is called a data-manipulation language (DML) or, somewhat colloquially, a query language. Many data-manipulation languages are available, but SQL, which was mentioned in the example in "Relational Database Systems", is by far the most commonly used. DML statements are handled by two separate subsystems, as follows.

Figure: Overview of Query Processing

Answering the Query:

The query is parsed and optimized by a query compiler. The resulting query plan, or sequence of actions the DBMS will perform to answer the query, is passed to the execution engine. The execution engine issues a sequence of requests for small pieces of data, typically records or tuples of a relation, to a resource manager. The resource manager knows about data files (which hold relations), the format and size of the records in those files, and index files, which help find elements of data files quickly.
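The idea of a query plan as a sequence of actions that pulls one tuple at a time can be illustrated with a small sketch. This is only a toy model built on Python generators, not how any particular DBMS is implemented; the operator names `scan`, `select`, and `project` and the `movies` relation are hypothetical and chosen for illustration.

```python
# Toy query plan: each operator is a generator that requests tuples
# from its child one at a time. All names here are hypothetical.

def scan(relation):
    """Leaf operator: hand out the tuples of a stored relation one by one."""
    for tup in relation:
        yield tup

def select(child, predicate):
    """Pass through only the tuples that satisfy the predicate."""
    for tup in child:
        if predicate(tup):
            yield tup

def project(child, attrs):
    """Keep only the named attributes of each tuple."""
    for tup in child:
        yield tuple(tup[a] for a in attrs)

# An in-memory stand-in for a data file holding a relation.
movies = [
    {"title": "Star Wars", "year": 1977},
    {"title": "Alien", "year": 1979},
]

# Plan for: SELECT title FROM movies WHERE year < 1978
plan = project(select(scan(movies), lambda t: t["year"] < 1978), ["title"])

# The "execution engine" drives the plan by pulling tuples from the root.
result = list(plan)
```

Because each operator asks its child for the next tuple only when needed, no operator has to hold the whole relation at once, which mirrors the stream of small data requests described above.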

The requests for data are translated into requests for pages, and these page requests are passed to the buffer manager. We shall discuss the role of the buffer manager in "Storage and Buffer Management", but briefly, its task is to bring appropriate portions of the data from secondary storage (normally disk), where it is kept permanently, into main-memory buffers. Usually, the page or "disk block" is the unit of transfer between buffers and disk.
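As a rough illustration of how a record request becomes a page request, and of how the buffer manager avoids repeated disk reads, consider the following sketch. It assumes fixed-size pages holding four records each and a least-recently-used (LRU) replacement policy; both are assumptions made for the example, and the text above does not prescribe either. The class `BufferManager`, the function `get_record`, and the dictionary standing in for the disk are hypothetical.

```python
from collections import OrderedDict

RECORDS_PER_PAGE = 4  # assumed page capacity, for illustration only

class BufferManager:
    """Toy buffer pool with LRU replacement (an assumed policy)."""

    def __init__(self, disk, capacity):
        self.disk = disk              # stand-in for secondary storage
        self.capacity = capacity      # number of main-memory buffers
        self.frames = OrderedDict()   # page number -> page contents
        self.disk_reads = 0

    def get_page(self, page_no):
        if page_no in self.frames:
            # Buffer hit: mark the page as most recently used.
            self.frames.move_to_end(page_no)
            return self.frames[page_no]
        if len(self.frames) >= self.capacity:
            # Evict the least recently used page. This sketch is
            # read-only, so nothing needs to be written back to disk.
            self.frames.popitem(last=False)
        self.disk_reads += 1
        page = self.disk[page_no]     # the page is the unit of transfer
        self.frames[page_no] = page
        return page

def get_record(buf, record_id):
    """Translate a record request into a page request plus a slot."""
    page_no, slot = divmod(record_id, RECORDS_PER_PAGE)
    return buf.get_page(page_no)[slot]

# Three pages of four records each, standing in for a data file on disk.
disk = {
    p: [f"record-{p * RECORDS_PER_PAGE + i}" for i in range(RECORDS_PER_PAGE)]
    for p in range(3)
}

buf = BufferManager(disk, capacity=2)
first = get_record(buf, 5)    # page 1 is read from "disk"
second = get_record(buf, 6)   # same page, served from the buffer
```

Records 5 and 6 live on the same page, so the second request is answered from main memory and `buf.disk_reads` stays at 1.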

The buffer manager communicates with a storage manager to get data from disk. The storage manager might use operating-system commands, but more typically, the DBMS issues commands directly to the disk controller.

Transaction Processing:

Queries and other DML actions are grouped into transactions, which are units that must be executed atomically and in isolation from one another. Often each query or modification action is a transaction by itself. In addition, the execution of transactions must be durable, meaning that the effect of any completed transaction must be preserved even if the system fails in some way right after completion of the transaction. We divide the transaction processor into two major parts:

1. A concurrency-control manager, or scheduler, responsible for assuring atomicity and isolation of transactions, and

2. A logging and recovery manager, responsible for the durability of transactions.

We shall consider these components further in "Transaction Processing".
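Atomicity can be observed with a small experiment. The sketch below uses Python's standard `sqlite3` module purely as a convenient DBMS; the `accounts` table, the `transfer` function, and the balances are hypothetical. A transfer that fails halfway is rolled back, so the debit that was already applied does not survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts as a single transaction."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if an exception escapes it.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Abort after the debit has already been applied.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )

try:
    transfer(conn, "alice", "bob", 500)   # fails after the first UPDATE
except ValueError:
    pass

# Neither account changed: the partial debit was undone.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

Durability and isolation are not exercised here, since the database lives in memory and only one connection is used; the example shows only the all-or-nothing behavior of a single transaction.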