The Query Processor

The portion of the DBMS that most affects the performance that the user sees is the query processor. In the following figure, the query processor is represented by two components:

1. The query compiler, which translates the query into an internal form called a query plan. The latter is a sequence of operations to be performed on the data. Often the operations in a query plan are implementations of "relational algebra" operations, which are discussed in "An Algebra of Relational Operations". The query compiler consists of three major units:

(a) A query parser, which builds a tree structure from the textual form of the query.

(b) A query preprocessor, which performs semantic checks on the query (e.g., making sure all relations mentioned by the query actually exist) and performs some tree transformations to turn the parse tree into a tree of algebraic operators representing the initial query plan.

(c) A query optimizer, which converts the preliminary query plan into the best available sequence of operations on the actual data.

The query compiler uses metadata and statistics about the data to decide which sequence of operations is likely to be the fastest. For example, the existence of an index, which is a specialized data structure that facilitates access to data given values for one or more components of that data, can make one plan much faster than another.

2. The execution engine, which is responsible for carrying out each of the steps in the chosen query plan. The execution engine interacts with most of the other components of the DBMS, either directly or through the buffers. It must get the data from the database into buffers in order to manipulate that data. It needs to interact with the scheduler to avoid accessing data that is locked, and with the log manager to make sure that all database changes are properly logged.
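
The division of labor described above can be illustrated with a toy pipeline in Python. This is only a minimal sketch under stated assumptions, not how any real DBMS is implemented: the in-memory tables, the hash index, the single supported query shape, and all names (TABLES, INDEXES, parse, preprocess, optimize, execute, run_query) are hypothetical and chosen for illustration. Buffers, the scheduler, and the log manager are left out entirely.

```python
import re
from dataclasses import dataclass

# Hypothetical in-memory "database": relation name -> list of rows (dicts).
TABLES = {
    "emp": [
        {"name": "Ann", "dept": "sales"},
        {"name": "Bob", "dept": "ops"},
        {"name": "Cy", "dept": "sales"},
    ]
}

# Hypothetical metadata: (relation, attribute) -> hash index from value to rows.
INDEXES = {("emp", "dept"): {}}
for row in TABLES["emp"]:
    INDEXES[("emp", "dept")].setdefault(row["dept"], []).append(row)


@dataclass
class ParseTree:
    column: str
    relation: str
    attr: str
    value: str


def parse(text):
    """Parser: build a (very flat) tree from the textual form of the query."""
    m = re.fullmatch(r"SELECT (\w+) FROM (\w+) WHERE (\w+) = '(\w*)'", text.strip())
    if m is None:
        raise SyntaxError("unsupported query: " + text)
    return ParseTree(*m.groups())


def preprocess(tree):
    """Preprocessor: semantic check, then an initial plan of algebraic operators."""
    if tree.relation not in TABLES:
        raise LookupError("no such relation: " + tree.relation)
    # Initial plan as nested tuples: project(select(scan(R))).
    return ("project", tree.column,
            ("select", tree.attr, tree.value, ("scan", tree.relation)))


def optimize(plan):
    """Optimizer: swap select-over-scan for an index lookup when an index exists."""
    _, column, (_, attr, value, (_, relation)) = plan
    if (relation, attr) in INDEXES:
        return ("project", column, ("index_lookup", relation, attr, value))
    return plan


def execute(plan):
    """Execution engine: carry out each step of the chosen plan."""
    op = plan[0]
    if op == "scan":
        return list(TABLES[plan[1]])
    if op == "index_lookup":
        _, relation, attr, value = plan
        return list(INDEXES[(relation, attr)].get(value, []))
    if op == "select":
        _, attr, value, child = plan
        return [r for r in execute(child) if r[attr] == value]
    if op == "project":
        _, column, child = plan
        return [r[column] for r in execute(child)]
    raise ValueError("unknown operator: " + op)


def run_query(text):
    """Compile (parse, preprocess, optimize) and then execute a query."""
    return execute(optimize(preprocess(parse(text))))
```

Calling run_query("SELECT name FROM emp WHERE dept = 'sales'") returns the names of the matching rows. Both the initial plan and the optimized plan produce the same answer; the optimizer only changes how the rows are found, here by consulting the hypothetical index instead of scanning every row.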
