The subject of database system implementation can be divided roughly into three sections:
1. Storage management: how secondary storage is used effectively to hold data and allow it to be accessed quickly.
2. Query processing: how queries expressed in a very high-level language such as SQL can be executed efficiently.
3. Transaction management: how to support transactions with the ACID properties discussed in "Transaction Processing"
Each of these topics is covered by several articles of this blog.
Storage-Management Overview"Data Storage" introduces the memory hierarchy. However, since secondary storage, especially disk, is so central to the way a DBMS manages data, we observe in the greatest detail the way data is stored and accessed on disk. The "block model" for disk-based data is introduced; it influences the way almost everything is done in a database system.
"Representing Data Elements" relates the storage of data elements - relations, tuples, attribute-values, and their equivalents in other data models - to the requirements of the block model of data. Then we look at the important data structures that are used for the construction of indexes. Recall that an index is a data structure that supports efficient access to data. "Index Structures" covers the important one-dimensional index structures - indexed-sequential files, B-trees, and hash tables. These indexes are commonly used in a DBMS to support queries in which a value for an attribute is given and the tuples with that value are desired. B-trees also are used for access to a relation sorted by a given attribute. "Multidimensional and Bitmap Indexes" discusses multidimensional indexes, which are data structures for specialized applications such as geographic databases, where queries usually ask for the contents of some region. These index structures can also support complex SQL queries that limit the values of two or more attributes, and some of these structures are beginning to appear in commercial DBMS's.
Query-Processing Overview"Query Execution" covers the basics of query execution. We learn a number of algorithms for efficient execution of the operations of relational algebra. These algorithms are designed to be efficient when data is stored on disk and are in some cases rather different from analogous main-memory algorithms.
In "The Query Compiler" we consider the architecture of the query compiler and optimizer. We begin with the parsing of queries and their semantic checking. Next, we consider the conversion of queries from SQL to relational algebra and the selection of a logical query plan, that is, an algebraic expression that represents the particular operations to be performed on data and the necessary constraints regarding order of operations. Finally, we discover the selection of a physical query plan, in which the particular order of operations and the algorithm used to implement each operation have been specified.
Transaction-Processing OverviewIn "Coping With System Failures" we see how a DBMS supports durability of transactions. The central idea is that a log of all changes to the database is made. Anything that is in main-memory but not on disk can be lost in a crash (say, if the power supply is interrupted). Therefore we have to be careful to move from buffer to disk, in the proper order, both the database changes themselves and the log of what changes were made. There are several log strategies available, but each limits our freedom of action in some ways.
Then, we take up the matter of concurrency control - assuring atomicity and isolation - in "Concurrency Control". We view transactions as sequences of operations that read or write database elements. The major topic of this portion is how to manage locks on database elements: the different types of locks that may be used, and the ways that transactions may be allowed to obtain locks and release their locks on elements. Also studied are a number of ways to assure atomicity and isolation without using locks.
"More About Transaction Management" concludes our study of transaction processing. We consider the interaction between the requirements of logging, as discussed in "Coping With System Failures", and the requirements of concurrency that were discussed in "Concurrency Control". Handling of deadlocks, another important function of the transaction manager is covered here as well. The extension of concurrency control to a distributed environment is also considered in "More About Transaction Management". Finally, we introduce the possibility that transactions are "long", taking hours or days rather than milliseconds. A long transaction cannot lock data without causing chaos among other potential users of that data, which forces us to rethink concurrency control for applications that involve long transactions.
- transaction manager
- Queries Involving More Than One Relation
- Null Values and Comparisons Involving NULL
- Projection in SQL
- The Database Language SQL
- Additional Constraint Examples
- Extending the Projection Operator
- Extended Operators of Relational Algebra
- Selection on Bags / Product of Bags / Joins of Bags
- Union, Intersection, and Difference of Bags
- Relational Operations on Bags
- Dependent and Independent Operations
- Selection / Cartesian Product
- Set Operations on Relations
- An Algebra of Relational Operations
- Relational Algebra
- Information Integration Via Semistructured Data
- Nested Relations
- The Object-Relational Model
- Representing ODL Relationships
- Representing Other Type Constructors
- Nonatomic Attributes in Classes
- Declaring Keys in ODL
- Relationships in ODL / Inverse Relationships
- Review of Object-Oriented Concepts
- Reasoning About Multivalued Dependencies
- Definition of Multivalued Dependencies
- Multivalued Dependencies
- Boyce-Codd Normal Form
- Decomposing Relations
- Why the Closure Algorithm Works
- Trivial Functional Dependencies
- Rules About Functional Dependencies
- Keys of Relations
- Using Null Values to Combine Relations - Comparison of Approaches
- Converting Subclass Structures to Relations
- From E/R Diagrams to Relational Designs
- Relation Instances
- Equivalent Representations of a Relation
- Tuples / Domains
- Converting Multiway Relationships to Binary
- Multiway Relationships
- The Entity-Relationship Data Model
- Summary of The World of Database Systems
- Information Integration Overview
- Database Design
- Transaction Processing
- Overview of a Database Management System
- Information Integration
- Multimedia Data
- Client-Server and Multi-Tier Architectures
- Parallel Computing
- Smaller and Smaller Systems
- Relational Database Systems
- Corporate Records
- Banking Systems
- Airline Reservations Systems
- Early Database Management Systems
- The Evolution of Database Systems
- The Worlds of Database Systems