Monday, October 13, 2008

Access Path Selection in a Relational Database Management System

This paper discusses how System R selects access paths for queries, both simple and complex. It briefly describes the four phases of processing a SQL statement (parsing, optimization, code gen, execution) but then goes in-depth into how the optimizer works. The optimizer attempts to find the best access path by finding one that minimizes needed CPU and I/O resources; it chooses between "interesting" order and "unordered" access paths based on the given query's ordering requirements (e.g., "group by" and "order by") and a so-called selectivity factor that it uses in calculating the cost.

The optimizer chooses between doing nested-loop and merge scan joins for any n-way joins, which it treats as a series of 2-way joins. The optimal ordering of these joins is created using a search tree of equivalence classes. The best way of accessing single relations is found, and then the best way to join any relation with these relations is found, and the cost is built up layer by layer, so to speak. The authors mention that other path selectors exist but do not present alternative methods to the ones they chose, nor do they provide any kind of comparative testing or quantify their method's performance.

No comments: