In my last post, I described how rule based optimization applies in an EII scenario. Cost based optimization is an improvement on that, since it applies situational knowledge of the backend data sources to optimize the query plan. While Rule Based Optimization is hard-wired logic, Cost Based Optimization is dynamic, and uses its knowledge of the current state of the underlying database to arrive at the most optimal plan for executing a query.
CBO in a standalone Query Engine
In the case of a standalone database, the database engine maintains statistics about the data it stores. Examples of the type of statistics maintained includes things like the number of keys in an index, the average number of matches for a given key in an index, the number of rows in various tables etc. These statistics are used when optimizing a query to determine things like join ordering, the type of join algorithm to use etc. These determinations are made based on calculations of cost (hence Cost Based Optimization) of performing various operations involved in executing the query. The statistics maintained by the database aid in these calculations.
CBO in an EII Query Engine
In an EII scenario, Cost Based Optimization is a bit more complex. Various data sources have varying capabilities for maintaining statistics. For instance, while most relational databases maintain at least some basic statistics, other data sources like web services provide none. When a user query requires the EII server to query multiple data sources and then “join” the information across the data sources, the server has to take into account the statistics it has retrieved from the different data sources, the lack of statistics from some of them, and determine the proper orchestration of the query execution.
Here’s a simple example that shows how CBO can contribute to
building an optimal query plan. Consider a scenario where detailed customer
information is maintained in a CRM system built on an Oracle database, and an
Order Entry System built on a DB2 database holders the Order information. While
the DB2 database has customer information, we want to join the order
information with the richer customer information available in the Oracle
database. Say we want detailed customer information and order details for all
customers located in “
Next time, I will post on the various takes on caching in EII systems and the impact of caching policies on EII system and backend data source performance.