Almost every customer and prospect I have talked to in the last few years has listed performance among the top three criteria for evaluating an EII platform. This is probably true of any data management offering, but for EII, performance takes on an added dimension.
Examining the performance of an EII system
An EII platform’s performance has to be evaluated from two points of view:
- The performance of the distributed query engine
- The performance impact on the data sources
The first of these refers mainly to the query optimization capabilities of the EII query engine, and it is the easier of the two to evaluate. At a high level, you can look at two aspects:
- What optimization features does the query engine provide?
- What tuning capabilities are available to a data architect/administrator?
Rule-based optimization
At a minimum, any query engine will provide some amount of optimization based on heuristics, referred to as rule-based optimization. In general, query optimization takes a query expression tree and tries to convert it to another, equivalent tree (the equivalence being governed by the algebra for the query language). The expectation is that the rules or heuristics followed in rule-based optimization will produce a tree that can be evaluated at a lower cost than the tree you started with.
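To make the tree rewriting concrete, here is a minimal sketch of one well-known heuristic: applying a filter as close to its table as possible. The node classes (`Scan`, `Filter`, `Join`) and the function `push_filters_down` are hypothetical names invented for this illustration, not the API of any particular EII product, and a real optimizer would apply many more rules than this one.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    column: str          # qualified name, e.g. "orders.status"
    value: object
    child: "Node"


@dataclass(frozen=True)
class Join:
    left: "Node"
    right: "Node"


Node = Union[Scan, Filter, Join]


def tables(node):
    """Return the set of table names referenced under a node."""
    if isinstance(node, Scan):
        return {node.table}
    if isinstance(node, Filter):
        return tables(node.child)
    return tables(node.left) | tables(node.right)


def push_filters_down(node):
    """Rewrite the tree so each filter sits as close to its table as possible."""
    if isinstance(node, Scan):
        return node
    if isinstance(node, Join):
        return Join(push_filters_down(node.left), push_filters_down(node.right))
    # node is a Filter: first rewrite everything below it.
    child = push_filters_down(node.child)
    if isinstance(child, Join):
        owner = node.column.split(".")[0]
        if owner in tables(child.left):
            pushed = push_filters_down(Filter(node.column, node.value, child.left))
            return Join(pushed, child.right)
        if owner in tables(child.right):
            pushed = push_filters_down(Filter(node.column, node.value, child.right))
            return Join(child.left, pushed)
    return Filter(node.column, node.value, child)


# A filter sitting above a join is moved onto the "orders" side of the join.
before = Filter("orders.status", "OPEN", Join(Scan("customers"), Scan("orders")))
after = push_filters_down(before)
print(after)
```

Both trees describe the same result, but the rewritten one filters `orders` before the join rather than after it, which is the sense in which the second tree is expected to be cheaper to evaluate.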
For standalone SQL engines, these rules are cut and dried. For distributed SQL engines, additional factors need to be taken into account, such as the capabilities of the remote data sources and the amount of data being passed between the EII server and each remote data source. For instance, a common form of rule-based optimization involves pushdown analysis. The goal of pushdown analysis is to determine which SQL operations a remote relational data source can perform efficiently itself, so as to reduce the size of the data set that the EII server must handle. This becomes especially important for table join operations involving complex query criteria, in which large amounts of data may need to flow between different databases. The EII server uses a set of rules and heuristics to determine which pushdown operations will best shift the load to the data source and minimize overall data handling in the system.
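As a rough illustration of how source capabilities can constrain pushdown, the sketch below splits a pipeline of operations into a part sent to the source and a part kept in the EII server. The source names, the capability table, and `split_by_capability` are all assumptions made up for this example. It also assumes that an operation can run remotely only if every operation before it in the pipeline runs remotely too, so it pushes the longest supported prefix.

```python
# Hypothetical capability table: which operations each source can execute itself.
SOURCE_CAPABILITIES = {
    "oracle_erp": {"filter", "join", "sort", "aggregate"},
    "ldap_directory": {"filter"},
    "csv_files": set(),
}


def split_by_capability(source, operations):
    """Split an ordered pipeline into (pushed_to_source, done_locally).

    Pushes the longest leading run of operations that the source supports;
    everything after the first unsupported operation stays in the EII server.
    Unknown sources are treated as supporting nothing.
    """
    supported = SOURCE_CAPABILITIES.get(source, set())
    pushed = []
    for op in operations:
        if op not in supported:
            break
        pushed.append(op)
    return pushed, operations[len(pushed):]


pipeline = ["filter", "sort"]
for source in SOURCE_CAPABILITIES:
    print(source, split_by_capability(source, pipeline))
```

A real engine would describe capabilities at a much finer grain (specific functions, join types, collation rules), but the shape of the decision is the same.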
The query engine inside the EII server takes a SQL request from a client, parses it, and optimizes it, deciding which SQL operations to perform locally and which to send to the remote data source for execution. The EII query engine will push search conditions down to the data source as far as possible, because this avoids fetching a big result set back from the remote source. It also sends join operations on tables from the same data source to that source for execution, instead of fetching the tables back and doing the join locally. In addition, the EII query engine will push down any sorting criteria where possible, so that results are returned from the remote sources already sorted, minimizing the need to re-sort the final result within the EII query engine.
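The decisions described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: `plan_pushdown`, its argument shapes, and the source names `crm` and `erp` are hypothetical, conditions are passed around as plain SQL strings, every source is assumed to support filters and joins, and `ORDER BY` is pushed down only when the whole query targets a single source.

```python
from collections import defaultdict


def plan_pushdown(table_sources, filters, joins, order_by=None):
    """Decide what each remote source executes and what stays local.

    table_sources: {table_name: source_name}
    filters:       list of (table_name, sql_condition)
    joins:         list of (left_table, right_table, sql_condition)
    Returns (remote_sql_by_source, local_join_conditions, local_order_by).
    """
    tables_by_source = defaultdict(list)
    for table, source in table_sources.items():
        tables_by_source[source].append(table)

    conditions = defaultdict(list)
    local_joins = []
    for left, right, cond in joins:
        if table_sources[left] == table_sources[right]:
            # Both tables live in the same source: let that source do the join.
            conditions[table_sources[left]].append(cond)
        else:
            # Cross-source join: the EII server has to do it.
            local_joins.append(cond)
    for table, cond in filters:
        # Search conditions go to the source that owns the table.
        conditions[table_sources[table]].append(cond)

    single_source = len(tables_by_source) == 1
    remote_sql = {}
    for source, source_tables in tables_by_source.items():
        sql = "SELECT * FROM " + ", ".join(source_tables)
        if conditions[source]:
            sql += " WHERE " + " AND ".join(conditions[source])
        if order_by and single_source:
            sql += " ORDER BY " + order_by
        remote_sql[source] = sql
    local_order_by = None if single_source else order_by
    return remote_sql, local_joins, local_order_by


remote_sql, local_joins, local_order_by = plan_pushdown(
    table_sources={"customers": "crm", "orders": "erp", "order_items": "erp"},
    filters=[
        ("customers", "customers.region = 'EMEA'"),
        ("orders", "orders.status = 'OPEN'"),
    ],
    joins=[
        ("orders", "order_items", "orders.id = order_items.order_id"),
        ("customers", "orders", "customers.id = orders.customer_id"),
    ],
    order_by="customers.name",
)
for source, sql in remote_sql.items():
    print(source, "->", sql)
print("joined locally on:", local_joins)
print("sorted locally by:", local_order_by)
```

In this example the join between `orders` and `order_items` and both filters are sent to the sources, while the join between `customers` and `orders` and the final sort remain with the EII server because they span two sources.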
While this discussion was mostly in the context of distributed SQL engines, the same concepts apply to distributed XQuery engines as well. To evaluate how good a job a query engine does on these kinds of optimizations, try out some distributed queries and see how the EII platform converts them into queries on the remote data sources.
In my next post, I will write about cost based optimization (more interesting and adaptive) and what it means in an EII context.