Ipedo Ipedo Blogs
My Photo

« April 2007 | Main | December 2007 »

Transformations with the XIP XQuery Engine

The last post touched in the transformation capabilities in XIP’s SQL engine. Now, we turn to its dual-core counterpart, the XQuery engine.


XIP can federate queries over relational, XML and other semi/unstructured data sources. On the consumption side, it can expose a virtual relational or XML database interface to applications. Inherent to this is the ability to transform data from various source formats to required target formats. The XML-2-XML transformation functionality in XIP is available through its XSLT and XQuery engines. These engines provide a standards based, powerful way of specifying any arbitrary transformation from one XML source format to another. In this post we will focus on the XQuery engine, but a number of ideas also apply to the XSLT engine in that both these engines share the same, extensible function library.


Typical transformation abilities in the data integration scenario include schema conversions including and not limited to changes in document structure, combining/splitting or otherwise transcoding content of elements and attributes, joining, filtering, union, aggregating/summarizing etc. XQuery provides constructs for all of these. I will show a few examples and give some direction as to how the platform can be extended to achieve more complex transformations. As an EII platform, XIP does not have nearly as comprehensive a transformation tool set as you would expect from a leading ETL vendor. But this is also appropriate, since real time data integration is typically not suited for very complex or computationally expensive transformations.


Functions and Operators:

To start with the basics, XIP’s XQuery engine comes with an extensive function library which includes all the basic functions for string manipulation, numerical functions such as floor/ceiling/round, aggregate functions such as avg/min/max/count etc., data type conversions, date manipulation functions and so on. Also available are a number of operators for number and date arithmetic. This is a rich set of functionality for creating new values from source data.


Structural Transformations:

Utilizing the path expressions, Conditional expressions, Element constructors and constructs like FLWOR, there are virtually limitless ways you can process a source document structure and construct an arbitrarily complex output XML structure. I will illustrate this point through a transformation analogous to a pivot in the relational world. Consider the sample document shown below:

Pivot_input_2

Converting this structure to use one in which the data values become element names is a simple matter of using computed element constructors along with a FLWOR construct.

Pivot_query_2

You can see that an approach like this can tackle arbitrarily complex structural transformations.

Unions are another common operation in an ETL scenario. A simple a query that combines a series of customer records from an XML view over on Oracle database and a view over the salesforce.com web service would be written as follows:


<LargeCustomers>

            {view(“oradata/crm”)/Customers/Customer[NumEmp > 5000]

              union

            view("salesforce.com/Account")/AccountList/Account[NumberOfEmployees > 5000]

            }

</LargeCustomers>


The expressions that generate the customer elements (in this case, the path expressions) can be arbitrarily complex XQuery expressions.


Extensibility

XIP’s XQuery engine also allows custom transformations by allowing you to code custom external functions in Java. This allows for just about any transformation that you can dream up. The XQuery modules extension feature also allows you to package up common enterprise transformation functions as a library that can be shared by other application developers and data architects.


What you have seen in this post is but a glimpse of XML transformation capabilities in the XIP platform. Get a download and start playing with these features to see what else you can do.

Data Transformation Using the Ipedo SQL Engine

There are lots of times when you need to transform data in a query federation scenario. In this section, I will discuss data transformation capabilities built into Ipedo’s SQL engine.

Ipedo’s SQL engine is based on industry standards and implements SQL92 and SQL99 features. It supports all core data types, SQL statements and built-in functions, thereby providing transformation capabilities for data conversion, data transformation (or conditional transformation) and data manipulations. Furthermore, when used in conjunction with advanced features such as user defined function in the SQL engine user can even integrate or interface with other BPM or messaging products. However, I will postpone the discussion of user defined function to the next post. In this post, let’s look at examples showing various ways data transformation can be implemented using the SQL engine.   

Example 1: Adding New Column Data Base on Country

Let’s first execute a query “select customerid, companyname, contactname, address, city, postalcode, country from customers” and look at the output shown below:Customers
Notice the output: it shows a list of customers and the countries where customers are located. However, there is no indication of the region where customer’s country is located. Let’s rewrite the query and add a column with region information based on the country. We’ll use a case expression to conditionally specify region for different countries. For simplicity, we’ll categorize all countries in North and South America as in region “Americas” and all European countries as region “Europe”.

select customerid, companyname, contactname, address, city, postalcode, country,
case when country='Brazil' then 'Americas'
when country='USA' then 'Americas'
when country='Venezuela' then 'Americas'
Else 'Europe'
End as region
from customers
order by region

Now, execute this query and look at the output shown in the screen below. You’ll see a new column “Region” and it display the region values either Europe or Americas. You can use the same technique to check for data errors and other purposes as well.Add_new_column_with_error

Example 2: Checking for Error Conditions

In the example above, I’ve made an error that you may have already noticed. I forgot to include countries such as Argentina, Canada and others in the Americas region. As a result, they’re mistakenly being categorized into the Europe region. Let’s write another query to catch those errors.

Again we add another case expression to check for each country and region value. If we find an error, we’ll display “ERROR: WRONG REGION CODE DETACTED” next to the country under a new column “Error”. For countries that are correctly categorized, we’ll display “NO ERROR FOUND”

select customerid, companyname, contactname, address, city, postalcode, country,
case when country='Brazil' then 'Americas'
when country='USA' then 'Americas'
when country='Venezuela' then 'Americas'
Else 'Europe'
End as region,
Case when (country='Argentina' or country='Canada' or country='Mexico')
      and region = 'Europe'
     then 'ERROR: WRONG REGION CODE DETACTED'
Else 'NO ERROR FOUND'
END AS error
from customers
order by region, country

Execute the query and here’s the output.Catch_error

Example 3: Concatenate Data and Convert Data Types and Currency

Let’s look at the output below by first executing the query: select orderid, customerid, freight, shipcountry from ADMIN.ORDERS.Orders

Notice the amount of freight charges are all in US dollars for all countries, including both European countries and Americas countries. Secondly, there’s no currency symbol displayed. Here is what we are going to do. We’ll look for all countries in the Europe region (as we classified in our previous queries) and convert their freight charge from US dollars to European common currency Euro and use letter “e” as the symbol for the Euro. We will then convert the freight charge from decimal type into a character string and concatenate the currency symbol with the freight charge in character form. Here is the query:

select orderid, customerid, freight,
case when (select region from exchangerate where shipcountry=country)='Europe'
then 'e' || cast(freight*(select rate from exchangerate where country =           shipcountry) as char(20))
when (select region from exchangerate where shipcountry=country)= 'Americas'
then '$' || cast(freight*(select rate from exchangerate where country = shipcountry) as char(20))
end as localizedFreightCharge,
shipcountry
from ADMIN.ORDERS

The “||” symbol is the operator for concatenation and the cast function is used to convert decimal data type to character type. Finally, the currency exchange rate we use for calculation and determine which country uses what currency is by using a sub query lookup.Castconcateandcurrencyconvert

That’s it. As you can see, you can do a lot data transformation just with the SQL language.