Compiler Support for Efficient Processing of XML Datasets

Xiaogang Li, Renato Ferreira, Gagan Agrawal

(Paper #95)


Abstract

Declarative, high-level, and/or application-class-specific languages are often successful in easing application development. In this paper, we report our experiences in compiling a recently developed XML query language, XQuery, for applications that process scientific datasets. Though scientific data processing applications can be conveniently represented in XQuery, compiling them to achieve efficient execution involves a number of challenges. These are: 1) analysis of recursive functions to identify reduction computations involving only associative and commutative operations, 2) replacement of recursive functions with iterative constructs, 3) parallelization of generalized reduction functions, which in particular requires the synthesis of global reduction functions, 4) application of data-centric transformations on the structure of XQuery, and 5) translation of XQuery processing to an imperative language like C/C++, which is required for using a middleware that offers low-level functionality. This paper describes our solutions to these problems. By implementing the techniques in a compiler and generating code for a runtime system called Active Data Repository (ADR), we are able to achieve efficient processing of disk-resident datasets and parallelization on a cluster of machines. Our experimental results show that: 1) restructuring transformations, i.e., removing recursion and applying data-centric execution, result in a several-fold improvement in performance, and 2) parallel versions achieve good load balance and incur no significant overheads besides communication.
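To make challenges 1 to 3 concrete, the following is a minimal C++ sketch, not the compiler's actual output or the ADR API. It assumes a hypothetical reduction over a sequence of integers that computes a sum and a maximum, both associative and commutative operations. It shows a recursive formulation (analogous to a recursive XQuery function), its iterative replacement, and a parallel version in which each partition computes a local result that a synthesized global reduction then merges. The names `Reduction`, `accumulate`, `merge`, `reduce_recursive`, `reduce_iterative`, and `reduce_parallel` are hypothetical, and a sequential loop over chunks stands in for the nodes of a cluster.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical reduction object: running sum and maximum of the elements seen.
struct Reduction {
    long long sum = 0;
    long long max = std::numeric_limits<long long>::min();

    // Local reduction: fold one data element into the partial result.
    void accumulate(long long v) {
        sum += v;
        max = std::max(max, v);
    }

    // Global reduction: combine two partial results. This is valid only
    // because + and max are associative and commutative.
    void merge(const Reduction& other) {
        sum += other.sum;
        max = std::max(max, other.max);
    }
};

// Recursive form: one stack frame per element, as a recursive query
// function would be evaluated naively.
Reduction reduce_recursive(const std::vector<long long>& xs, std::size_t i) {
    if (i == xs.size()) return Reduction{};
    Reduction r = reduce_recursive(xs, i + 1);
    r.accumulate(xs[i]);
    return r;
}

// Iterative replacement over the half-open range [begin, end).
Reduction reduce_iterative(const std::vector<long long>& xs,
                           std::size_t begin, std::size_t end) {
    Reduction r;
    for (std::size_t i = begin; i < end; ++i) r.accumulate(xs[i]);
    return r;
}

// Simulated parallel execution: split the data into `nodes` contiguous
// chunks, reduce each chunk locally, then merge the local results.
Reduction reduce_parallel(const std::vector<long long>& xs, std::size_t nodes) {
    assert(nodes > 0 && "need at least one node");
    std::size_t chunk = (xs.size() + nodes - 1) / nodes;
    std::vector<Reduction> locals(nodes);
    for (std::size_t p = 0; p < nodes; ++p) {
        std::size_t b = std::min(xs.size(), p * chunk);
        std::size_t e = std::min(xs.size(), b + chunk);
        locals[p] = reduce_iterative(xs, b, e);
    }
    Reduction global;
    for (const Reduction& l : locals) global.merge(l);
    return global;
}
```

For the input `{4, 8, 15, 16, 23, 42}`, all three functions yield a sum of 108 and a maximum of 42, including `reduce_parallel(data, 4)`, where the last chunk is empty. Because the operations are associative and commutative, the order in which elements are accumulated and local results are merged does not change the answer; this is the property that the analysis in challenge 1 must establish before recursion can be replaced and the computation parallelized.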

Keywords:

Scientific Computing
Compilers
Programming Languages
Database
Case Study/Experience Report