Generating Synthetic Complex-structured XML Data
Ashraf Aboulnaga, Jeffrey F. Naughton, Chun Zhang
 

Computer Sciences Department
University of Wisconsin - Madison
{ashraf,naughton,czhang}@cs.wisc.edu
Abstract
Synthetically generated data has always been important for evaluating and understanding new ideas in database research.
In this paper, we describe a data generator for generating synthetic complex-structured XML data that allows for a high
level of control over the characteristics of the generated data. This data generator is certainly not the ultimate solution
to the problem of generating synthetic XML data, but we have found it very useful in our research on XML data
management, and we believe that it can also be useful to other researchers. Furthermore, we hope that this paper starts
a discussion in the XML community about characterizing and generating XML data, and that it may serve as a first step
towards developing a commonly accepted XML data generator for our community.
1 Introduction
Synthetically generated data is very useful in evaluating and understanding new ideas in database research. For example,
research on relational databases often uses synthetic data from the Wisconsin benchmark [DeW93], TPC-C [TPCC], or
TPC-H [TPCH], and research on object oriented databases often uses synthetic data from the OO7 benchmark [CDN93].
Synthetic data generators allow us to generate large volumes of data with well-understood characteristics. We can
easily vary the characteristics of the generated data by varying the input parameters of the data generator. This allows us
to systematically cover much more of the space of possible data sets than we could by relying solely on real data over which we have

  

Source: Aboulnaga, Ashraf - School of Computer Science, University of Waterloo
Naughton, Jeffrey F. - Department of Computer Sciences, University of Wisconsin at Madison

 

Collections: Computer Technologies and Information Sciences