Distributed processing of very large datasets with DataCutter. We describe DataCutter, a framework designed to support subsetting and processing of datasets in distributed, heterogeneous environments. We illustrate its use with several data-intensive applications from diverse fields and present experimental results.

