Data Intensive Abstractions for High End Biometric Applications
PIs: Douglas Thain and Patrick Flynn. This work is supported by the National Science Foundation under grant CNS-06-21434.
Biometric research requires the execution of very large data intensive batch workloads.
To evaluate new matching algorithms, researchers wish to compare thousands of images
to each other by brute force. When these sorts of workloads are submitted to conventional
batch systems in the usual way, they induce massive amounts of network and I/O traffic
that result in very poor throughput. How can we execute such large workloads effectively?
To attack this problem, we are introducing data intensive abstractions that
allow the user to easily provide the system with more information about the structure of a workload
so that it can partition the data and execute the workload effectively.
The abstraction explicitly specifies the data to be processed, the code that will process it,
and the relationship between the two. One example of an abstraction is All-Pairs:
All-Pairs( set S, function F ):
For all Si and Sj in set S, compute: F( Si, Sj )
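As a minimal sketch of these semantics (not the project's implementation), the following Python function evaluates F on every ordered pair drawn from S on a single machine. The names `all_pairs` and `toy_similarity` are hypothetical, chosen only for illustration; a real biometric workload would pass an image-matching function instead.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def all_pairs(s: Sequence[T], f: Callable[[T, T], R]) -> Dict[Tuple[int, int], R]:
    """Compute f(s[i], s[j]) for every ordered pair of indices (i, j)."""
    indices = range(len(s))
    return {(i, j): f(s[i], s[j]) for i, j in product(indices, repeat=2)}


def toy_similarity(a: str, b: str) -> int:
    """Hypothetical stand-in for a matching function: count equal positions."""
    return sum(x == y for x, y in zip(a, b))


# Three items produce a 3 x 3 grid of comparisons, keyed by (i, j).
result = all_pairs(["abc", "abd", "xyz"], toy_similarity)
```

Because the caller states only the set and the function, the order and placement of the individual comparisons are left to the system, which is what gives it room to optimize.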
A computing system with an All-Pairs interface can easily find a more efficient
implementation than a demand-paged filesystem. The input data can be staged
to the computation nodes by a spanning tree, and the partitioning of work units
into jobs can be done according to the performance properties of the system.
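One way to picture the partitioning step: the n-by-n grid of comparisons is cut into rectangular blocks, and each block becomes one batch job. The sketch below assumes a fixed block size supplied by the caller; how a system would derive that size from its measured performance properties is not covered here. The names `partition_pairs` and `block` are hypothetical.

```python
from typing import Iterator, Tuple

# A job is a pair of index ranges: the rows and columns of one block.
Block = Tuple[range, range]


def partition_pairs(n: int, block: int) -> Iterator[Block]:
    """Split an n x n comparison grid into blocks of at most block x block cells."""
    if block <= 0:
        raise ValueError("block must be positive")
    for row_start in range(0, n, block):
        for col_start in range(0, n, block):
            rows = range(row_start, min(row_start + block, n))
            cols = range(col_start, min(col_start + block, n))
            yield rows, cols


# Five items with block size 2: each axis splits into 3 ranges, giving 9 jobs.
jobs = list(partition_pairs(5, 2))
```

Larger blocks mean fewer jobs and less per-job overhead, while smaller blocks spread the work across more nodes; the choice of block size is the tuning knob in this sketch.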
In this project, we are designing a variety of similar data intensive abstractions
that allow for the easy and efficient execution of large scientific workloads.
Li Yu, Christopher Moretti, Andrew Thrasher, Scott Emrich, Kenneth Judd, and Douglas Thain,
Harnessing Parallelism in Multicore Clusters with the All-Pairs, Wavefront, and Makeflow Abstractions,
Journal of Cluster Computing, 13(3), pages 243-256, September, 2010. DOI: 10.1007/s10586-010-0134-7
Christopher Moretti, Hoang Bui, Karen Hollingsworth, Brandon Rich, Patrick Flynn, and Douglas Thain,
All-Pairs: An Abstraction for Data Intensive Computing on Campus Grids,
IEEE Transactions on Parallel and Distributed Systems, 21(1), pages 33-46, January, 2010. DOI: 10.1109/TPDS.2009.49
Douglas Thain, Sander Klous, Justin Wozniak, Paul Brenner, Aaron Striegel, and Jesus Izaguirre,
Separating Abstractions from Resources in a Tactical Storage System,
IEEE/ACM Supercomputing, pages 55-67, November, 2005. DOI: 10.1109/SC.2005.64