Recent News in the CCL
Some of our recent work on a system for collaborative engineering design was featured in the September issue of IEEE Computing in Science and Engineering, a special issue on "Open Simulation Laboratories." The article, "Open Sourcing the Design of Civil Infrastructure," came out of a collaboration between faculty in the computer science and civil engineering departments.
CCL grad student Peter Sempolinski led the design and implementation of an online service enabling collaborative design and evaluation of structures, known as the "Virtual Wind Tunnel". This service enables structural designs to be uploaded and shared, then evaluated for performance via the OpenFOAM CFD package. The entire process is similar to that of collaborative code development, where the source (i.e. a building design) is kept in a versioned repository, automated builds (i.e. building simulation) are performed in a consistent and reproducible way, and test results (i.e. simulation metrics) are used to evaluate the initial design. Designs and results can be shared, annotated, and re-used, making it easy for one engineer to build upon the work of another.
The prototype system has been used in a variety of contexts, most notably to demonstrate the feasibility of crowdsourcing design and evaluation work via Amazon Mechanical Turk.
Wed, 09 Sep 2015 13:27:00 +0000
At the IEEE Cluster Computing conference in Chicago, Ben Tovar will present some of our work on automated application monitoring:
Matthias Wolf will present our work on the Lobster large scale data management system:
Olivia Choudhury will present some work on modeling concurrent applications, trading off thread-level parallelism against task-level parallelism at scale:
The Cooperative Computing Lab is pleased to announce the release of version 5.2.0 of the Cooperative Computing Tools including Parrot, Chirp, Makeflow, WorkQueue, SAND, All-Pairs, Weaver, and other software.
The software may be downloaded here:
This minor release addresses the following issues from version 5.1.0:
Thanks go to our contributors:
Please send any feedback to the CCTools discussion mailing list:
Enjoy! Wed, 19 Aug 2015 12:05:00 +0000
Peter Bui is returning to Notre Dame this fall, where he will be a member of the teaching faculty and will be teaching undergraduate core classes like data structures, discrete math, and more. Welcome back, Prof. Bui!
Hoang Bui completed a postdoc position at Rutgers University with Prof. Manish Parashar, and is starting as an assistant professor at Western Illinois University. Congratulations, Prof. Bui!
Tue, 18 Aug 2015 15:16:00 +0000
We have been working closely with the CMS physics group at Notre Dame for the last year to build Lobster, a data analysis system that runs on O(10K) cores to process data produced by the CMS experiment at the LHC. At peak, Lobster at ND delivers capacity equal to that of a dedicated CMS Tier-2 facility!
Existing data analysis systems for CMS generally require that the user be running in a cluster that has been set up just so for the purpose: exactly the right operating system, certain software installed, various user identities present, and so on. This is fine for the various clusters dedicated to the CMS experiment, but it leaves unused the enormous amount of computing power that can be found at university computing centers (like the ND CRC), national computing resources (like XSEDE or the Open Science Grid), and public cloud systems.
Lobster is designed to harness clusters that are not dedicated to CMS. This requires solving two problems:
To do this, we build upon a variety of technologies for distributed computing. Lobster uses Work Queue to dispatch tasks to thousands of machines, Parrot with CVMFS to deliver the complex software stack from CERN, XRootD to deliver the LHC data, and Chirp and Hadoop to manage the output data.
Lobster runs effectively on O(10K) cores so far, depending on the CPU/IO ratio of the jobs. These two graphs show the behavior of a production run on top of HTCondor at Notre Dame hitting up to 10K cores over the course of a 48-hour run. The top graph shows the number of tasks running simultaneously, while the bottom shows the number of tasks completed or failed in each 10-minute interval. Note that about two thirds of the way through, there is a big hiccup due to an external network outage. Lobster accepts the failures and keeps on going.
Lobster has been a team effort between Physics, Computer Science, and the Center for Research Computing: Anna Woodard and Matthias Wolf have taken the lead in developing the core software; Ben Tovar, Patrick Donnelly, and Peter Ivie have improved and debugged Work Queue, Parrot, and Chirp along the way; Charles Mueller, Nil Valls, Kenyi Anampa, and Paul Brenner have all worked to deploy the system at scale in production; Kevin Lannon, Michael Hildreth, and Douglas Thain provide the project leadership.
Fri, 14 Aug 2015 15:19:00 +0000
Haipeng Cai successfully defended his dissertation, "Cost-effective Dependence Analyses for Reliable Software Evolution", which studied methods for efficiently determining the scope of a complex software system that is affected by a given change.
Haipeng will be taking a postdoctoral research position at Virginia Tech under the supervision of Prof. Barbara Ryder.
Congratulations to Dr. Haipeng Cai!
Thu, 16 Jul 2015 17:42:00 +0000
The Cooperative Computing Lab is pleased to announce the release of version 5.1.0 of the Cooperative Computing Tools including Parrot, Chirp, Makeflow, WorkQueue, SAND, All-Pairs, Weaver, and other software.
The software may be downloaded here:
This minor release adds a few small features and fixes the following issues from version 5.0.0:
Thanks go to our contributors:
Please send any feedback to the CCTools discussion mailing list:
Thu, 16 Jul 2015 16:42:00 +0000
The Cooperative Computing Lab is pleased to announce the release of version 5.0.0 of the Cooperative Computing Tools including Parrot, Chirp, Makeflow, WorkQueue, SAND, All-Pairs, Weaver, and other software.
The software may be downloaded here: CCTools download
This is a major release that incorporates the preview of three new tools:
Thanks go to the contributors for many features and bug fixes: Matthew Astley, Jakob Blomer, Ryan Boccabella, Peter Bui, Patrick Donnelly, Nathaniel Kremer-Herman, Victor Hawley, Nicholas Hazekamp, Peter Ivie, Kangkang Li, Haiyan Meng, Douglas Thain, Ben Tovar, Andrey Tovchigrechko, and Charles Zheng.
Please send any feedback to the CCTools discussion mailing list: mailing list
Tue, 07 Jul 2015 17:21:00 +0000
Haiyan Meng presented our work on a Preservation Framework for Computational Reproducibility at the International Conference on Computational Science (ICCS) in Reykjavik, Iceland. This is collaborative work between the University of Notre Dame and the University of Chicago under the DASPOS project, in which both universities participate.
The preservation framework proposed in this paper includes three parts:
Wed, 01 Jul 2015 16:04:00 +0000
Haiyan Meng presented her work on Umbrella, a system for specifying and materializing execution environments in a portable and reproducible way. Umbrella accepts a declarative specification for an application, and then determines the minimum technology needed to deploy it. The application will be run natively if the local execution environment is compatible, but if not, Umbrella will deploy a container, a virtual machine, or make use of a public cloud if necessary.
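As a rough illustration, an Umbrella specification describes the required environment declaratively. The sketch below is hypothetical; the field names are illustrative approximations, not the exact Umbrella schema:

```json
{
  "comment": "Hypothetical Umbrella-style specification; field names are illustrative",
  "hardware": { "arch": "x86_64", "cores": "2", "memory": "2GB", "disk": "10GB" },
  "kernel":   { "name": "linux", "version": ">=2.6.32" },
  "os":       { "name": "redhat", "version": "6.5" },
  "software": { "analysis-app-1.0": { "mountpoint": "/opt/analysis-app" } },
  "data":     { "input.dat": { "mountpoint": "/tmp/input.dat" } },
  "environ":  { "PATH": "/opt/analysis-app/bin:/usr/bin:/bin" }
}
```

Given a specification like this, Umbrella can check whether the local machine already satisfies the hardware, kernel, OS, and software constraints, and escalate to a container, virtual machine, or cloud resource only when it does not.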
Charles Zheng presented his work on integrating Docker containers into the Makeflow workflow engine and the Work Queue runtime system, each with different tradeoffs in performance and isolation. These capabilities will be included in the upcoming 5.0 release of CCTools.
High-Energy Physics workloads on 10K non-dedicated opportunistic cores with Lobster was presented as part of Condor Week 2015, at the University of Wisconsin-Madison.
Lobster is a system for deploying data intensive high-throughput science applications on non-dedicated resources. It is built on top of Work Queue, Parrot, and Chirp, which are part of CCTools.
Wed, 27 May 2015 18:47:00 +0000
Haiyan Meng presented A Case Study in Preserving a High Energy Physics Application. This poster describes the complexity of preserving a non-trivial application, then shows how Parrot packaging technology can be used to capture a program's dependencies and re-execute it using a variety of technologies.
Anna Woodard and Matthias Wolf won the best poster presentation award for Exploiting Volatile Opportunistic Computing Resources with Lobster, which earned them a lightning plenary talk. Lobster is an analysis workload management system which has been able to harness 10-20K opportunistic cores at a time for large workloads at Notre Dame, making the facility comparable in size to the dedicated Tier-2 facilities of the WLCG!
Tue, 19 May 2015 14:57:00 +0000
Dr. Peter Sempolinski successfully defended his PhD thesis, titled "An Extensible System for Facilitating Collaboration for Structural Engineering Applications."
While at Notre Dame, Peter created a Virtual Wind Tunnel which enabled the crowdsourcing of structural design and evaluation by combining online building design with Google SketchUp and CFD simulation with OpenFOAM. The system was used in a variety of contexts, ranging from virtual engineering classes to managing work crowdsourced via Mechanical Turk. His work was recently accepted for publication in IEEE CiSE and PLOS ONE.
Congratulations to Dr. Sempolinski!
Mon, 04 May 2015 12:59:00 +0000
The CMS physics group at Notre Dame has created Lobster, a data analysis system that runs on O(10K) cores to process data produced by the CMS experiment at the LHC. Lobster uses Work Queue to dispatch tasks to thousands of machines, Parrot with CVMFS to deliver the complex software stack from CERN, XRootD to deliver the LHC data, and Chirp and Hadoop to manage the output data. By using these technologies, Lobster is able to harness arbitrary machines and bring along the CMS computing environment wherever it goes. At peak, Lobster at ND delivers capacity equal to that of a dedicated CMS Tier-2 facility! (read more here) Fri, 01 May 2015 16:41:00 +0000
Dr. Dinesh Rajan successfully defended his PhD thesis, titled "Principles for the Design and Operation of Elastic Scientific Applications on Distributed Systems." He is currently an engineer at Amazon Web Services.
While at Notre Dame, he made significant contributions to the development of Work Queue and worked closely with scientists in biology and molecular dynamics to build highly scalable elastic applications such as the Accelerated Weighted Ensemble. His most recent journal paper in IEEE TCC describes how to design self-tuning cloud applications.
Congratulations to Dr. Rajan!
Fri, 10 Apr 2015 19:30:00 +0000
CCGrid 2015 in China:
Patrick Donnelly, Nicholas Hazekamp, Douglas Thain, Confuga: Scalable Data Intensive Computing for POSIX Workflows, IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 2015. Confuga is a new active storage cluster file system designed for executing regular POSIX workflows. Users may store extremely large datasets on Confuga in a regular file system layout, with whole files replicated across the cluster. They may then operate on their datasets using regular POSIX applications, with defined inputs and outputs.
Confuga handles the details of placing jobs near data and minimizing network load so that the cluster's disk and network resources are used efficiently. Each job executes with all of its input file dependencies local to its execution, within a sandbox.
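The placement idea can be illustrated with a toy scheduler. This is only an illustrative sketch of locality-aware placement, not Confuga's actual algorithm: each job is sent to the node that already replicates the most bytes of its input files.

```python
# Toy illustration of locality-aware job placement (not Confuga's actual
# scheduler): pick the node that already holds the most input bytes locally.

def place_job(job_inputs, replicas, file_sizes):
    """job_inputs: list of file names the job reads.
    replicas: dict mapping node name -> set of files replicated on that node.
    file_sizes: dict mapping file name -> size in bytes.
    Returns the node with the most input bytes already local."""
    def local_bytes(node):
        return sum(file_sizes[f] for f in job_inputs if f in replicas[node])
    return max(replicas, key=local_bytes)

replicas = {
    "node1": {"a.dat", "b.dat"},
    "node2": {"b.dat"},
    "node3": {"c.dat"},
}
sizes = {"a.dat": 100, "b.dat": 50, "c.dat": 500}

# node1 holds 150 local bytes of this job's inputs, more than any other node.
print(place_job(["a.dat", "b.dat"], replicas, sizes))  # → node1
```

Running the chosen job inside a sandbox on that node, with all inputs local, is what lets the real system keep disk traffic on-node and network load low.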
For those familiar with CCTools, Confuga operates as a cluster of Chirp servers with a single Chirp server operating as the head node. You may use the Chirp library, Chirp CLI toolset, FUSE, or even Parrot to upload and manipulate the data on Confuga.
For running a workflow on Confuga, we encourage you to use Makeflow. Makeflow will submit the jobs to Confuga using the Chirp job protocol and take care of ordering the jobs based on their dependencies.
Fri, 27 Mar 2015 20:46:00 +0000
We have created a new Makeflow visualization module which exports a workflow into an XGMML file compatible with Cytoscape. Cytoscape is a powerful network graphing application with support for custom styles, layouts, annotations, and more. While this program is better known for visualizing molecular networks in biology, it can be used for any purpose, and we believe it is a powerful tool for visualizing Makeflow workflows. Our visualization module was designed for and tested on Cytoscape 3.2. The following picture is a Cytoscape visualization of the example makeflow script provided in the User's Manual (http://ccl.cse.nd.edu/software/manuals/makeflow.html):
To generate a Cytoscape graph from your makeflow script, simply run makeflow_viz on your workflow file. The resulting workflow.xgmml can then be opened in Cytoscape through File -> Import -> Network -> File. We have created a clean style specifically for visualizing Makeflow tasks, named style.xml, which is generated in the current working directory when you run makeflow_viz. To apply the style in Cytoscape, select File -> Import -> Style, and select the style.xml file. Next, right-click the imported network and select “Apply Style…”. Select “makeflow” from the dropdown menu and our style will be applied. This will add the proper colors, edges, arrows, and shapes for processes and files.
Cytoscape also has a built-in layout function which can be used to automatically rearrange nodes according to their hierarchy. To access this, select Layout -> Settings, and a new window will pop up. Simply select “Hierarchical Layout” from the dropdown menu, adjust the settings for that layout to your liking, and select “Execute Layout.” There is a caveat with this function: with larger Makeflow workflows, auto-layout can take a long time to complete, because Cytoscape is designed for general graphs and does not appear to implement algorithms specifically for DAGs that could take advantage of faster time complexities. We have tested the auto-layout function with the following test cases:
Tue, 24 Mar 2015 20:28:00 +0000
ForceBalance is an open source software tool for creating accurate force fields for molecular mechanics simulation using flexible combinations of reference data from experimental measurements and theoretical calculations. These force fields are used to simulate the dynamics and physical properties of molecules in chemistry and biochemistry.
The Work Queue framework gives ForceBalance the ability to distribute computationally intensive components of a force field optimization calculation in a highly flexible way. For example, each optimization cycle launched by ForceBalance may require running 50 molecular dynamics simulations, each of which may take 10-20 hours on a high end NVIDIA GPU. While GPU computing resources are available, it is rare to find 50 available GPU nodes on any single supercomputer or HPC cluster. With Work Queue, it is possible to distribute the simulations across several HPC clusters, including the Certainty HPC cluster at Stanford, the Keeneland GPU cluster managed by Georgia Tech and Oak Ridge National Laboratories, and the Stampede supercomputer managed by the University of Texas. This makes it possible to run many simulations in parallel and complete the high level optimization in weeks instead of years.
- Lee-Ping Wang, Stanford University Wed, 10 Dec 2014 15:15:00 +0000
You can download the software here: cctools download
Thanks goes to the contributors and testers for this release: Peter Bui, Patrick Donnelly, Nick Hazekamp, Peter Ivie, Kangkang Li, Haiyan Meng, Peter Sempolinski, Douglas Thain, Ben Tovar, Lee-Ping Wang, Matthias Wolf, Anna Woodard, and Charles Zheng
Enjoy! Thu, 04 Dec 2014 19:37:00 +0000
Lee-Ping Wang at Stanford University recently published a paper in Nature Chemistry describing his work in fundamental molecular dynamics.
The paper demonstrates the "nanoreactor" technique, in which simple molecules are simulated over a long time scale to observe their reaction paths into more complex molecules. For example, the picture below shows 39 acetylene molecules merging into a variety of hydrocarbons over the course of 500ps simulated time. This technique can be used to computationally predict reaction networks in historical or inaccessible environments, such as the early Earth or the upper atmosphere.
To compute the final reaction network for this figure, the team used the Work Queue framework to harness over 300K node-hours of CPU time on the Blue Waters supercomputer at NCSA.
Mon, 17 Nov 2014 20:18:00 +0000
One such project is a Virtual Wind Tunnel, which was created in collaboration with the Notre Dame Civil Engineering Department, as part of a larger project to explore collaboration in civil design. On the surface, this is a fairly simple idea. A user uploads a building shape for analysis to a web portal. Then, the user can run wind flow simulations upon horizontal cross sections of the building. Once complete, the results of these simulations can be viewed and downloaded.
Making all of this work, however, requires a large number of interlocking components. For now, I would just like to describe how the CCL tools play a role in this system. When simulations are to be run, one very simple way to deliver simulation tasks to available computing resources is to run a Work Queue worker on those machines. The front-end of the system runs a Work Queue master, which queues up tasks.
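A minimal sketch of such a master, using the Work Queue Python API, might look like the following. The `wind_sim` command and file names here are hypothetical placeholders, not the Virtual Wind Tunnel's actual driver:

```python
def simulation_commands(n):
    """Command lines for n hypothetical cross-section simulations
    ('wind_sim' is a placeholder for the real CFD driver)."""
    return ["wind_sim --section %d --out result_%d.dat" % (i, i) for i in range(n)]

def run_master(port=9123):
    """Queue up simulation tasks on a Work Queue master.

    Requires the CCTools 'work_queue' Python binding. Workers started on any
    available machine with 'work_queue_worker HOST PORT' connect to this port
    and pull tasks, which is what makes the backend so flexible."""
    import work_queue

    q = work_queue.WorkQueue(port=port)
    for i, cmd in enumerate(simulation_commands(10)):
        t = work_queue.Task(cmd)
        t.specify_output_file("result_%d.dat" % i)
        q.submit(t)

    while not q.empty():
        t = q.wait(5)  # returns a finished task, or None after the timeout
        if t:
            print("finished:", t.command)
```

Because the master only hands out command lines and files, the same front-end can draw workers from a campus cluster one day and a cloud allocation the next.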
By using Work Queue as a means of distributing tasks, we can be more flexible about the backend upon which those tasks are run. This allows us to tailor our resource usage to our actual needs and, as needed, to adjust our resource usage when appropriate. Mon, 01 Sep 2014 18:21:00 +0000
This database design is implemented within CCTools in two parts. Part 1 (data storage) has been available for over a year and is called the catalog server. Part 2 (data analysis) has recently been implemented and is not yet in a release, but is available in the following commit:
The data model is designed to handle schema-free status reports from various services. While the reports can be schema-free, most of the fields will normally remain the same between subsequent reports from the same instance of a service.
The first status report is saved in its entirety, and subsequent reports are saved as changes (or "deltas") against the original report. Snapshots of the status of all services and instances are stored on a daily basis. This allows a query over a given time frame to jump quickly to the start of that time frame, rather than having to start at the very beginning of the life of the catalog server.
A query is performed by applying a series of operators to the data. In a distributed deployment, the data can be spatially distributed so that all reports from a given instance always end up on the same node. In that situation, all but the last of the operators can be performed in the map stage of the MapReduce model, which improves scalability because less work has to be performed by a single node in the reduce stage.
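The delta idea can be sketched in a few lines of Python. This is an illustrative model of the concept, not the catalog server's actual on-disk format: the first report is kept whole, and each later report stores only the fields that changed.

```python
def delta(prev, curr):
    """Keep only the fields of curr that differ from prev (illustrative
    model of delta storage, not the catalog server's actual format)."""
    return {k: v for k, v in curr.items() if prev.get(k) != v}

def replay(first, deltas):
    """Reconstruct the latest report by applying deltas to the first full report."""
    state = dict(first)
    for d in deltas:
        state.update(d)
    return state

first = {"name": "chirp.nd.edu", "load": 0.1, "memory": 2048}
reports = [
    {"name": "chirp.nd.edu", "load": 0.7, "memory": 2048},
    {"name": "chirp.nd.edu", "load": 0.4, "memory": 4096},
]

deltas = []
prev = first
for r in reports:
    deltas.append(delta(prev, r))
    prev = r

print(deltas)                  # only the changed fields are stored
print(replay(first, deltas))   # reconstructs the latest full report
```

Since most fields of a status report rarely change, each stored delta is far smaller than a full report, and a daily snapshot bounds how many deltas any query must replay.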
Much more detail is provided in a paper which was published at IEEE Bigdata 2014, and is available at the following URL:
For further inquiries, please email email@example.com.
Mon, 18 Aug 2014 18:21:00 +0000
CCTools 4.2.0 includes a new feature in Parrot that allows you to automatically observe all of the files used by a given application, and then collect them up into a self-contained package. The package can then be moved to another machine -- even a different variant of Linux -- and then run correctly with all of its dependencies present. The created package does not depend upon Parrot and can be re-run in a variety of ways.
This article explains how to generate a self-contained package and then share it so that others can verify and repeat your application. The whole process involves three steps: running the original application, creating the self-contained package, and running the package itself.
Step 1: Run the original application
Run your program under parrot_run and record the file name list and environment variables by using the --name-list and --env-list parameters.
parrot_run --name-list namelist --env-list envlist /bin/bash
This command starts a shell running under parrot_run, inside which you can run your program. At the end of step 1, one file named namelist containing all the accessed file names and one file named envlist containing the environment variables will be generated. After everything is done, simply exit the shell.
Step 2: Generate a self-contained package
Use parrot_package_create to generate a package based on the namelist and envlist generated in step 1.
parrot_package_create --name-list namelist --env-list envlist --package-path /tmp/package
This command causes all of the files given in the name list to be copied into the package directory /tmp/package. You may customize the contents of the package by editing the namelist or the package directory by hand.
Step 3: Repeat the program using the package
The newly created package is simply a complete filesystem tree that can be moved to any convenient location. It can be re-run by any method that treats the package as a self-contained root filesystem. This can be done by using Parrot again, by setting up a chroot environment, by setting up a Linux container, or by creating a virtual machine.
To run the package using Parrot, do this:
parrot_package_run --package-path /tmp/package /bin/bash
To run the package using chroot, do this:
chroot_package_run --package-path /tmp/package /bin/bash
In both cases, you will be dropped into a shell in the preserved environment, where all the files used by the original command will be present. You will definitely be able to run the original command -- whether you can run other programs depends upon the quantity of data preserved.
For more information, see these man pages:
Fri, 01 Aug 2014 19:18:00 +0000
The software may be downloaded here: Download CCTools 4.2.0
This release is mostly a bug fix release, but it introduces changes to the Work Queue protocol. Thus, workers from 4.2 do not work with masters prior to 4.2.
Among the bug fixes and added capabilities are:
Thu, 31 Jul 2014 12:33:00 +0000
We presented the DeltaDB database model for time-varying schema-free data at the IEEE International Congress on Big Data in Anchorage. The DeltaDB concept is what underlies the query engine for the CCTools catalog server.
Tue, 03 Jun 2014 16:50:00 +0000