Recent News in the CCL


CCL at Supercomputing 2017


We are well represented at the annual Supercomputing conference this week:

Tim Shaffer is presenting "Taming Metadata Storms in Parallel Filesystems with MetaFS" at the Parallel Data Storage Workshop (PDSW).  This paper describes a technique for accelerating metadata-intensive program loading workloads that often cause trouble in parallel filesystems.

Kyle Sweeney is presenting "Lightweight Container Integration into Workflow Systems: A Case Study with Singularity and Makeflow" at Workflows in Support of Large Scale Science (WORKS).  This talk explains why integrating containers into workflows isn't as simple as it first appears, and describes a variety of approaches for assembling complete applications from containers and data.

Charles Zheng is presenting "Wharf: Sharing Docker Images across Hosts from a Distributed Filesystem" at the Monday poster session.  This work in progress aims to accelerate the use of containers on HPC clusters by sharing images and metadata within a parallel filesystem.

Mon, 13 Nov 2017 16:58:00 +0000

TPDS Paper: Job Sizing

When submitting jobs for execution to a computing facility, a user must make a critical decision: how many resources (such as cores, memory and disk) should be requested for each job?
Broadly speaking, if the initial job size selected is too small, it is more likely that the job will fail and be returned, thus wasting resources on a failed run that must be retried. On the other hand, if the initial job size selected is too large, the job will succeed on the first try, but waste resources that go unused inside the job's allocation. If the waste is large enough, throughput will be reduced because those resources could have been used to run another job.
If the resources consumed by a collection of jobs were known and constant, then the solution would be easy: run one job at a large size, measure its consumption, and then use that smaller measured size for the remainder of the jobs. However, experience shows that real jobs have non-trivial distributions. For example, the figure shows the histogram of memory consumption for a set of jobs in a high energy physics workflow run on an HTCondor batch system at the University of Notre Dame.
Note that the histogram shows large peaks at approximately 900 MB and 1300 MB, but there are a small number of outliers both above and below those values.
What memory size should we select for this workload? If we pick 3.8 GB of RAM for all jobs, then every job will succeed, but most jobs will waste several GB of memory that could be used to run other jobs. On the other hand, we could try a two-step approach: run each job with a smaller allocation first, wait to see which ones succeed or fail, and rerun those that fail with the maximum 3.8 GB allocation.
But precisely what smaller value should be used for the first attempt? The dotted line, at around 1.32 GB, turns out to maximize throughput when running the workflow under this two-step policy. Allowing for 8% of the tasks to be retried, throughput increases by a factor of 2.54, and wasted resources decrease by 44%.
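To make the tradeoff concrete, here is a minimal Python sketch of this kind of two-step evaluation. It is not the algorithm from the paper: the function names and the peak memory values are made up for illustration, and it simply assumes every attempt runs for roughly the same wall-clock time, so the memory-time cost of an attempt is proportional to its allocation.

def two_step_cost(peaks, first_alloc, max_alloc):
    # Average memory allocated per job: jobs that fit in first_alloc succeed
    # on the first try; the rest pay for a failed attempt plus a retry at max_alloc.
    total = 0.0
    for peak in peaks:
        if peak <= first_alloc:
            total += first_alloc
        else:
            total += first_alloc + max_alloc
    return total / len(peaks)

def best_first_alloc(peaks, max_alloc):
    # Try each observed peak as a candidate first allocation and keep the cheapest.
    return min(set(peaks), key=lambda a: two_step_cost(peaks, a, max_alloc))

# Hypothetical bimodal workload, peak memory per job in MB:
peaks = [900] * 80 + [1300] * 15 + [3500] * 5
first = best_first_alloc(peaks, max_alloc=3800)
gain = two_step_cost(peaks, 3800, 3800) / two_step_cost(peaks, first, 3800)
print("first-try allocation: %d MB, relative throughput: %.2fx" % (first, gain))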
Our recent paper, A Job Sizing Strategy for High-Throughput Scientific Workflows, fully describes this two-step strategy. These developments have also been integrated into Makeflow and Work Queue in CCTools. For Makeflow, the rules need to be labeled with the optimization mode:


.MAKEFLOW CATEGORY myfirstcategory
.MAKEFLOW MODE MAX_THROUGHPUT

output_1: input_1
cmdline input_1 -o output_1

output_2: input_2
cmdline input_2 -o output_2


.MAKEFLOW CATEGORY myothercategory
.MAKEFLOW MODE MAX_THROUGHPUT

output_3: input_3
cmdline input_3 -o output_3

output_4: input_4
cmdline input_4 -o output_4


Also, Makeflow needs to run with the resource monitor enabled, as follows:
makeflow --monitor=my_resource_summaries_dir (... other options ...)
Rules in the same category will be optimized together.
Similarly, for Work Queue:


from work_queue import *

q = WorkQueue(...)
q.enable_monitoring()     # required so that task resource usage is measured

# Label tasks with a category and set the allocation mode for that category:
q.specify_category_mode('myfirstcategory', WORK_QUEUE_ALLOCATION_MODE_MAX_THROUGHPUT)

t = Task(...)
t.specify_category('myfirstcategory')
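After that, tasks are submitted and collected as usual; the resource monitor measures each task as it runs, and Work Queue uses those measurements to choose first-try allocations for the category. Continuing the snippet above, a minimal sketch of the usual submit/wait loop:

q.submit(t)

while not q.empty():
    t = q.wait(5)        # wait up to 5 seconds for a task to complete
    if t:
        print(t.id, t.return_status)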

Additionally, we have made available a pure Python implementation at:
https://github.com/cooperative-computing-lab/efficient-resource-allocations

Thu, 26 Oct 2017 15:20:00 +0000

Makeflow Feature: JX Representation

There are a number of neat new features in the latest versions of our software that I would like to highlight through some occasional blog posts.  If these sound interesting, please give them a try and send us your feedback.

First, I would like to highlight recent work by Tim Shaffer on JX, a new encoding for Makeflow that makes it easier to express complex workflows programmatically.  

For example, a traditional makeflow rule looks like this:

out.txt: in.txt calib.dat simulate.exe
    simulate.exe -i in.txt -p 10 > out.txt

In the latest version of Makeflow, you can write the same rule in JSON like this:

{
    "command" : "simulate.exe -i in.txt -p 10 > out.txt",
    "inputs" : [ "in.txt", "calib.dat", "simulate.exe" ],
    "outputs": [ "out.txt" ]
}

Now, just using JSON by itself doesn't give you a whole lot. However, we extended JSON with a few new features, like list comprehensions, variable substitution, and operators. This gives us a programmable way of generating a lot of rules easily.

For example, this represents 100 rules in which the parameter varies from 0 to 99:

{
   "command" : format("simulate.exe -i in.txt -p %d > out.%d.txt",param,param),
   "inputs" : [ "in.txt", "calib.dat", "simulate.exe" ],
   "outputs": [ format("out.%d.txt",param) ]

} for param in range(100)
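Since JX is a superset of JSON, you could also generate an equivalent set of plain JSON rules with a short script of your own. Here is a rough Python sketch that expands the same comprehension by hand; how the rules are wrapped into a complete workflow file is described in doc/jx.html.

import json

# Expand the comprehension above into one plain JSON rule per parameter value.
rules = [
    {
        "command": "simulate.exe -i in.txt -p %d > out.%d.txt" % (param, param),
        "inputs":  ["in.txt", "calib.dat", "simulate.exe"],
        "outputs": ["out.%d.txt" % param],
    }
    for param in range(100)
]

print(json.dumps(rules, indent=4))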

For a more detailed example, see our example BWA workflows, which are expressed in three different ways.
Thanks to Andrew Litteken for converting many of our example workflows to the new format and testing them.
Wed, 18 Oct 2017 18:47:00 +0000

Announcement: CCTools 6.2.0 released

The Cooperative Computing Lab is pleased to announce the release of version 6.2.0 of the Cooperative Computing Tools including Parrot, Chirp, Makeflow, WorkQueue, Umbrella, Prune, SAND, All-Pairs, Weaver, and other software.

The software may be downloaded here:
http://ccl.cse.nd.edu/software/download

This is a major release which adds several features and bug fixes. Among them:

  • [JX] A superset of JSON to dynamically describe workflows, see doc/jx.html. (Tim Shaffer)
  • [Makeflow] Support for Amazon EC2. (Kyle Sweeney, Douglas Thain)
  • [Makeflow]  Singularity support bug fixes. (Kyle Sweeney)
  • [Parrot] Fix CVMFS initialization. (Tim Shaffer)
  • [Prune] Several bug fixes. (Peter Ivie)
  • [ResourceMonitor] Measurement snapshots by observing log files, --snapshot-events. (Ben Tovar)
  • [WorkQueue] Compressed updates to the catalog server. (Nick Hazekamp, Douglas Thain)
  • [WorkQueue] work_queue_factory uses computed maximum worker capacity of the master. (Nate Kremer-Herman)
  • [WorkQueue] Several bug fixes. (Nick Hazekamp, Ben Tovar)
  • [WQ_Maker] Several bug fixes. (Nick Hazekamp)

Thanks go to the contributors for many features, bug fixes, and tests:

  • Jakob Blomer
  • Nathaniel Kremer-Herman
  • Nicholas Hazekamp
  • Peter Ivie
  • Tim Shaffer
  • Douglas Thain
  • Ben Tovar
  • Kyle Sweeney
  • Chao Zheng

Please send any feedback to the CCTools discussion mailing list:

http://ccl.cse.nd.edu/community/forum

Enjoy!

Mon, 09 Oct 2017 15:39:00 +0000

2017 DISC Summer REU Conclusion

This summer, we hosted 9 outstanding undergraduate students in our summer REU program in Data Intensive Scientific Computing (DISC).  Our guests spent the summer working with faculty in labs across campus in fields such as astronomy, high energy physics, bioinformatics, data visualization, and distributed systems.  And, they enjoyed some summer fun around South Bend.

Check out these short YouTube clips that explain each research project:



And here they are presenting at our summer research symposium: 


If you would like to participate, please apply for the 2018 edition of the DISC REU program at Notre Dame.

Wed, 30 Aug 2017 17:58:00 +0000

Announcement: CCTools 6.1.6 released

The Cooperative Computing Lab is pleased to announce the release of version 6.1.6 of the Cooperative Computing Tools including Parrot, Chirp, Makeflow, WorkQueue, Umbrella, Prune, SAND, All-Pairs, Weaver, and other software.

The software may be downloaded here:
http://ccl.cse.nd.edu/software/download

This is a minor release which adds some bug fixes. Among them:

  • [General] Fix bug configuring perl paths. (Ben Tovar)
  • [General] Fix bug on JX inline querying strings. (Tim Shaffer)
  • [Makeflow] Fix bug when waiting on a local process. (Douglas Thain)
  • [Makeflow] Enforce local resources limits. (Douglas Thain)
  • [Makeflow] Save failed outputs to aid debugging. (Tim Shaffer)
  • [Makeflow] Fix bug when parsing some command lines. (Ben Tovar)
  • [WorkQueue] The transactions log now records the reason for worker disconnections. (Ben Tovar)
  • [WorkQueue] Fix bug when checking version of gnuplot in work_queue_graph_log. (Ben Tovar)
  • [WQmaker] Support dynamic environments. (Nicholas Hazekamp)

Thanks go to the contributors for many features, bug fixes, and tests:

  • Nathaniel Kremer-Herman
  • Nicholas Hazekamp
  • Peter Ivie
  • Tim Shaffer
  • Douglas Thain
  • Ben Tovar
  • Kyle Sweeney
  • Chao Zheng

Please send any feedback to the CCTools discussion mailing list:

http://ccl.cse.nd.edu/community/forum

Enjoy!
Tue, 29 Aug 2017 17:53:00 +0000

Talk at ScienceCloud Workshop

Prof. Thain gave the opening talk, "Seamless Scientific Computing from Laptops to Clouds", at the ScienceCloud workshop preceding High Performance Distributed Computing 2017 in Washington, DC.  This talk gives an overview of the problem of migrating scientific codes from the comfortable environment of a laptop to the complex environment of a cluster or a cloud, highlighting our new tools for software deployment and resource management for bioinformatics and high energy physics applications.

Tue, 27 Jun 2017 15:29:00 +0000

Congratulations to Our Ph.D. Graduates

Congratulations to all of our 2017 Ph.D. graduates in Computer Science and Engineering,
and especially to Dr. Haiyan Meng who is moving on to a position at Google, Inc.

Mon, 22 May 2017 13:37:00 +0000

Announcement: CCTools 6.1.0 released

The Cooperative Computing Lab is pleased to announce the release of version 6.1.0 of the Cooperative Computing Tools including Parrot, Chirp, Makeflow, WorkQueue, Umbrella, Prune, SAND, All-Pairs, Weaver, and other software.

The software may be downloaded here:
http://ccl.cse.nd.edu/software/download

This is a major release which adds several features and bug fixes. Among them:

  • [General]  IPv4 and IPv6 mode handling fixes. (Tim Shaffer)
  • [Grow]  Fuse module for GROW-FS. (Tim Shaffer)
  • [Makeflow]  Updated manual. (Douglas Thain)
  • [Makeflow]  Support for Mesos. (Charles Zheng)
  • [Makeflow]  Support for Singularity. (Kyle Sweeney)
  • [Makeflow]  --shared-fs option. (Nick Hazekamp)
  • [Makeflow]  --preserve option for per rule caching. (Pierce Cunneen)
  • [Parrot]  Fix ld.so and exec bugs. (Tim Shaffer)
  • [Parrot]  Fix handling of namespace symlinks. (Tim Shaffer)
  • [Parrot]  Bind/connect on AF_UNIX sockets. (Tim Shaffer)
  • [Parrot]  Several bug fixes. (Tim Shaffer)
  • [Parrot]  parrot_namespace to use parrot inside parrot. (Tim Shaffer)
  • [Parrot]  Add fixed and warped PIDs. (Douglas Thain)
  • [Prune]     Several bug fixes. (Peter Ivie)
  • [ResourceMonitor]  Resource per task snapshots, --snapshot-file. (Ben Tovar)
  • [Umbrella]  Several bug fixes. (Haiyan Meng)
  • [WorkQueue]  Resource per task snapshots, q.enable_monitoring_snapshots. (Ben Tovar)
  • [WorkQueue]  Several bug fixes. (Ben Tovar)
  • [WorkQueue]  Custom environments for wq factory. (Kyle Sweeney)
  • [WorkQueue]  Several fixes to wq factory. (Nate Kremer-Herman)
  • [WQ_Maker]   Several bug fixes, update to latest version of maker. (Nick Hazekamp)

Thanks go to the contributors for many features, bug fixes, and tests:

  • Jakob Blomer
  • Pierce Cunneen
  • Patrick Donnelly
  • Nathaniel Kremer-Herman
  • Nicholas Hazekamp
  • Peter Ivie
  • Haiyan Meng
  • Tim Shaffer
  • Douglas Thain
  • Ben Tovar
  • Kyle Sweeney
  • Chao Zheng

Please send any feedback to the CCTools discussion mailing list:

http://ccl.cse.nd.edu/community/forum

Enjoy!

Wed, 17 May 2017 16:55:00 +0000

Makeflow and Mesos Paper at CCGrid 2017

Charles Zheng will present the paper Deploying High Throughput Scientific Workflows on Container Schedulers with Makeflow and Mesos at the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2017) on May 15, 2017, in Madrid, Spain. In this paper we consider how to launch workflow systems on container schedulers with minimal performance loss and higher system efficiency. As examples of current technology, we use Makeflow, Work Queue, Resource Monitor, and Mesos. We observe that using Work Queue and Resource Monitor not only reduces task turnaround time but also achieves a higher resource usage rate. The figure below shows the system architecture.

Fri, 05 May 2017 17:28:00 +0000

Workflow Reproducibility Paper at ICCS 2017

Haiyan Meng will present a paper titled Facilitating the Reproducibility of Scientific Workflows with Execution Environment Specifications at the International Conference on Computational Science (ICCS) 2017 this June in Zurich, Switzerland. This paper explores the challenges in reproducing scientific workflows and proposes a framework for facilitating reproducibility at the task level, by giving scientists complete control over the execution environments of the tasks in their workflows and by integrating execution environment specifications into scientific workflow systems.



Mon, 01 May 2017 17:58:00 +0000

Ph.D. Defense: Haiyan Meng

Haiyan Meng successfully defended her dissertation, titled "Improving the Reproducibility of Scientific Applications with Execution Environment Specifications." Congratulations!


Wed, 22 Mar 2017 18:04:00 +0000

Makeflow Examples Archive

We recently updated our archive of example Makeflows so that they are significantly easier to download, execute, and reshape to various sizes.  For each one, we have instructions on how to obtain the underlying binary program, generate some sample data, and then create a workload of arbitrary size.  This allows you to experiment with Makeflow at small scale, and then dial things up when you are ready to run on thousands of nodes:

https://github.com/cooperative-computing-lab/makeflow-examples


Thu, 09 Mar 2017 18:32:00 +0000

Big CMS Data Analysis at Notre Dame

Analyzing the data produced by the Compact Muon Solenoid (CMS), one of the experiments at the Large Hadron Collider, requires a collaboration of physicists and computer scientists to harness hundreds of thousands of computers at universities and research labs around the world.  The contribution of each site to the global effort, whether small or large, is reported on a regular basis.

This recent graph tells an interesting story about contributions to CMS computing in late 2016.  Each color in the bar graph represents the core-hours provided by a given site over the course of a week:

The various computing sites are divided into tiers:

  • Tier 0 is CERN, which is responsible for providing data to the lower tiers.
  • Tier 1 contains the national research labs like Fermi National Lab (FNAL), Rutherford Appleton Lab in the UK, and so forth, that facilitate analysis work for universities in their countries.
  • Tier 2 contains universities like Wisconsin, Purdue, and MIT, that have significant shared computing facilities dedicated to CMS data analysis.
  • Tier 3 is everyone else performing custom data analysis, sometimes on private clusters and sometimes on borrowed resources. Most of these sites are so small that they are compressed into the black band at the bottom of the graph.

Now, you would think that the big national sites would produce most of the cycles, but there are a few interesting exceptions at the top of the list.

First, there are several big bursts in dark green that represent the contribution of the HEPCloud prototype, which is technically a Tier-3 operation, but is experimenting with consuming cycles from Google and Amazon.  This approach has been successful at delivering big bursts of computation, and the next question is whether it will be cost-effective over the long term.

Next, the Tier-2 at the University of Wisconsin consistently produces a huge number of cycles from their dedicated facility and opportunistic resources from the Center for High Throughput Computing.  This group works closely with the HTCondor team at Wisconsin to make sure every cycle gets used, 365 days a year.

Following that, you have the big computing centers at CERN and FNAL, which is no surprise.

And, then the next contributor is our own little Tier-3 at Notre Dame, which frequently produces more cycles than most of the Tier-2s and some of the Tier-1s!  The CMS group at ND harnesses a small dedicated cluster, and then adds to that unused cycles from our campus Center for Research Computing by using Lobster and the CCL Work Queue software on top of HTCondor.

The upshot is, on a good day, a single grad student from Notre Dame can perform data analysis at a scale that rivals our national computing centers!


Fri, 17 Feb 2017 14:37:00 +0000

IceCube Flies with Parrot and CVMFS


IceCube is a neutrino detector built at the South Pole by instrumenting about a cubic kilometer of ice with 5160 light sensors. The IceCube data is analyzed by a collaboration of about 300 scientists from 12 countries. Data analysis relies on the precise knowledge of detector characteristics, which are evaluated by vast amounts of Monte Carlo simulation.  On any given day, 1000-5000 jobs are continuously running.

Recently, the experiment began using Parrot to get their code running on GPU clusters at XSEDE sites (Comet, Bridges, and xStream) and the Open Science Grid.  IceCube relies on software distribution via CVMFS, but not all execution sites provide the necessary FUSE modules.  By using Parrot, jobs can attach to remote software repositories without requiring special privileges or kernel modules.

- Courtesy of Gonzalo Merino, University of Wisconsin - Madison

Thu, 02 Feb 2017 21:25:00 +0000

Reproducibility Papers at eScience 2016

CCL students presented two papers at the IEEE 12th International Conference on eScience on the theme of reproducibility in computational science.
Congratulations to Haiyan for winning a Best of Conference award for her paper.
    Tue, 25 Oct 2016 13:47:00 +0000

    CCL Workshop 2016

    The 2016 CCL Workshop on Scalable Scientific Computing was held on October 19-20 at the University of Notre Dame.  We offered tutorials on Makeflow, Work Queue, and Parrot, and gave highlights of the many new capabilities relating to reproducibility and container technologies. Our user community gave presentations describing how these technologies are used to accelerate discovery in genomics, high energy physics, molecular dynamics, and more.
    Everyone got together to share a meal, solve problems, and generate new ideas. Thanks to everyone who participated, and see you next year!

    Tue, 25 Oct 2016 13:26:00 +0000

    NSF Grant to Support CCTools Development

    We are pleased to announce that our work will continue to be supported by the National Science Foundation through the division of Advanced Cyber Infrastructure.

    The project is titled "SI2-SSE: Scaling up Science on Cyberinfrastructure with the Cooperative Computing Tools". It will advance the development of the Cooperative Computing Tools to meet the changing technology landscape in three key respects: exploiting container technologies, making efficient use of local concurrency, and performing capacity management at the workflow scale.  We will continue to focus on active user communities in high energy physics, which rely on Parrot for global scale filesystem access in campus clusters and the Open Science Grid; bioinformatics users executing complex workflows via the VectorBase, LifeMapper, and CyVerse disciplinary portals; and ensemble molecular dynamics applications that harness GPUs from XSEDE and commercial clouds.

    Tue, 20 Sep 2016 15:11:00 +0000

    Announcement: CCTools 6.0.0 released

    The Cooperative Computing Lab is pleased to announce the release of version 6.0.0 of the Cooperative Computing Tools including Parrot, Chirp, Makeflow, WorkQueue, Umbrella, Prune, SAND, All-Pairs, Weaver, and other software.

    The software may be downloaded here:
    http://ccl.cse.nd.edu/software/download

    This is a major release which adds several features and bug fixes. Among them:

    • [Catalog]   Automatic fallback to a backup catalog server. (Tim Shaffer)
    • [Makeflow]  Accept DAGs in JSON format. (Tim Shaffer)
    • [Makeflow]  Multiple documentation omission bugs. (Nick Hazekamp and Haiyan Meng)
    • [Makeflow]  Send information to catalog server. (Kyle Sweeney)
    • [Makeflow]  Syntax directives (e.g., .SIZE to indicate file size). (Nick Hazekamp)
    • [Parrot] Fix cvmfs logging redirection. (Jakob Blomer)
    • [Parrot] Multiple bug-fixes. (Tim Shaffer, Patrick Donnelly, Douglas Thain)
    • [Parrot] Timewarp mode for reproducible runs. (Douglas Thain)
    • [Parrot] Use new libcvmfs interfaces if available. (Jakob Blomer)
    • [Prune]     Use SQLite as backend. (Peter Ivie)
    • [Resource Monitor] Record the time where a resource peak occurs. (Ben Tovar)
    • [Resource Monitor] Report the peak number of cores used. (Ben Tovar)
    • [Work Queue] Add a transactions log. (Ben Tovar)
    • [Work Queue] Automatic resource labeling and monitoring. (Ben Tovar)
    • [Work Queue] Better capacity worker autoregulation. (Ben Tovar)
    • [Work Queue] Creation of disk allocations per task. (Nate Kremer-Herman)
    • [Work Queue] Extensive updates to wq_maker. (Nick Hazekamp)
    • [Work Queue] Improvements in computing the master's task capacity. (Nate Kremer-Herman)
    • [Work Queue] Raspberry Pi compilation fixes. (Peter Bui)
    • [Work Queue] Throttle work_queue_factory with --workers-per-cycle. (Ben Tovar)
    • [Work Queue] Unlabeled tasks are assumed to consume 1 core, 512 MB RAM and 512 MB disk. (Ben Tovar)
    • [Work Queue] Workers disconnect when a node no longer has the resources promised. (Ben Tovar)
    • [Work Queue] Work Queue statistics cleanup (see work_queue.h for deprecated names). (Ben Tovar)
    • [Work Queue] work_queue_status respects terminal column settings. (Mathias Wolf)

    We will have tutorials on the new features in our upcoming workshop, October 19 and 20. Refer to http://ccl.cse.nd.edu/workshop/2016 for more information. We hope you can join us!

    Thanks go to the contributors for many features, bug fixes, and tests:

    • Jakob Blomer
    • Peter Bui
    • Patrick Donnelly
    • Nathaniel Kremer-Herman
    • Kenyi Hurtado-Anampa
    • Peter Ivie
    • Kevin Lannon
    • Haiyan Meng
    • Tim Shaffer
    • Douglas Thain
    • Ben Tovar
    • Kyle Sweeney
    • Mathias Wolf
    • Anna Woodard
    • Chao Zheng

    Please send any feedback to the CCTools discussion mailing list:

    http://ccl.cse.nd.edu/community/forum

    Enjoy!

    Thu, 15 Sep 2016 17:55:00 +0000

    Summer REU Projects in Data Intensive Scientific Computing

    We recently wrapped up the first edition of the summer REU in Data Intensive Scientific Computing at the University of Notre Dame.  Ten undergraduate students came to ND from around the country and worked on projects encompassing physics, astronomy, bioinformatics, network sciences, molecular dynamics, and data visualization with faculty at Notre Dame.

    To learn more, see these videos and posters produced by the students:

           


    Fri, 19 Aug 2016 13:24:00 +0000

    Simulation of HP24stab with AWE and Work Queue


    The villin headpiece subdomain "HP24stab" is a recently discovered 24-residue stable supersecondary structure that consists of two helices joined by a turn. Simulating 1 μs of motion for HP24stab can take days or weeks depending on the available hardware, and folding events take place on a scale of hundreds of nanoseconds to microseconds.  Using the Accelerated Weighted Ensemble (AWE), a total of 19 μs of trajectory data were simulated over the course of two months using the OpenMM simulation package. These trajectories were then clustered and sampled to create an AWE system of 1000 states and 10 models per state. A Work Queue master dispatched 10,000 simulations to a peak of 1000 connected 4-core workers, for a total of 250 ns of concurrent simulation time and 2.5 μs per AWE iteration. As of August 8, 2016, the system has run continuously for 18 days and completed 71 iterations, for a total of 177.5 μs of simulation time. The data gathered from these simulations will be used to report the mean first passage time, or average time to fold, for HP24stab, as well as the major folding pathways.
    - Jeff Kinnison and Jesús Izaguirre, University of Notre Dame
    Wed, 10 Aug 2016 20:58:00 +0000

    ND Leads DOE Grant on Virtual Clusters for Scientific Computing

    Prof. Douglas Thain is leading a new $2.2M DOE-funded project titled "VC3: Virtual Clusters for Community Computation" in an effort to make our national supercomputing facilities more effective for collaborative scientific computing.  The project team brings together researchers from the University of Notre Dame, the University of Chicago, and Brookhaven National Lab.



    Our current NSF and DOE supercomputers are very powerful, but they each have different operating systems and software configurations, which makes it difficult and time consuming for new users to deploy their codes and share results.  The new service will create virtual clusters on the existing machines that have the custom software and other services needed to easily run advanced scientific codes from fields such as high energy physics, bioinformatics, and astrophysics.  If successful, users of this service will be able to easily move applications between university and national supercomputing facilities.





    Wed, 10 Aug 2016 20:30:00 +0000

    2016 DISC Summer Session Wraps Up

    Congratulations to our first class of summer students participating in the Data Intensive Scientific Computing research experience!  Twelve students from around the country came to Notre Dame to learn how computing drives research in high energy physics, climatology, bioinformatics, astrophysics, and molecular dynamics.

    At our closing poster session in Jordan Hall (held along with several other REU programs), students presented their work and results to faculty and guests from across campus.

    If you are excited to work at the intersection of scientific research and advanced computing, we invite you to apply to the 2017 DISC summer program at Notre Dame!

    Fri, 29 Jul 2016 18:00:00 +0000

    New Work Queue Visualization

    Nate Kremer-Herman has created a new, convenient way to look up information about Work Queue masters. This new visualization tool provides real-time updates on the status of each Work Queue master that contacts our catalog server. We hope this tool will help users understand what their Work Queue masters are doing and determine when it may be time to take corrective action.

    A Comparative View


    A Specific Master


    Among other things, this tool reports the number of tasks currently running, the number of tasks waiting to run, and the total capacity of tasks that could be running. For example, a user might find that they have a large number of tasks waiting, a small number of tasks running, and a task capacity somewhere in between. In that case, we would recommend requesting more workers. Our hope is that users will take advantage of this new way to view and manage their work.

    Wed, 22 Jun 2016 18:57:00 +0000

    Work Queue from Raspberry Pi to Azure at SPU

    "At Seattle Pacific University we have used Work Queue in the CSC/CPE 4760 Advanced Computer Architecture course in Spring 2014 and Spring 2016.  Work Queue serves as our primary example of a distributed system in our “Distributed and Cloud Computing” unit for the course.  Work Queue was chosen because it is easy to deploy, and undergraduate students can quickly get started working on projects that harness the power of distributed resources."

    The main project in this unit had the students obtain benchmark results for three systems: a high-performance workstation, a cluster of 12 Raspberry Pi 2 boards, and a cluster of A1 instances in Microsoft Azure.  The task for each benchmark used Dr. Peter Bui’s Work Queue MapReduce framework; the students tested both a Word Count and an Inverted Index on the Linux kernel source. In testing the three systems, the students were exposed to the principles of distributed computing and the MapReduce model as they investigated tradeoffs in price, performance, and overhead.
     - Prof. Aaron Dingler, Seattle Pacific University. 

    Thu, 26 May 2016 12:52:00 +0000