Data Intensive Science Cluster - Cooperative Computing Lab

DISC - Data Intensive Science Cluster

The DISC is a shared computing facility managed by the Cooperative Computing Lab and the Center for Research Computing at the University of Notre Dame. The facility provides unique capabilities for rapidly exploring, processing, and visualizing multi-terabyte datasets, in support of research groups in biology and bioinformatics, biometrics and computer vision, molecular dynamics, systems biology, and computer systems research.

User Interfaces

The following interfaces are currently available for using the DISC:

The BXGrid web portal provides access to biometrics research data stored on the DISC.

The Condor batch system provides access to the computing cycles available on the cluster.

The Hadoop data processing system provides the ability to run Map-Reduce jobs on the cluster.

The Chirp distributed filesystem presents the cluster as one big 180TB storage device visible at disc01.crc.nd.edu:9090.
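As an illustration of the Condor interface listed above, the following Python sketch builds the text of a minimal Condor submit description. The executable name `analyze.sh`, its argument `input.dat`, and the output file names are hypothetical placeholders, not files that exist on the DISC; only standard submit commands (`universe`, `executable`, `arguments`, `output`, `error`, `log`, `queue`) are used.

```python
def make_submit_description(executable, arguments="", jobs=1):
    """Return the text of a minimal Condor submit description.

    All names passed in are placeholders chosen by the caller; nothing
    here is specific to the DISC configuration.
    """
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        f"arguments  = {arguments}",
        # $(Process) expands to 0, 1, 2, ... so each job gets its own files.
        "output     = job.$(Process).out",
        "error      = job.$(Process).err",
        "log        = job.log",
        f"queue {jobs}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: ten instances of a script named analyze.sh.
submit_text = make_submit_description("analyze.sh", "input.dat", jobs=10)
print(submit_text)
```

The resulting text would typically be saved to a file and handed to `condor_submit`. Any requirements or priority settings specific to the DISC are not shown here and would need to be confirmed with the cluster operators.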

Policy

The DISC cluster was acquired via a Notre Dame Equipment Replacement and Renewal grant in early 2011. The five parties to the grant have first priority on the cluster's resources, in approximately equal proportion:

Computer Vision Research Lab (Patrick Flynn (CSE) and Kevin Bowyer (CSE))

Bioinformatics and Biology (Scott Emrich (CSE), Jeanne Romero-Severson (BIOS), Frank Collins (BIOS), Nora Besansky (BIOS), Patricia Clark (Chem/Biochem), Michael Pfrender (BIOS))

Laboratory for Computational and Life Sciences (Jesus Izaguirre (CSE) and Chris Sweet (CRC))

Cyberinfrastructure Lab (Greg Madey (CSE))

The Cooperative Computing Lab (Douglas Thain (CSE))

Other parties on campus are welcome to use the DISC by submitting Condor jobs or by accessing data in Hadoop. However, such use has lower priority and may be interrupted when needed to serve the primary parties. Users should note that the cluster is intended primarily for analyzing and processing large data sets. Data in active use may stay resident on the cluster for some time, but the cluster is not a backup system and is not guaranteed to be highly reliable: valuable data should be backed up, and cold data should be stored elsewhere.

Hardware

The DISC contains 26 nodes, each consisting of:

32GB RAM

12 x 2TB SATA disks

2 x quad-core Intel Xeon E5620 CPUs @ 2.40GHz (8 hardware threads each)

Gigabit Ethernet

The disks on each node are operated individually, and are currently configured as follows:

Disk	Purpose	Mount Point
Disk 1	Operating System	/
Disk 2	Condor	/var/condor
Disk 3	Chirp - General	/data/chirp
Disk 4	Chirp - Biocompute	/data/chirp/biocompute
Disk 5	Chirp - Biometrics	/data/chirp/bxgrid
Disk 6	Hadoop	/data/hadoop/volume1
Disk 7	Hadoop	/data/hadoop/volume2
Disk 8	Hadoop	/data/hadoop/volume3
Disk 9	Hadoop	/data/hadoop/volume4
Disk 10	Unassigned	/data/scratch1
Disk 11	Unassigned	/data/scratch2
Disk 12	Unassigned	/data/scratch3

Both AFS and CRC /pscratch are mounted on all nodes of the cluster, to facilitate data transfer between systems.
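To make the disk layout above concrete, the following Python sketch adds up the nominal raw capacity dedicated to each purpose across the cluster. It uses only the figures stated on this page (26 nodes, 12 disks of 2TB each, and the per-disk assignments in the table); the three Chirp disks are grouped under a single "Chirp" label, which is a simplification made here for illustration.

```python
from collections import Counter

NODES = 26      # nodes in the cluster, per the hardware list
DISK_TB = 2     # nominal size of each SATA disk, in TB

# Purpose of each of the 12 disks on a node, in table order (Disk 1..12).
# The General, Biocompute, and Biometrics Chirp disks are grouped as "Chirp".
DISK_PURPOSES = [
    "Operating System",
    "Condor",
    "Chirp", "Chirp", "Chirp",
    "Hadoop", "Hadoop", "Hadoop", "Hadoop",
    "Unassigned", "Unassigned", "Unassigned",
]


def raw_capacity_tb(nodes=NODES, disk_tb=DISK_TB, purposes=DISK_PURPOSES):
    """Return {purpose: nominal raw TB summed over all nodes}."""
    disks_per_purpose = Counter(purposes)
    return {p: n * disk_tb * nodes for p, n in disks_per_purpose.items()}


capacity = raw_capacity_tb()
for purpose, tb in capacity.items():
    print(f"{purpose:<17}{tb:>5} TB")
print(f"{'Total':<17}{sum(capacity.values()):>5} TB")
```

These are nominal raw totals derived from the table alone. The usable space reported by Chirp or Hadoop will differ because of filesystem overhead, replication, and any reassignment of disks since the table was written.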