This webpage records our work on data and software preservation from Spring 2014 to Spring 2015. Our work is supported by DASPOS. For more information about our lab, please check The Cooperative Computing Lab.
The Parrot Packaging Tool is based on our work from Fall 2013 to Spring 2014 (here). The key changes are as follows:
A) We merged file dependency tracking and environment variable tracking into a single step by adding the --env-list option to parrot_run.
B) The package generation tool, parrot_package_create, has been reimplemented from scratch in C, with system calls grouped into categories to speed up the packaging procedure. The old version was implemented in bash.
All file-related system calls are divided into two categories: special_syscall and others. special_syscall includes: "open_object", "bind32", "connect32", "bind64", "connect64", "truncate", "link1", "mkalloc", "lsalloc", "whoami", "md5", "copyfile1", "copyfile2". Files involved in a special_syscall get the copy degree fullcopy; files involved in other system calls get the copy degree metadatacopy. The following system calls were fullcopy in the earlier shell implementation but are now metadatacopy: "lstat", "stat", "follow_symlink", "link2", "symlink2", "readlink", "unlink".
Note: this change reduces the size of the TauRoast package from 21GB to 18GB, and cuts the packaging time for TauRoast from about 90 minutes to about 30 minutes.
C) We designed a new utility, parrot_package_run, that makes it easier to repeat an application within a package. The root user can also use a utility called chroot_package_run to repeat an application within a jail constructed from a preserved package.
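The copy-degree rule described under item B can be illustrated with a minimal Python sketch. The system call names come from the list above; the names SPECIAL_SYSCALLS and copy_degree are hypothetical and are not the identifiers used in the C implementation of parrot_package_create.

```python
# System call names are taken from the special_syscall list in item B.
# SPECIAL_SYSCALLS and copy_degree are illustrative names only; they are
# not taken from the parrot_package_create source code.
SPECIAL_SYSCALLS = frozenset({
    "open_object", "bind32", "connect32", "bind64", "connect64",
    "truncate", "link1", "mkalloc", "lsalloc", "whoami", "md5",
    "copyfile1", "copyfile2",
})


def copy_degree(syscall: str) -> str:
    """Return 'fullcopy' for a special syscall, 'metadatacopy' otherwise."""
    return "fullcopy" if syscall in SPECIAL_SYSCALLS else "metadatacopy"
```

Under this rule, copy_degree("open_object") gives fullcopy, while copy_degree("stat") gives metadatacopy.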
Our work illustrates that by combining light-weight virtualization techniques with software delivery mechanisms, complex applications, such as HEP applications, can be captured, invariantly preserved, and practically distributed and re-used.
Within the ongoing investigation of the Higgs boson at the CMS detector, part of the LHC at CERN, the Higgs production in association with two top quarks allows measuring the Higgs coupling strength to top quarks. As the Higgs boson is too short-lived to be detected itself, it has to be reconstructed from its decay products.
TauRoast searches for cases where the Higgs boson decays to two tau leptons. Since the tau leptons are very short-lived, they are not observed directly, but by the particle decay products that they generate. So, the analysis must search for detector events that show a signature of decay products compatible with both hadronic tau and top decays. Properties of such events are used to distinguish the events of interest (Higgs decays) from all other events and are also used in further statistical analysis.
For more information on the code and data sources of TauRoast, please check here.
Hardware Architecture: X86_64; Kernel: Linux 2.6.32; OS: RedHat 6.6
CPU Cores: 64; Memory Space: 125GB; Disk Space: 204GB
If you are using an older branch of the source code, your branch must be later than commit 3214e873132007762fbedc8d8d2998d7b63054d8.
If you are using the binary version, make sure your cctools version is 4.2 or later.
To figure out the underlying file dependencies and execution environment, Parrot can record two things during the execution of a program: the names of all accessed files, through the --name-list dependencylist option, and the environment variables of the program, through the --env-list envlist option. When a filename is resolved by the Parrot name resolver, it is also recorded in the dependencylist file. The system call type applied to a file is passed to the name resolver as well and recorded in the dependencylist file. The command used to generate the dependency list and environment list of the TauRoast application is as follows:
% parrot_run --name-list namelist --env-list envlist /bin/tcsh ~/script-v4.csh
The source code of script-v4.csh is here.
After executing this command, all the accessed file names are recorded in the file called namelist, and all the environment variables are recorded in the file called envlist. The format of each item in namelist is filename|system-call-type, such as /usr/bin/ls|stat, which means the file /usr/bin/ls was accessed using the stat system call. The format of each item in envlist is: <name>=<value>
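As an illustration of the two record formats, here is a minimal Python sketch that splits a single namelist entry and a single envlist item into their parts. The function names are hypothetical. Splitting on the last "|" and on the first "=" is an assumption made so that filenames containing "|" and values containing "=" survive. How items are delimited inside the files is not specified here, so the sketch works on one item at a time.

```python
from typing import Tuple


def parse_namelist_entry(entry: str) -> Tuple[str, str]:
    """Split 'filename|system-call-type' into (filename, syscall type).

    Assumption: the separator is the LAST '|' in the entry.
    """
    filename, sep, syscall = entry.rpartition("|")
    if not sep:
        raise ValueError("namelist entry has no '|': %r" % entry)
    return filename, syscall


def parse_envlist_item(item: str) -> Tuple[str, str]:
    """Split '<name>=<value>' into (name, value).

    Assumption: the separator is the FIRST '=' in the item.
    """
    name, sep, value = item.partition("=")
    if not sep:
        raise ValueError("envlist item has no '=': %r" % item)
    return name, value
```

For example, parse_namelist_entry("/usr/bin/ls|stat") yields the filename /usr/bin/ls and the system call type stat.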
For more information on parrot_run, please check here.
Once parrot_run has recorded the accessed files (with --name-list) and the environment variables (with --env-list) of a program, parrot_package_create can generate a package containing all the accessed files and the environment variables. parrot_package_create takes the same --name-list and --env-list parameters as parrot_run. The --package-path parameter specifies the location of the package.
% parrot_package_create --name-list namelist --env-list envlist --package-path /tmp/package
After executing this command, one package with the path of /tmp/package will be generated. The total size of the directory is about 18GB.
For more information on parrot_package_create, please check here.
Currently, there are two ways to ship a package. First, upload the package to a website and share the download URL with others. Second, convert the package into a Docker image, push the image to Docker Hub, and share the image ID and name with others.
Note: Before distributing the package to a new place, whether on the same machine or on a different one, first create a tar file. Do not copy the whole directory directly.
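One way to create such a tar file is sketched below with Python's standard tarfile module. The function name make_package_tarball and the choice of gzip compression are assumptions made for this illustration; any standard tar tool would serve the same purpose. By default, tarfile stores symbolic links as links rather than following them.

```python
import os
import tarfile


def make_package_tarball(package_dir: str, tarball_path: str) -> str:
    """Archive package_dir into a gzip-compressed tarball; return its path.

    Hypothetical helper for illustration; it is not part of cctools.
    """
    # Store entries under the directory's base name (e.g. 'package/...'),
    # so extraction does not recreate the full original path.
    arcname = os.path.basename(os.path.normpath(package_dir))
    with tarfile.open(tarball_path, "w:gz") as tar:
        # dereference defaults to False, so symlinks are kept as symlinks.
        tar.add(package_dir, arcname=arcname)
    return tarball_path
```

For the package generated above, this could be called as make_package_tarball("/tmp/package", "/tmp/package.tar.gz").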
Corresponding to the two distribution methods above, there are two ways to repeat an application within the package. If the package was distributed as a tarball, first download the tarball and extract it to a directory such as /tmp/package. Then you can repeat the application using the following command:
% parrot_package_run --package-path /tmp/package /bin/tcsh ~/script-v4.csh
For more information on parrot_package_run, please check here.
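The three Parrot steps shown so far (record, package, repeat) can be summarized in a small Python sketch that builds the argument lists of the commands. The function name workflow_commands and its default values are hypothetical; the tool names and options are the ones used in the commands above.

```python
from typing import List, Sequence


def workflow_commands(command: Sequence[str],
                      namelist: str = "namelist",
                      envlist: str = "envlist",
                      package_path: str = "/tmp/package") -> List[List[str]]:
    """Return argv lists for the record, package, and repeat steps.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    record = ["parrot_run", "--name-list", namelist,
              "--env-list", envlist, *command]
    create = ["parrot_package_create", "--name-list", namelist,
              "--env-list", envlist, "--package-path", package_path]
    repeat = ["parrot_package_run", "--package-path", package_path, *command]
    return [record, create, repeat]
```

Each list could be passed to subprocess.run. Since no shell is involved in that case, a path such as ~/script-v4.csh would first need to be expanded with os.path.expanduser. The record and package steps run on the original machine; the repeat step runs wherever the package has been extracted.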
If you have root access, you can also turn the package into a jail and run the application within it. We provide a tool called chroot_package_run to do this:
% chroot_package_run --package-path /tmp/package /bin/tcsh ~/script-v4.csh
For more information on chroot_package_run, please check here.
To repeat the application through Docker, please check the Docker documentation: here