(Of course, remote job execution is potentially a security concern, so job execution is disabled by default. To enable job execution, you must run your Chirp server with the -X command line option. While running, the job executes in an identity box that limits file access to that specified by the ACLs.)
This document describes how to execute jobs from the command line, and how to write programs that execute jobs.
    % chirp server.somewhere.edu
    connected to server as unix:fred
    chirp> cd mydata
    chirp> put haystack.txt
    chirp> put /usr/bin/grep
    chirp> setacl /mydata unix:fred rwldax
    chirp> job_run grep needle haystack.txt

You will then see the state of the job as it progresses:

    jobid 6 created
    jobid 6 submitted.
    jobid 6 completed with exit code 0
    jobid 6 removed.

By default, job_run will place the standard output and error into files named stdout.txt and stderr.txt and retrieve them when the job is complete. The input and output can be directed into other files using the > and < symbols.
Note that job_run displays a job id number. If you stop the chirp tool with Control-C, the job keeps running on the server. You can reconnect to the server later and, using the job id, examine its status with job_list, wait for it to complete with job_wait, or kill it with job_kill. Regardless of how the job completes, a record of its completion is left behind until you invoke job_remove.
The client creates a job by calling chirp_reli_job_begin, specifying the program to be run. This causes the server to create a new job in the INITIAL state and return its jobid to the client. Next, the client must invoke chirp_reli_job_commit, which puts the job into the IDLE state, allowing it to run. The server may have multiple jobs queued at any given time, and uses an internal scheduling policy (currently FCFS, first come first served) to decide which to run. When a job runs, the server moves it to the RUNNING state. If the job runs to completion, either by exiting normally or by crashing due to a signal, it reaches the COMPLETED state. If the job cannot be executed at all (e.g. the program specified is not an executable binary), the job reaches the FAILED state. If the server should crash and restart, the job will be placed in the IDLE state and will have the opportunity to run again. The owner of the job (or the server super-user) may issue chirp_reli_job_kill, which will cause a job in the INITIAL, IDLE, or RUNNING state to be forcibly terminated and moved to the KILLED state.
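The state transitions described above can be summarized as a small transition function. The following is only a sketch: the enum names and the event names are hypothetical labels chosen for illustration, not constants from the Chirp library, and it assumes that a failure to execute is detected when the server tries to start the job (from IDLE or RUNNING).

```c
/* Hypothetical names for illustration; not the Chirp library's own constants. */
enum job_state {
    JOB_INITIAL, JOB_IDLE, JOB_RUNNING,
    JOB_COMPLETED, JOB_FAILED, JOB_KILLED
};

enum job_event {
    EV_COMMIT,         /* client calls chirp_reli_job_commit */
    EV_SCHEDULED,      /* server selects the job to run */
    EV_EXITED,         /* program exited normally or died on a signal */
    EV_EXEC_FAILED,    /* program could not be executed at all */
    EV_SERVER_RESTART, /* server crashed and restarted */
    EV_KILL            /* owner or super-user calls chirp_reli_job_kill */
};

/* Returns 1 for the three terminal states, 0 otherwise. */
int job_is_terminal(enum job_state s)
{
    return s == JOB_COMPLETED || s == JOB_FAILED || s == JOB_KILLED;
}

/* Returns the state that follows s when event e occurs.
   Events that do not apply to the current state leave it unchanged. */
enum job_state job_next_state(enum job_state s, enum job_event e)
{
    switch (e) {
    case EV_COMMIT:
        return s == JOB_INITIAL ? JOB_IDLE : s;
    case EV_SCHEDULED:
        return s == JOB_IDLE ? JOB_RUNNING : s;
    case EV_EXITED:
        return s == JOB_RUNNING ? JOB_COMPLETED : s;
    case EV_EXEC_FAILED:
        /* Assumption: detected while the server attempts to start the job. */
        return (s == JOB_IDLE || s == JOB_RUNNING) ? JOB_FAILED : s;
    case EV_SERVER_RESTART:
        /* A job that was running gets another chance to run. */
        return s == JOB_RUNNING ? JOB_IDLE : s;
    case EV_KILL:
        return job_is_terminal(s) ? s : JOB_KILLED;
    }
    return s;
}
```

Note that in this sketch a terminal state is never left: killing a job that has already completed has no effect on its recorded state.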
The chirp_job_wait function makes the caller wait until the job reaches one of the three terminal states (COMPLETED, FAILED, KILLED), the timeout expires, or the server decides to return prematurely. A timeout of zero can be used to return the job's status immediately. Regardless of which condition is reached, chirp_job_wait will fill in a chirp_job_state structure with the current state of the job:
    struct chirp_job_state {
        INT64_T jobid;
        char command[CHIRP_PATH_MAX];
        char owner[CHIRP_PATH_MAX];
        int state;
        int exit_code;
        time_t submit_time;
        time_t start_time;
        time_t stop_time;
        int pid;
    };

Note that the return code of chirp_job_wait only indicates whether job status was successfully returned. A return value >=0 indicates the job state was retrieved, and a return value <0 indicates the job state was not retrieved. The caller MUST look at the state field of the structure to determine whether the job has completed or not.
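To make the distinction between the return code and the state field concrete, here is a minimal sketch of how a caller might interpret the two together. It does not call the Chirp library: struct demo_job_status is a reduced, hypothetical stand-in for chirp_job_state, and the DEMO_ state values and the interpret_wait function are illustrative names, not part of the Chirp API.

```c
/* Hypothetical state values; the real constants come from the Chirp headers. */
enum demo_state {
    DEMO_INITIAL, DEMO_IDLE, DEMO_RUNNING,
    DEMO_COMPLETED, DEMO_FAILED, DEMO_KILLED
};

/* Reduced stand-in for struct chirp_job_state, keeping only two fields. */
struct demo_job_status {
    int state;
    int exit_code;
};

enum wait_outcome {
    WAIT_ERROR,         /* status was not retrieved; st must not be trusted */
    WAIT_STILL_PENDING, /* status retrieved, job not yet in a terminal state */
    WAIT_FINISHED       /* status retrieved, job is COMPLETED, FAILED or KILLED */
};

/* rc is the value returned by the wait call, st the structure it filled in. */
enum wait_outcome interpret_wait(int rc, const struct demo_job_status *st)
{
    if (rc < 0)
        return WAIT_ERROR; /* a negative return says nothing about the job */

    switch (st->state) {
    case DEMO_COMPLETED:
    case DEMO_FAILED:
    case DEMO_KILLED:
        return WAIT_FINISHED;
    default:
        /* rc >= 0 alone does not mean the job is done. */
        return WAIT_STILL_PENDING;
    }
}
```

A caller would typically repeat the wait while the outcome is WAIT_STILL_PENDING, and read exit_code only once the outcome is WAIT_FINISHED.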
Unlike Unix, where a child's exit status can be collected only once, the state of a completed job remains available on the server and can be viewed multiple times with chirp_job_wait. This tolerates communication errors without introducing an inconsistency: a client that loses a reply can simply ask again. The caller must explicitly remove the state with chirp_job_remove. However, the server retains the freedom to remove the state after an excessive amount of time (currently one week) has passed since the job completed.