Chirp Protocol Version 2

Last edited: 23 June 2008

Introduction

Chirp is a simple and lightweight remote I/O protocol with multiple implementations. It is used in several distributed computing systems where users wish to access data over the wide area using a Unix-filesystem-like protocol. This document lays out the underlying protocol so as to promote interoperability between implementations, and to permit additional implementations by other parties. At the time of writing, there are four independent implementations used for both research and production:

  • A standalone C server and client maintained by Douglas Thain at the University of Notre Dame.
  • A Java client implementation by Justin Wozniak used by the GEMS project at the University of Notre Dame.
  • A C server and Java client embedded in the Condor distributed processing system at the University of Wisconsin.
  • A C client implementation that is part of the ROOT I/O library used in the high energy physics community.

Authentication Protocol

    Chirp is carried over a stream protocol such as TCP or Unix domain sockets. A server waits for a client to connect by the appropriate mechanism, and then waits for the client to identify itself. There are currently two methods of authentication: cookie authentication and negotiated authentication.

    Cookie Authentication

    In certain situations, the client may be provided in advance with a "cookie" that provides expedited access to a server with a fixed identity. This method is typically used within Condor when communicating with the embedded Chirp proxy.

    In this situation, the client is provided with a file named chirp.config which contains the host and port of the server, and a cookie string. The client must connect to the server and immediately send cookie, a space, the cookie string, and a newline. If the response from the server is 0\n then the cookie is accepted, and the client may proceed to the main Chirp protocol described below. Any other response indicates the cookie was rejected, and the client must abort.
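
    The cookie exchange above can be sketched as follows. This is a minimal illustration, not code from any of the implementations listed earlier. All function names are hypothetical, and reading the host, port, and cookie out of chirp.config is left out because the layout of that file is not described here.

```python
import socket


def cookie_request(cookie: str) -> bytes:
    """Build the line a client sends immediately after connecting."""
    return b"cookie " + cookie.encode("ascii") + b"\n"


def cookie_accepted(response: bytes) -> bool:
    """Only the exact response "0\\n" means the cookie was accepted."""
    return response == b"0\n"


def read_line(sock: socket.socket) -> bytes:
    """Read one LF-terminated line, one byte at a time (simple, not fast)."""
    line = b""
    while not line.endswith(b"\n"):
        byte = sock.recv(1)
        if not byte:  # connection closed before a full line arrived
            break
        line += byte
    return line


def authenticate_with_cookie(host: str, port: int, cookie: str) -> socket.socket:
    """Connect, present the cookie, and return the socket on success."""
    sock = socket.create_connection((host, port))
    sock.sendall(cookie_request(cookie))
    if not cookie_accepted(read_line(sock)):
        sock.close()  # any other response means rejection: abort
        raise PermissionError("cookie rejected by server")
    return sock
```

    On success the returned socket is ready for the main protocol; on any other response the sketch closes the connection, matching the rule that the client must abort.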

    Negotiated Authentication

    In the general case, the client must prove its identity to the server. There are several possible methods of authentication, and both client and server may only accept some subset of them. The client drives the negotiation process by simply sending the requested authentication type as a plain string followed by a newline. If the server does not support the method, it responds no\n and waits for the client to request another. If it does support the method, it responds yes\n and then performs that method. At the time of writing, the supported authentication methods are hostname, unix, kerberos, and globus.

    In this example, a client first requests kerberos authentication, and then unix:

    client: kerberos
    server: no
    client: unix
    server: yes
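
    The client side of this negotiation can be sketched as a loop over the methods the client is willing to use. The helper names are hypothetical: send_line is assumed to write one word followed by a newline, and recv_line is assumed to return the next line from the server. A "yes" here only means the server supports the method; the method-specific exchange still has to succeed afterwards.

```python
from typing import Callable, Iterable, Optional


def negotiate(
    send_line: Callable[[str], None],
    recv_line: Callable[[], str],
    methods: Iterable[str],
) -> Optional[str]:
    """Offer each method in turn and return the first one the server accepts.

    Returns None if the server answered "no" to every method offered.
    """
    for method in methods:
        send_line(method)  # e.g. "kerberos"
        if recv_line().strip() == "yes":
            return method  # the method-specific exchange starts now
    return None
```

    Ordering the methods list from most to least preferred lets the client fall back gracefully, as in the kerberos-then-unix exchange above.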

    Hostname Authentication

    This method of authentication simply asks that the server identify the client by the name of the host it is calling from. The server responds by performing a reverse domain name lookup on the caller's IP address. If this succeeds, the server responds yes\n; otherwise it responds no\n and the method fails (see below). If the client is authorized to connect, the server sends an additional yes\n; otherwise it sends no\n and the method fails.

    Unix Authentication

    This method of authentication causes the server to challenge the client to prove its identity by creating a file on the local file system. The server transmits the name of a non-existent file in a writable directory. The client must then attempt to create it. If successful, the client should respond yes\n; otherwise it responds no\n and the method fails. The server then checks that the file was actually created, and if satisfied responds yes\n; otherwise it responds no\n and the method fails.
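
    The client's half of the challenge might look like the sketch below. The function name is hypothetical, and two details are assumptions rather than part of the protocol text: the file is created exclusively (since the server names a non-existent file) and with mode 0600.

```python
import os


def answer_unix_challenge(path: str) -> str:
    """Try to create the challenge file and return the line to send back."""
    try:
        # Assumption: create exclusively, so an existing file counts as failure.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except OSError:
        return "no\n"
    os.close(fd)
    return "yes\n"
```

    The server still performs its own check after receiving yes\n, so a client that answers yes\n without creating the file gains nothing.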

    Kerberos Authentication

    This method is performed by invoking the standard Kerberos 5 libraries to authenticate in the usual manner, and is defined by that standard.

    Globus Authentication

    This method is performed by invoking the Secure Sockets Layer to authenticate using Globus credentials and is defined by that standard.

    Completing Authentication

    If the authentication method succeeds, then the server will send two lines to the client, giving the successful authentication method and the client's identity. The client may then proceed to the main protocol below. If the method fails, then the server and client return to negotiating a method.

    Chirp Protocol

    A request consists of an LF-terminated line possibly followed by binary data. At a minimum, a Chirp server must accept requests of 1024 characters. It may accept longer lines. If a line exceeds a server's internal maximum, it must gracefully consume the excess characters and return a TOO_BIG error code, defined below. Certain requests are immediately followed by binary data. The number of bytes is dependent on the semantics of the request, also described below.

    A request is parsed into words separated by any amount of white space consisting of spaces and tabs. The first word of the request is called the "command" and the rest are called "arguments."

    Words fall into two categories: strings and decimals. A string is any arbitrary sequence of characters, with whitespace and unprintable characters encoded according to RFC 2396. (Very briefly, RFC 2396 says that whitespace must be encoded using a percent sign and two hex digits: ab cd becomes ab%20cd.) A decimal is any sequence of the digits 0-9, optionally prefixed by a single + or -.
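
    A minimal sketch of the two word types follows. The function names are hypothetical. The encoder escapes whitespace, control characters, and non-ASCII bytes; escaping the percent sign itself is an assumption made here so that decoding is unambiguous.

```python
import re
from urllib.parse import unquote_to_bytes


def encode_word(raw: bytes) -> str:
    """Percent-encode bytes that cannot appear literally in a string word."""
    out = []
    for byte in raw:
        # Escape space and control characters, DEL and above, and "%" itself.
        if byte <= 0x20 or byte >= 0x7F or byte == 0x25:
            out.append("%%%02X" % byte)
        else:
            out.append(chr(byte))
    return "".join(out)


def decode_word(word: str) -> bytes:
    """Reverse encode_word by expanding each %XX escape."""
    return unquote_to_bytes(word)


def is_decimal(word: str) -> bool:
    """True if the word is digits 0-9 with at most one leading + or -."""
    return re.fullmatch(r"[+-]?[0-9]+", word) is not None
```

    For example, encode_word(b"ab cd") yields ab%20cd, matching the example above, and is_decimal("493") is true while is_decimal("4.5") is not.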

    At the protocol level, words have no maximum size. They may stretch on to fill any number of bytes. Of course, each implementation has limits on concepts such as file name lengths and integer sizes. If a receiver cannot parse, store, or execute a word contained in a request, it must gracefully consume the excess characters and return a TOO_BIG error response, defined below.

    A response consists of an LF-terminated line, bearing an ASCII integer. A valid response may also contain whitespace and additional optional material following the response. If the response is greater than or equal to zero, the response indicates the operation succeeded. If negative, the operation failed, and the exact value tells why:

    -1    NOT_AUTHENTICATED  The client has not authenticated its identity.
    -2    NOT_AUTHORIZED     The client is not authorized to perform that action.
    -3    DOESNT_EXIST       There is no object by that name.
    -4    ALREADY_EXISTS     There is already an object by that name.
    -5    TOO_BIG            That request is too big to execute.
    -6    NO_SPACE           There is not enough space to store that.
    -7    NO_MEMORY          The server is out of memory.
    -8    INVALID_REQUEST    The form of the request is invalid.
    -9    TOO_MANY_OPEN      There are too many resources in use.
    -10   BUSY               That object is in use by someone else.
    -11   TRY_AGAIN          A temporary condition prevented the request.
    -12   BAD_FD             The file descriptor requested is invalid.
    -13   IS_DIR             A file-only operation was attempted on a directory.
    -14   NOT_DIR            A directory operation was attempted on a file.
    -15   NOT_EMPTY          A directory cannot be removed because it is not empty.
    -16   CROSS_DEVICE_LINK  A hard link was attempted across devices.
    -17   OFFLINE            The requested resource is temporarily not available.
    -127  UNKNOWN            An unknown error occurred.

    Negative values beyond -17, other than -127, are reserved for future expansion. The receipt of such a value should be interpreted in the same way as UNKNOWN.
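
    A client might interpret a response line along these lines. This is a sketch: parse_response and error_name are hypothetical names, and the table simply mirrors the codes listed above, with any unlisted negative value treated as UNKNOWN.

```python
from typing import List, Tuple

ERRORS = {
    -1: "NOT_AUTHENTICATED", -2: "NOT_AUTHORIZED", -3: "DOESNT_EXIST",
    -4: "ALREADY_EXISTS", -5: "TOO_BIG", -6: "NO_SPACE", -7: "NO_MEMORY",
    -8: "INVALID_REQUEST", -9: "TOO_MANY_OPEN", -10: "BUSY",
    -11: "TRY_AGAIN", -12: "BAD_FD", -13: "IS_DIR", -14: "NOT_DIR",
    -15: "NOT_EMPTY", -16: "CROSS_DEVICE_LINK", -17: "OFFLINE",
    -127: "UNKNOWN",
}


def parse_response(line: bytes) -> Tuple[int, List[str]]:
    """Split a response line into its integer result and any trailing words."""
    words = line.decode("ascii").split()
    return int(words[0]), words[1:]


def error_name(result: int) -> str:
    """Map a negative result to its symbolic name; unlisted codes are UNKNOWN."""
    if result >= 0:
        raise ValueError("non-negative results indicate success")
    return ERRORS.get(result, "UNKNOWN")
```

    Keeping the trailing words separate allows for the optional material that a valid response may carry after the integer.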

    Chirp Commands

    The following commands are standardized. Note that each implementation may support additional commands which are not (yet) documented because they are still experimental. Each argument to a command is specified with a type and a name.

    open (string:name) (string:flags) (decimal:mode)

    Open the file named "name". "mode" is interpreted in the same manner as a POSIX file mode. Note that the mode is printed in decimal representation, although the UNIX custom is to use octal. For example, the octal mode 0755, representing readable and executable by everyone and writable by the owner, would be printed as 493. "flags" may contain any of these characters, which affect the nature of the call:
  • r - open for reading
  • w - open for writing
  • a - force all writes to append
  • t - truncate before use
  • c - create if it doesn't exist
  • x - fail if 'c' is given and the file already exists
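
    As an illustration of the open request line, including the decimal rendering of the mode, consider the sketch below. The name open_request is hypothetical, and the file name is assumed to be already encoded as a Chirp string.

```python
OPEN_FLAGS = set("rwatcx")


def open_request(name: str, flags: str, mode: int) -> bytes:
    """Build an open request line; name must already be an encoded string word."""
    if not flags or not set(flags) <= OPEN_FLAGS:
        raise ValueError(f"unsupported open flags: {flags!r}")
    # The mode goes on the wire in decimal, so octal 0755 becomes 493.
    return f"open {name} {flags} {mode:d}\n".encode("ascii")
```

    For instance, open_request("/data/out.txt", "wct", 0o755) produces the line open /data/out.txt wct 493.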
    The open command returns an integer file descriptor which may be used with later calls referring to open files. The implementation may optionally return file metadata in the same form as the stat command, immediately following the result. A client implementation must be prepared for both forms of result. On failure, returns a negative value in the response. Note that a file descriptor returned by open only has meaning within the current connection between the client and the server. If the connection is lost, the server will implicitly close all open files. A file descriptor opened in one connection has no meaning in another connection.

    close (decimal:fd)

    Complete access to this file descriptor. If the connection between a client and a server is lost, all files are implicitly closed.

    read (decimal:fd) (decimal:length)

    Read up to "length" bytes from the file descriptor "fd". If successful, the server may respond with any value between zero and "length". Immediately following the response will be binary data of exactly as many bytes as indicated in the response. If zero, the end of the file has been reached. If any other value is given, no assumptions about the file size may be made.

    pread (decimal:fd) (decimal:length) (decimal:offset)

    Read up to "length" bytes from the file descriptor "fd", starting at "offset". Return value is identical to that of read.

    sread (decimal:fd) (decimal:length) (decimal:offset) (decimal:stride_length) (decimal:stride_skip)

    Read up to "length" bytes from the file descriptor "fd", starting at "offset", retrieving "stride_length" bytes every "stride_skip" bytes. Return value is identical to that of read.

    write (decimal:fd) (decimal:length)

    Write up to "length" bytes to the file descriptor "fd". This request should be immediately followed by "length" binary bytes. If successful, may return any value between 0 and "length", indicating the number of bytes actually accepted for writing. An efficient server should avoid accepting fewer bytes than requested, but the client must be prepared for this possibility.

    pwrite (decimal:fd) (decimal:length) (decimal:offset)

    Write up to "length" bytes to the file descriptor "fd" at offset "offset". Return value is identical to that of write.

    swrite (decimal:fd) (decimal:length) (decimal:offset) (decimal:stride_length) (decimal:stride_skip)

    Write up to "length" bytes to the file descriptor "fd", starting at "offset", writing "stride_length" bytes every "stride_skip" bytes. Return value is identical to that of write.

    fsync (decimal:fd)

    Block until all uncommitted changes to this file descriptor have been written to stable storage.

    lseek (decimal:fd) (decimal:offset) (decimal:whence)

    Move the current seek pointer of "fd" by "offset" bytes from the base given by "whence". "whence" may be:
  • 0 - from the beginning of the file
  • 1 - from the current seek position
  • 2 - from the end of the file.
    Returns the current seek position.

    rename (string:oldpath) (string:newpath)

    Rename the file "oldpath" to "newpath".

    unlink (string:path)

    Delete the file named "path".

    rmdir (string:path)

    Delete a directory by this name.

    rmall (string:path)

    Recursively delete an entire directory.

    mkdir (string:name) (decimal:mode)

    Create a new directory by this name. "mode" is interpreted in the same manner as a POSIX file mode. Note that mode is expressed in decimal rather than octal form.

    fstat (decimal:fd)

    Get metadata describing an open file. If the response line indicates success, it is followed by a second line of thirteen integer values with the following meanings:
  • st_dev - Device number.
  • st_ino - Inode number.
  • st_mode - Mode bits.
  • st_nlink - Number of hard links.
  • st_uid - User ID of the file’s owner.
  • st_gid - Group ID of the file.
  • st_rdev - Device number, if this file represents a device.
  • st_size - Size of the file in bytes.
  • st_blksize - Recommended transfer block size for accessing this file.
  • st_blocks - Number of physical blocks consumed by this file.
  • st_atime - Last time file was accessed in Unix time() format.
  • st_mtime - Last time file data was modified in Unix time() format.
  • st_ctime - Last time the inode was changed, in Unix time() format.
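
    Parsing the metadata line could be sketched as below. The names ChirpStat and parse_stat_line are hypothetical, and the sketch assumes the thirteen values arrive in the order listed above, separated by whitespace.

```python
from typing import NamedTuple


class ChirpStat(NamedTuple):
    st_dev: int
    st_ino: int
    st_mode: int
    st_nlink: int
    st_uid: int
    st_gid: int
    st_rdev: int
    st_size: int
    st_blksize: int
    st_blocks: int
    st_atime: int
    st_mtime: int
    st_ctime: int


def parse_stat_line(line: str) -> ChirpStat:
    """Turn the second line of an fstat reply into named fields."""
    values = [int(word) for word in line.split()]
    if len(values) != len(ChirpStat._fields):
        raise ValueError(f"expected 13 values, got {len(values)}")
    return ChirpStat(*values)
```

    Because st_mode is transmitted in decimal, a value such as 33188 corresponds to octal 0100644, a regular file with mode 0644.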
    Note that not all fields may have meaning to all implementations. For example, in the standalone Chirp server, st_uid has no bearing on access control, and the user should employ setacl and getacl instead.

    fstatfs (decimal:fd)

    Get filesystem metadata for an open file. If the response line indicates success, it is followed by a second line of seven decimals with the following interpretation:
  • f_type - The integer type of the filesystem.
  • f_blocks - The total number of blocks in the filesystem.
  • f_bavail - The number of blocks available to an ordinary user.
  • f_bsize - The size in bytes of a block.
  • f_bfree - The number of blocks free.
  • f_files - The maximum number of files (inodes) on the filesystem.
  • f_ffree - The number of free files (inodes) remaining on the filesystem.
    fchown (decimal:fd) (decimal:uid) (decimal:gid)

    Change the ownership of an open file to the UID and GID indicated.

    fchmod (decimal:fd) (decimal:mode)

    Change the Unix mode bits on an open file.

    ftruncate (decimal:fd) (decimal:length)

    Truncate an open file.

    getfile (string:path)

    Retrieves an entire file efficiently. A positive response indicates the number of bytes in the file, and is followed by exactly that many bytes of binary data which are the file's contents.

    putfile (string:path) (decimal:mode) (decimal:length)

    Stores an entire file efficiently. The client first sends the request line indicating the file name, the desired mode bits, and the length in bytes. The response indicates whether the client may proceed. If it indicates success, the client must write exactly "length" bytes of data, which are the file's contents. If the response indicates failure, the client must not send any additional data.

    getlongdir (string:path)

    Lists a directory and all metadata. If the response indicates success, it will be followed by a series of lines, alternating the name of a directory entry with its metadata in the same form as returned by fstat. The end of the list is indicated by a single blank line.

    getdir (string:path)

    Lists a directory. If the response indicates success, it will be followed by a series of lines indicating the name of each directory entry. The end of the list is indicated by a single blank line.

    getacl (string:path)

    Get an access control list. If the response indicates success, it will be followed by a series of lines indicating each entry of the access control list. The end of the list is indicated by a single blank line.

    setacl (string:path) (string:subject) (string:rights)

    Modify an access control list. On the object identified by "path", set the rights "rights" for the given "subject". The string "-" may be used to indicate no rights.

    whoami

    Get the user's current identity with respect to this server. If the response is greater than zero, it is followed by exactly that many bytes of data, containing the user's identity.

    whoareyou (string:rhost)

    Get the server's current identity with respect to a remote host. If the response is greater than zero, it is followed by exactly that many bytes of data, containing the server's identity.

    link (string:oldpath) (string:newpath)

    Create a hard link from newpath to oldpath.

    symlink (string:oldpath) (string:newpath)

    Create a symlink from newpath to oldpath.

    readlink (string:path)

    Read the contents of a symbolic link. If the response is greater than zero, it is followed by exactly that many bytes of data, containing the contents of the link.

    stat (string:path)

    Get file status. If the response indicates success, it is followed by a second line in the same format as the command fstat. If the pathname represents a symbolic link, this command examines the target of the link.

    lstat (string:path)

    Get file status. If the response indicates success, it is followed by a second line in the same format as the command fstat. If the pathname represents a symbolic link, this command examines the link itself.

    statfs (string:path)

    Get filesystem status. If the response indicates success, it is followed by a second line in the same format as the command fstatfs.

    access (string:path) (decimal:mode)

    Check access permissions. Returns success if the user may access the file according to the specified mode. The "mode" may be any of the following values logical-or'd together:
  • 0 (F_OK) Test for existence of the file.
  • 1 (X_OK) Test if the file is executable.
  • 2 (W_OK) Test if the file is writable.
  • 4 (R_OK) Test if the file is readable.
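
    Combining the mode bits for an access request can be sketched as follows. The constants use the conventional POSIX values, and access_request is a hypothetical helper that assumes the path is already encoded as a Chirp string.

```python
F_OK, X_OK, W_OK, R_OK = 0, 1, 2, 4


def access_request(path: str, mode: int) -> bytes:
    """Build an access request line from OR-ed mode bits."""
    if mode & ~(X_OK | W_OK | R_OK):
        raise ValueError(f"invalid access mode: {mode}")
    return f"access {path} {mode:d}\n".encode("ascii")
```

    Asking whether a file is both readable and writable uses R_OK | W_OK, which is sent as the decimal 6.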
    chmod (string:path) (decimal:mode)

    Change the Unix mode bits on a given path. (NOTE 1)

    chown (string:path) (decimal:uid) (decimal:gid)

    Change the Unix UID or GID on a given path. If the path is a symbolic link, change its target. (NOTE 1)

    lchown (string:path) (decimal:uid) (decimal:gid)

    Change the Unix UID or GID on a given path. If the path is a symbolic link, change the link. (NOTE 1)

    truncate (string:path) (decimal:length)

    Truncate a file to a given number of bytes.

    utime (string:path) (decimal:actime) (decimal:mtime)

    Change the access and modification times of a file.

    md5 (string:path)

    Checksum a remote file using the MD5 message digest algorithm. If successful, the response will be 16, and will be followed by 16 bytes of data representing the checksum in binary form.

    thirdput (string:path) (string:remotehost) (string:remotepath)

    Direct the server to transfer the path to a remote host and remote path. If the indicated path is a directory, it will be transferred recursively, preserving metadata such as access control lists.

    mkalloc (string:path) (decimal:size) (decimal:mode)

    Create a new space allocation at the given path that can contain "size" bytes of data and has initial mode of "mode".

    lsalloc (string:path)

    List the space allocation state on a directory. If the response indicates success, it is followed by a second line which gives the path of the containing allocation, the total size of the allocation, and the

    (NOTE 1) The standalone Chirp server ignores the traditional Unix mode bits when performing access control. Calls such as fchmod, chmod, chown, and fchown are essentially ignored, and the caller should employ setacl and getacl instead.
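
    Several of the commands above (read, getfile, whoami, whoareyou, readlink, md5) share one reply shape: a response line carrying a byte count, followed by exactly that many bytes of data. A minimal sketch of reading such a reply from a buffered binary stream is shown below; the function names are hypothetical.

```python
import io
from typing import BinaryIO, Tuple


def read_exact(stream: BinaryIO, count: int) -> bytes:
    """Read exactly count bytes, or raise EOFError if the stream ends early."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError(f"stream ended with {remaining} bytes still expected")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_counted_reply(stream: BinaryIO) -> Tuple[int, bytes]:
    """Read a response line and, if it is positive, the payload that follows."""
    line = stream.readline()
    if not line.endswith(b"\n"):
        raise EOFError("connection closed before a full response line arrived")
    result = int(line.split()[0].decode("ascii"))
    payload = read_exact(stream, result) if result > 0 else b""
    return result, payload


# Example: a 5-byte reply followed by the start of the next response.
demo = io.BytesIO(b"5\nhello0\n")
print(read_counted_reply(demo))  # (5, b'hello')
```

    With a real connection, a stream of this kind can be obtained from a socket with sock.makefile("rb"). A negative result is returned with an empty payload so the caller can map it to one of the error codes.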