HOWTO: Run Work Queue when there is a firewall between master and worker

Certain configurations require the Work Queue master and the workers to be deployed on two different networks or domains with firewall restrictions that prevent direct communication between the two.

For example, consider the scenario where the master runs on a campus node while several workers run in an off-campus HPC cluster. Often, both the campus network and the off-campus HPC cluster have firewall restrictions that prevent connections from being established from outside their networks. In such cases, the Work Queue workers cannot connect to the master directly. To work around these restrictions, the steps below set up SSH port forwarding that enables communication between the master and the workers despite the firewalls.
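In outline, the two SSH tunnels described below chain together so that a worker's connection to a local port is relayed, hop by hop, to the master. Using the example host names and ports from the steps that follow:

    worker --> localhost:10002 (cluster node) --ssh -L--> localhost:10001 (headnode.cluster.com) --ssh -R--> port 9123 (myworkstation.nd.edu, master)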

  • First, set up an SSH remote port forwarding connection from the node running the Work Queue master (myworkstation.nd.edu) to the head node (headnode.cluster.com) of the off-campus cluster.

    To do this, run this command from myworkstation.nd.edu (running the Work Queue master):

    % ssh -N -R 10001:localhost:9123 myuserid@headnode.cluster.com
    

    The above command assumes that the Work Queue master is listening on port 9123 of myworkstation.nd.edu. It forwards any connection made to port 10001 on headnode.cluster.com to port 9123 on myworkstation.nd.edu (specified as localhost in the command). (Remove the -N option if you also want a remote shell on headnode.cluster.com while the tunnel runs.)
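    To confirm that the tunnel is up, you can log in to the head node and check that the forwarded port is listening. This is an optional sanity check and assumes the ss utility is available on headnode.cluster.com (substitute netstat -tln on older systems):

    % ssh myuserid@headnode.cluster.com
    % ss -tln | grep 10001

    A LISTEN entry on 127.0.0.1:10001 indicates that the remote forward is in place. (By default, sshd binds remote forwards to the loopback interface only, which is sufficient here since the workers reach this port through their own tunnels.)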

  • Now set up SSH local port forwarding from the HPC cluster nodes on which the Work Queue workers run to the head node. To do this, modify the submission script that submits the Work Queue workers as jobs to the cluster so that it sets up port forwarding before starting a worker (a complete sketch of such a script appears at the end of this section). Note the -f option, which sends ssh to the background so that the worker command that follows can run:
    % ssh -f -N -L 10002:localhost:10001 myuserid@headnode.cluster.com
    % ./work_queue_worker $arguments localhost 10002
    

    This command routes any connection to port 10002 on the cluster node running the worker to port 10001 on headnode.cluster.com. From the previous step, any connection to port 10001 on the head node is in turn routed to port 9123 on myworkstation.nd.edu.

    Note here that the Work Queue worker is told to connect to port 10002 on the cluster node (specified as localhost). With the forwarding above in place, this connection is relayed to the Work Queue master listening on port 9123 at myworkstation.nd.edu (via headnode.cluster.com on port 10001).

    In practice, ports 10001 and 10002 shown here can be any ports of your choice. Make sure the chosen ports are available and do not conflict with ftp, ssh, http, or other services running on these machines. Also, if the forwarded connections are long-running, it is advisable to set the ServerAliveInterval option on the ssh command so that keep-alive messages are sent periodically (see the sketch below for an example).
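    Putting the pieces together, below is a minimal sketch of a worker job script that opens the tunnel, runs the worker, and tears the tunnel down afterwards. The host names and ports are the examples used above; the -t 3600 idle timeout and the ServerAliveInterval value of 60 seconds are illustrative choices, and your cluster's batch system may require additional directives at the top of the script:

    #!/bin/sh
    # Open the local forward in the background (-f). Periodic keep-alive
    # messages guard against idle tunnels being dropped along the way.
    ssh -f -N -o ServerAliveInterval=60 -L 10002:localhost:10001 myuserid@headnode.cluster.com

    # Connect the worker to the master through the tunnel; -t makes the
    # worker exit after 3600 seconds of idleness.
    ./work_queue_worker -t 3600 localhost 10002

    # Clean up the background tunnel once the worker exits.
    pkill -f "ssh -f -N.*10002:localhost:10001"

    If several workers share the same cluster node, each needs its own local port in place of 10002, since a second ssh -L on the same port will fail with "Address already in use".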

    Thanks to Lee-Ping Wang at Stanford for his initiative and input in drafting this How-To.