NGS Workload Management System and User Interface tutorials

Purpose of this tutorial

By the end of this tutorial you will have worked through the stages of running a job using the NGS UI-WMS (User Interface and Workload Management System).

The WMS lets you submit a job to the NGS as a whole and leaves it to the WMS to select the most appropriate resources for running your job.

In order to do this you must go through the following steps:

1. You must have a valid digital certificate recognized by the NGS.

2. You must create and upload a valid proxy certificate, which will allow the WMS to run jobs on your behalf.

3. You must describe to the WMS, using a JDL file, some of the parameters which you consider important in selecting the right resources to run your job.

4. You can then send your job to the WMS.

5. After the job has run you can retrieve your output.

All of these steps will be covered in the tutorial.

Notes and prerequisites

This tutorial assumes you already have an NGS certificate which you have exported from your browser. If not, please see the certificate section of the NGS web site (http://www.ngs.ac.uk/How-to-Join). The tutorial also assumes you have an NGS account (see the same section).

How to use the WMS

Creating and uploading a proxy

NOTE - Only create a normal proxy in MyProxy.

The UI will create VOMS credentials for you correctly after you choose from a list of supported VOs (see screen grab from a UI login session below).

 

Log in to the UI

To use the UI/WMS resource broker, users should log in to the UI machine at

ngsui03.ngs.ac.uk

You can use a normal SSH client (PuTTY, the ssh command line, etc.), the VOMS-enabled GSI-SSHTerm Java terminal, or any other SSH client you are familiar with.

Details are given below.

 

SSH Login (preferred method)

You can log in to the UI using any SSH client.

NOTE - Use port 2223 (rather than the normal default of 22).

The username and password to use are those of a proxy uploaded to the MyProxy server (at myproxy.ngs.ac.uk), e.g.

ssh -p 2223 <myproxy-name>@ngsui03.ngs.ac.uk

GSI-SSHTerm (alternative method)

The NGS Portal tutorial includes an online tutorial on using the Java VOMS-enabled GSI-SSHTerm. You should start this tool from the GSI-SSHTerm page by clicking the orange "Launch" button.

If this is the first time you have run the VOMS-enabled version of the tool, however, you should first run the MyProxy Uploader tool at least once to set up the NGS VOMS information required for GSI-SSHTerm.

 

On first logging in

When you log in, the UI will check your certificate proxy for common problems, such as missing or expired VOMS credentials (see below). More details of this check, beyond the VOs question in the screen grab below, can be found on the UI-WMS Proxy Check page.
 
* You need a valid 'VOMS proxy' to use the WMS to submit jobs.

* Your proxy currently has no VOMS AC component or it has expired.

*

* Which VO, that you are a member of, can you run your job as ?

dteam

gin.ggf.org

mott2.org

nanocmos.ac.uk

ngs.ac.uk

ops

training.ngs.ac.uk

none : [ ngs.ac.uk ]  
 
Hit <CR> to accept the default of ngs.ac.uk and a normal shell command prompt will be returned.

Checking Credentials

To submit jobs to the WMS resource broker your grid credentials must be valid and have the VOMS extensions. To confirm everything is ok, try running 'voms-proxy-info -all' as below:
[ngs0055@ngsui03 ~]$ voms-proxy-info -all
 
This should result in something similar to the following:
subject   : /C=UK/O=eScience/OU=CLRC/L=RAL/CN=jonathan churchill/CN=proxy/CN=proxy/CN=proxy

issuer    : /C=UK/O=eScience/OU=CLRC/L=RAL/CN=jonathan churchill/CN=proxy/CN=proxy

identity  : /C=UK/O=eScience/OU=CLRC/L=RAL/CN=jonathan churchill/CN=proxy/CN=proxy

type      : proxy

strength  : 512 bits

path      : /tmp/x509up_p9398.fileGD96zy.1

timeleft  : 11:59:40

=== VO ngs.ac.uk extension information ===

VO        : ngs.ac.uk

subject   : /C=UK/O=eScience/OU=CLRC/L=RAL/CN=jonathan churchill

issuer    : /C=UK/O=eScience/OU=Manchester/L=MC/CN=voms.ngs.ac.uk/Email=support@grid-support.ac.uk

attribute : /ngs.ac.uk/Role=NULL/Capability=NULL

timeleft  : 11:59:56

uri       : voms.ngs.ac.uk:15010

This shows that your grid credentials are valid for about 12 hours (timeleft 11:59:40) and that the VOMS extension will last about the same time (11:59:56). If your proxy expires while your job is running, the job will fail (Note - this may differ from the behaviour of other NGS components).
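
Because a job fails if the proxy expires while it is running, it can be worth comparing the remaining lifetime with the expected run time before submitting. The following Python sketch shows one way to do that, assuming the output format shown above. The function names parse_timeleft and proxy_outlives are hypothetical helpers written for this illustration and are not part of any NGS or gLite tool.

```python
import re


def parse_timeleft(info_text):
    """Return every 'timeleft' value found in the text, in seconds."""
    seconds = []
    for line in info_text.splitlines():
        match = re.match(r"\s*timeleft\s*:\s*(\d+):(\d+):(\d+)\s*$", line)
        if match:
            hours, minutes, secs = (int(part) for part in match.groups())
            seconds.append(hours * 3600 + minutes * 60 + secs)
    return seconds


def proxy_outlives(info_text, job_hours):
    """True if the proxy and its VOMS extension both last at least job_hours."""
    remaining = parse_timeleft(info_text)
    # No timeleft line at all is treated as "no usable proxy".
    return bool(remaining) and min(remaining) >= job_hours * 3600


# A few lines copied from the example output above.
SAMPLE = """\
type      : proxy
timeleft  : 11:59:40
=== VO ngs.ac.uk extension information ===
VO        : ngs.ac.uk
timeleft  : 11:59:56
"""

print(parse_timeleft(SAMPLE))
print(proxy_outlives(SAMPLE, 6))
```

In practice the text would come from saving the output of 'voms-proxy-info -all' to a file or capturing it from the command; the shorter of the two lifetimes is the one that matters.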

Describe your job’s requirements: Create a JDL

A JDL (Job Description Language) file describes a job that can be run via the UI/WMS. A JDL file is used by the client software to pass details of your job to the WMS.
 
Create a plain text file called hostname.jdl with the following contents, either on your own machine for later upload or on the UI using Vim.
 
Type = "Job";

JobType = "Normal";

Executable = "/bin/hostname";

StdOutput = "hostname.out";

StdError = "hostname.err";

OutputSandbox = {"hostname.err","hostname.out"};

Arguments = "-f";

RetryCount = 3;

ShallowRetryCount = -1;

Requirements = RegExp("ngs",other.GlueCEUniqueID);
 
The last line, Requirements, is an example of how to select specific nodes, in this case those whose computing element identifier contains the string "ngs".
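
If you need several similar JDL files, for example one with and one without the Requirements line, it may be easier to generate them than to edit them by hand. The following Python sketch reproduces the hostname.jdl contents given above and uses only the attributes that appear there. The function build_jdl and its parameters are hypothetical names chosen for this illustration.

```python
def build_jdl(executable, arguments, basename, ce_pattern=None):
    """Return JDL text for a simple job, mirroring the hostname.jdl example.

    ce_pattern, if given, is placed in a Requirements line that matches
    against the computing element identifier (GlueCEUniqueID).
    """
    out_file = f"{basename}.out"
    err_file = f"{basename}.err"
    lines = [
        'Type = "Job";',
        'JobType = "Normal";',
        f'Executable = "{executable}";',
        f'StdOutput = "{out_file}";',
        f'StdError = "{err_file}";',
        f'OutputSandbox = {{"{err_file}","{out_file}"}};',
        f'Arguments = "{arguments}";',
        "RetryCount = 3;",
        "ShallowRetryCount = -1;",
    ]
    if ce_pattern is not None:
        lines.append(
            f'Requirements = RegExp("{ce_pattern}",other.GlueCEUniqueID);'
        )
    return "\n".join(lines) + "\n"


# Same job as in the tutorial, restricted to CEs whose ID contains "ngs".
jdl_text = build_jdl("/bin/hostname", "-f", "hostname", ce_pattern="ngs")
print(jdl_text)

# Writing it out produces a file equivalent to the hand-written one.
with open("hostname.jdl", "w") as handle:
    handle.write(jdl_text)
```

Calling build_jdl without ce_pattern gives the same job description with no Requirements line.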
 

Checking resources: Job List Match

Before running the job, it is useful to test which computing elements (CEs) are able to accept it. To see a list of all CEs that will accept your job, use the command:
glite-wms-job-list-match -a hostname.jdl
This should give you output similar to:
Connecting to the service https://ngswms01.ngs.ac.uk:7443/glite_wms_wmproxy_server

 

==========================================================================

 

                     COMPUTING ELEMENT IDs LIST

 The following CE(s) matching your job requirements have been found:

 

        *CEId*

 - ce2.ppgrid1.rhul.ac.uk:2119/jobmanager-pbs-ngs

 - cerb-condor.bris.ac.uk:2119/jobmanager-condor-ngs.ac.uk

 - grid.ecdf.ed.ac.uk:2119/jobmanager-sge-ngs

 - hepgrid2.ph.liv.ac.uk:2119/jobmanager-lcgpbs-ngs

 - ngs.oerc.ox.ac.uk:2119/jobmanager-pbs-workq

 - ngs.rl.ac.uk:2119/jobmanager-lsf-ngs

 - ngs.wmin.ac.uk:2119/jobmanager-pbs-default

 - vidar.ngs.manchester.ac.uk:2119/jobmanager-pbs-workq

 - condorngs.cf.ac.uk:2119/jobmanager-condor-INTEL_WINNT51

 - ngs.leeds.ac.uk:2119/jobmanager-pbs-mpi

 - troilus.wrg.york.ac.uk:2119/jobmanager-sge-ngs.ac.uk

 

==========================================================================
The details are not important at this point: as long as one or more CEs are listed, there is somewhere that your job should run successfully.

Note – this does not guarantee anything about the load on those nodes, just that the job would have the correct resources available to it on those nodes. Load information is one of the extra elements we could add to the JDL but it is not included yet.

If you create a second JDL file, without the Requirements line in the JDL given above, you will see a larger list covering all NGS resources.
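
If you want to compare the lists produced by two JDL files, it helps to pull the CE identifiers out of the list-match output. The following Python sketch does this, assuming the output layout shown above, where each matching CE appears on a line beginning with " - ". The function parse_ce_ids is a hypothetical helper, and Python's re module is used here only as a stand-in for the RegExp test in the JDL.

```python
import re


def parse_ce_ids(list_match_output):
    """Return the CE identifiers listed in glite-wms-job-list-match output."""
    ce_ids = []
    for line in list_match_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            ce_ids.append(stripped[2:].strip())
    return ce_ids


# A shortened copy of the example output above.
SAMPLE = """\
Connecting to the service https://ngswms01.ngs.ac.uk:7443/glite_wms_wmproxy_server

                     COMPUTING ELEMENT IDs LIST
 The following CE(s) matching your job requirements have been found:

        *CEId*
 - ce2.ppgrid1.rhul.ac.uk:2119/jobmanager-pbs-ngs
 - ngs.oerc.ox.ac.uk:2119/jobmanager-pbs-workq
 - condorngs.cf.ac.uk:2119/jobmanager-condor-INTEL_WINNT51
"""

ce_ids = parse_ce_ids(SAMPLE)
hosts = [ce_id.split(":")[0] for ce_id in ce_ids]

print(hosts)
# Every CE returned for hostname.jdl should contain the string "ngs".
print(all(re.search("ngs", ce_id) for ce_id in ce_ids))
```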

 

Checking NGS Node Status

The Information System can be used as an easy way of checking the status of current NGS nodes that can be used with the WMS, by the following command:
lcg-infosites  --vo ngs.ac.uk ce

You can use the output of this command to help you build up the requirements in your JDL. For example, if you need a minimum number of CPUs to work with, you can check how many of the NGS nodes meet this requirement before writing it into your JDL. The output will also give you some indication of how busy the various sites were when the command was run.

Note, the load information is by its nature always historical and should be used as a guide. The status of sites may change between the time of running lcg-infosites and submitting your job.

Note - this command starts with lcg- because it is a legacy command that predates gLite (before the WMS was developed, all of the job submission and monitoring commands started with lcg-).

This gives an output such as:

lcg-infosites --vo ngs.ac.uk ce

#CPU    Free    Total Jobs      Running Waiting ComputingElement

----------------------------------------------------------

 249       1     707            222      460    ngs.rl.ac.uk:2119/jobmanager-lsf-ngs

 252       0     259             65      194    ngs.leeds.ac.uk:2119/jobmanager-pbs-mpi

1000     110     792            651      101    scarf.rl.ac.uk:2119/jobmanager-lsf-scarf

 252      91      11             11        0    ngs.oerc.ox.ac.uk:2119/jobmanager-pbs-workq

   5       5       0              0        0    grid.ecdf.ed.ac.uk:2119/jobmanager-sge-ngs

 492     370     102            102        0    grid2.lancs.ac.uk:2119/jobmanager-sge-serial

 376     329       0              0        0    ce2.ppgrid1.rhul.ac.uk:2119/jobmanager-pbs-ngs

 218     218       0              0        0    ngs.wmin.ac.uk:2119/jobmanager-pbs-default

   0       0       0              0        0    grid2.lancs.ac.uk:2119/jobmanager-sge-lancaster

   0       0       0              0        0    lancs1.nw-grid.ac.uk:2119/jobmanager-sge-serial

3360      51       1              0        1    lcgce02.gridpp.rl.ac.uk:2119/jobmanager-lcgpbs-gridS

   0       0       0              0        0    troilus.wrg.york.ac.uk:2119/jobmanager-sge-ngs.ac.uk

 144      92      49             49        0    cluster1.epsam.keele.ac.uk:2119/jobmanager-sge-all.q

....
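
To check how many CEs meet a minimum CPU requirement, the table can be read into a small data structure and filtered. The following Python sketch assumes the six-column layout shown above (CPUs, free CPUs, total jobs, running, waiting, CE identifier). The names CEStatus, parse_infosites and candidates are hypothetical and exist only for this illustration; as noted above, the figures are a snapshot and may have changed by the time you submit.

```python
from typing import NamedTuple


class CEStatus(NamedTuple):
    cpus: int
    free: int
    total_jobs: int
    running: int
    waiting: int
    ce_id: str


def parse_infosites(output):
    """Parse the data rows of 'lcg-infosites --vo ngs.ac.uk ce' output."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have five integer columns followed by the CE identifier;
        # the header, the dashed rule and blank lines do not.
        if len(fields) == 6 and all(f.isdigit() for f in fields[:5]):
            rows.append(CEStatus(*map(int, fields[:5]), fields[5]))
    return rows


def candidates(rows, min_cpus):
    """CEs with at least min_cpus CPUs and some free, most free first."""
    suitable = [r for r in rows if r.cpus >= min_cpus and r.free > 0]
    return sorted(suitable, key=lambda r: r.free, reverse=True)


# A few rows copied from the example output above.
SAMPLE = """\
#CPU    Free    Total Jobs      Running Waiting ComputingElement
----------------------------------------------------------
 249       1     707            222      460    ngs.rl.ac.uk:2119/jobmanager-lsf-ngs
 252      91      11             11        0    ngs.oerc.ox.ac.uk:2119/jobmanager-pbs-workq
 218     218       0              0        0    ngs.wmin.ac.uk:2119/jobmanager-pbs-default
   0       0       0              0        0    troilus.wrg.york.ac.uk:2119/jobmanager-sge-ngs.ac.uk
"""

rows = parse_infosites(SAMPLE)
for status in candidates(rows, min_cpus=200):
    print(status.free, status.ce_id)
```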

Running a job: Job Submission

A job can be submitted with the command

glite-wms-job-submit -a -o MyID hostname.jdl

The option "-o" allows you to specify a file (in this case "MyID") in which to store the unique identifier of your job.

Note - This is useful because job identifiers are long URIs that would otherwise have to be re-typed to get the job status when you are monitoring your job. You can store the identifiers for multiple jobs in a single file.

The option "-a" generates a name for the certificate proxy that is associated with this job - we'll discuss this more later. You should get output similar to the following:

Connecting to the service https://ngswms01.ngs.ac.uk:7443/glite_wms_wmproxy_server

====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://ngswms01.ngs.ac.uk:9000/VW8-SUBr6tNo_HU1JTNEXg

The job identifier has been saved in the following file:
/home/ngs0055/MyID

==========================================================================
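
If you capture the submission output in a script, the job identifier can be picked out of it directly. The following Python sketch assumes the output layout shown above, in which the identifier is the only line that begins with "https://". The function extract_job_ids is a hypothetical helper for this illustration.

```python
def extract_job_ids(submit_output):
    """Return lines that consist of a job identifier URI.

    The 'Connecting to the service' line also contains a URL, but it does
    not start with https:// and so is not picked up.
    """
    return [
        line.strip()
        for line in submit_output.splitlines()
        if line.strip().startswith("https://")
    ]


# The example output above.
SAMPLE = """\
Connecting to the service https://ngswms01.ngs.ac.uk:7443/glite_wms_wmproxy_server

====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://ngswms01.ngs.ac.uk:9000/VW8-SUBr6tNo_HU1JTNEXg

The job identifier has been saved in the following file:
/home/ngs0055/MyID

==========================================================================
"""

job_ids = extract_job_ids(SAMPLE)
print(job_ids)
```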

Checking on your job’s progress: Job Status

To check on the current status of your job use the command

glite-wms-job-status -i MyID

If you wish to see more information, you can use the additional options "-v 2" and "-v 3" to increase the amount of information displayed. MyID is the file you created in the previous step to hold your job identifier. When your job is complete you should get a status message similar to the following:

*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://ngswms01.ngs.ac.uk:9000/VW8-SUBr6tNo_HU1JTNEXg
Current Status:     Done (Success)
Exit code:          0
Status Reason:      Job terminated successfully
Destination:        ngs.oerc.ox.ac.uk:2119/jobmanager-pbs-workq
Submitted:          Fri Oct  2 17:50:45 2009 BST
*************************************************************
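
When monitoring several jobs it can be convenient to turn the status report into key/value pairs. The following Python sketch assumes the "Name: value" layout shown above and recognises only the "Done (Success)" status that appears in this tutorial; other status strings are not handled. The functions parse_status and job_succeeded are hypothetical helpers for this illustration.

```python
def parse_status(status_output):
    """Return the 'Name: value' lines of a status report as a dictionary."""
    fields = {}
    for line in status_output.splitlines():
        # Split on the first colon only, so URLs and times stay intact.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def job_succeeded(fields):
    """True if the report shows the successful completion seen above."""
    return (
        fields.get("Current Status") == "Done (Success)"
        and fields.get("Exit code") == "0"
    )


# The example status report above.
SAMPLE = """\
*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://ngswms01.ngs.ac.uk:9000/VW8-SUBr6tNo_HU1JTNEXg
Current Status:     Done (Success)
Exit code:          0
Status Reason:      Job terminated successfully
Destination:        ngs.oerc.ox.ac.uk:2119/jobmanager-pbs-workq
Submitted:          Fri Oct  2 17:50:45 2009 BST
*************************************************************
"""

status = parse_status(SAMPLE)
print(status["Destination"])
print(job_succeeded(status))
```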

Getting your results: Job Output

Note - In this tutorial we demonstrate the method for manually retrieving output. Further tutorials will demonstrate the methods for automatically retrieving output to the UI machine.


To manually retrieve the output:

 

glite-wms-job-output --dir . -i MyID

 

The output files will be saved in the current directory.

Use "--dir ./<directoryName>" to put the files in a subdirectory.

Note - If you do not use the "--dir <directoryName>" option then the output will be placed in a directory created under /tmp, with a name based on the ID of the job.

If successful you should get output similar to: 

Connecting to the service https://ngswms01.ngs.ac.uk:7443/glite_wms_wmproxy_server
 

================================================================================

 

                        JOB GET OUTPUT OUTCOME

 

Output sandbox files for the job:

https://ngswms01.ngs.ac.uk:9000/VW8-SUBr6tNo_HU1JTNEXg

have been successfully retrieved and stored in the directory:

/home/ngs0055


================================================================================
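
After retrieval it is worth confirming that the files named in the OutputSandbox of hostname.jdl (hostname.out and hostname.err) have actually arrived. The following Python sketch checks a directory for them. The function missing_outputs is a hypothetical helper, and the temporary directory and file content below are placeholders used only to demonstrate it.

```python
import tempfile
from pathlib import Path


def missing_outputs(directory, expected=("hostname.out", "hostname.err")):
    """Return the expected output files that are not present in directory."""
    return [
        name for name in expected
        if not (Path(directory) / name).is_file()
    ]


# Demonstration with a throwaway directory holding only one of the files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "hostname.out").write_text("placeholder content\n")
    demo_missing = missing_outputs(tmp)

print(demo_missing)
```

For the job in this tutorial you would call missing_outputs(".") after running glite-wms-job-output with "--dir ."; an empty list means both files were retrieved.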

Stopping your job: Job Cancel

If anything goes wrong, a job can be cancelled with the command:

glite-wms-job-cancel -i MyID

 

The output in this case is similar to:

Are you sure you want to remove specified job(s) [y/n]y : y

 

Connecting to the service https://ngswms01.ngs.ac.uk:7443/glite_wms_wmproxy_server

 

============================= glite-wms-job-cancel Success =============================

 

The cancellation request has been successfully submitted for the following job(s):

 

- https://ngswms01.ngs.ac.uk:9000/RWqWoNgRQswP91FJq2wK7w

 

========================================================================================

 

This is the end of this tutorial. You should now have successfully:

1. Created and uploaded a valid proxy certificate.

2. Created a simple JDL file to provide the WMS with some information about your job.

3. Identified resources.

4. Run a job using the WMS.

5. Retrieved your job's output.

 

You should now be ready to go on to examine in more detail how to customise these steps.