Windows R NGS integration
This set of utilities is designed to allow the NGS to be used directly from the R GUI in Windows. It is not currently packaged for end users and requires expert help to install. It is still very much in development and is intended for experts to take on and expand.
Licence: GPL v2
Download: R-Windows-GLOB.zip
Concept
The integration of R with the NGS centres on a function called pforf. This implements a "parallel for-loop": like a standard for loop, but with each iteration executing automatically on a different NGS worker node. Please note that this only works for loops where there are no dependencies between the different iterations, as detailed in the previous section.
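As a rough illustration of what "no dependencies" means, the sketch below uses plain R with no grid involved; the variable names are made up for this example. Each iteration of the first loop uses only its own item, so its iterations could run on different nodes. Each iteration of the second loop reads the result of the previous one, so it could not be split up as it stands.

```r
# Independent iterations: each result depends only on its own item,
# so the iterations could run in any order or on different nodes.
squares <- numeric(5)
for (i in 1:5) {
  squares[i] <- i^2
}

# Dependent iterations: step i needs the result of step i - 1,
# so this loop cannot be split across nodes as it stands.
running <- numeric(5)
running[1] <- 1
for (i in 2:5) {
  running[i] <- running[i - 1] + i
}
```

Only loops of the first kind are candidates for pforf.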
The function is defined as follows:
result = pforf(items, function_name, argument)
The function takes three arguments:
- items: a list of values to iterate over.
- function_name: the name of the function to run on each of the nodes; it should be in quotes.
- argument: a value (most usefully a list with named elements) which is passed to each of the different runs of the function.
The result is a list of the return values from each run, indexed by the relevant item. The function named in function_name takes two values: the first is one of the items from the list items and the second is a copy of argument. To use pforf, the following line must be included at the top of the main program:
source('c:\\Program Files\\R\\R-Windows-GLOB\\grid.R')
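For example, the following script computes 100 * it + 30 for each value of it from 1 to 10 and sums the results (note that the script calls the function as grid.pforf):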
source('c:\\Program Files\\R\\R-Windows-GLOB\\grid.R')

mainbody <- function(it, arg) {
  list(c = arg$a * it + arg$b)
}

arg <- list(
  a = 100,
  b = 30)

out <- grid.pforf(1:10, "mainbody", arg)

total <- 0
for (i in seq_along(out)) {
  total <- total + out[[i]]$c
}
total
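Before submitting anything to the grid, it can help to check the worker function locally. The sketch below is one way to do that, not part of the toolkit: pforf_local is a hypothetical stand-in that takes the same three arguments but runs each iteration in the current R session, one after another. It assumes the real function returns results in item order.

```r
# Hypothetical local stand-in for pforf: runs every iteration in the
# current R session, one after another, instead of on grid nodes.
pforf_local <- function(items, function_name, argument) {
  lapply(items, function(it) do.call(function_name, list(it, argument)))
}

# Same worker function and argument as in the example script.
mainbody <- function(it, arg) {
  list(c = arg$a * it + arg$b)
}

arg <- list(a = 100, b = 30)
out <- pforf_local(1:10, "mainbody", arg)

# Sum the per-item results, exactly as the grid version would.
total <- 0
for (i in seq_along(out)) {
  total <- total + out[[i]]$c
}
total  # 5800, i.e. 100 * sum(1:10) + 30 * 10
```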
The pforf function works by creating a wrapper function for each item in items. Each wrapper loads the argument variable from an R variable save file and sets the value of the appropriate member of the items list. The wrapper then uses these to call the function, before saving the results to another file.
The pforf function sets up the files containing the argument variable, writes the helper functions, and calls the Globus grid scripts to run each of the helper scripts. It then waits for all the jobs to finish before collating the results and returning them.
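The file-based flow described above can be sketched locally as follows. This is an illustration under stated assumptions, not the code in grid.R: pforf_files, scale_item and the file names are hypothetical, saveRDS/readRDS stand in for whatever save format the real scripts use, and the wrappers run in sequence in the current session where the real function would submit grid jobs.

```r
# Hypothetical re-creation of the file-based flow, run locally.
pforf_files <- function(items, function_name, argument,
                        workdir = tempfile("pforf_")) {
  dir.create(workdir)

  # 1. Save the shared argument once, where every wrapper can load it.
  arg_file <- file.path(workdir, "argument.rds")
  saveRDS(argument, arg_file)

  # 2. One wrapper per item: load the argument, call the function on
  #    that item, and save the result to its own file.
  run_wrapper <- function(index) {
    arg <- readRDS(arg_file)
    res <- do.call(function_name, list(items[[index]], arg))
    saveRDS(res, file.path(workdir, sprintf("result_%d.rds", index)))
  }

  # 3. On the grid each wrapper would be a separate job; here they
  #    simply run one after another.
  for (index in seq_along(items)) run_wrapper(index)

  # 4. Collate the per-item result files into one list.
  lapply(seq_along(items), function(index) {
    readRDS(file.path(workdir, sprintf("result_%d.rds", index)))
  })
}

scale_item <- function(it, arg) list(c = arg$a * it + arg$b)
res <- pforf_files(1:3, "scale_item", list(a = 100, b = 30))
```

Because each wrapper communicates only through files, no iteration needs to share memory with the session that launched it, which is what makes it possible to run them on separate nodes.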
There is a restart function which allows the polling to be restarted; this is explained when a job is run.
Using
To use this set of tools with the NGS, you will first need to request a certificate from the UK e-Science CA and then, once you have received your certificate, apply for an NGS account. This is explained in the how to join section.
Once you have the account and certificate, follow these instructions to back up your certificate from your web browser; this will leave you with a .p12 or .pfx file. You can then use the openssl commands to install your certificate.
Before running your jobs you'll need to use the grid-proxy-init command or similar to create a proxy certificate file.
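One way to check for a proxy from within R before submitting jobs is sketched below. The helper proxy_seconds_left is hypothetical and not part of the toolkit. It assumes grid-proxy-info can be found on the PATH (otherwise pass the full path to the executable) and relies on its -timeleft option, which prints the remaining proxy lifetime in seconds.

```r
# Hypothetical helper: returns the proxy lifetime in seconds, or NA if
# the grid-proxy-info executable cannot be found or gives no usable output.
proxy_seconds_left <- function(exe = "grid-proxy-info") {
  path <- Sys.which(exe)
  if (!nzchar(path)) return(NA_real_)
  out <- suppressWarnings(
    tryCatch(system2(path, "-timeleft", stdout = TRUE, stderr = FALSE),
             error = function(e) character(0))
  )
  if (length(out) == 0) return(NA_real_)
  suppressWarnings(as.numeric(out[1]))
}

left <- proxy_seconds_left()
if (is.na(left) || left <= 0) {
  message("No valid proxy found; run grid-proxy-init first.")
}
```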
Installing
- Extract this archive to C:\Program Files\R\R-Windows-GLOB
- Create the directory: C:\Program Files\R\R-Windows-GLOB\bin
- Follow the instructions included below to compile Globus statically under Windows; you'll need Cygwin installed
- Copy gsissh.exe, gsiscp.exe and grid-proxy-info.exe from the build tree to C:\Program Files\R\R-Windows-GLOB, and also copy gsissh.exe to C:\Program Files\R\R-Windows-GLOB\bin
- Copy C:\cygwin\bin\cygwin1.dll and C:\cygwin\bin\cygz.dll to both C:\Program Files\R\R-Windows-GLOB and C:\Program Files\R\R-Windows-GLOB\bin
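After following the steps above, a quick check from R can confirm that the files ended up where the steps say they should. The sketch below is a hypothetical helper, not something shipped in the archive; the file list is taken from the install steps, plus grid.R from the source line used in scripts.

```r
# Hypothetical checker: lists files that the install steps say should
# exist but which are missing under the given root directory.
missing_install_files <- function(root = "C:/Program Files/R/R-Windows-GLOB") {
  dlls <- c("cygwin1.dll", "cygz.dll")
  expected <- c(
    file.path(root, c("grid.R", "gsissh.exe", "gsiscp.exe",
                      "grid-proxy-info.exe", dlls)),
    file.path(root, "bin", c("gsissh.exe", dlls))
  )
  expected[!file.exists(expected)]
}

# An empty result means every expected file was found.
missing_install_files()
```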

