"Steve's Tests" have helped the UK Grid community by providing diagnostic information that systems administrators can use to debug their sites. Started by Steve Lloyd, GridPP's collaboration board chair, in January 2007 as a single simple ATLAS job, the tests have become a complete set of tools for indicating, and helping to diagnose, the state of the UK Grid. The newest (and, Steve hopes, final) test, simply called Network Test, is an attempt to keep an eye on bandwidth issues at sites and between sites. When the job runs at a site, it copies a file from the Tier-1 to a storage element at that site. The file is then passed to the worker node on which the job is running, which moves a copy of the file to every other site in the UK. The time taken for each leg is recorded and the bandwidth totalled. Each site is then graded from red to green according to these numbers, with green (25 MB/s) being the target.
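The timing-and-grading step can be illustrated with a minimal Python sketch. It is not the code used by the Network Test. The only figure taken from the article is the 25 MB/s green target. The `Leg` class, the site names in the usage note, the intermediate "amber" band and its half-of-target threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Green target quoted in the article (MB/s).
TARGET_MB_PER_S = 25.0


@dataclass
class Leg:
    """One timed file transfer (hypothetical record layout)."""
    source: str
    destination: str
    size_mb: float   # size of the file copied, in megabytes
    seconds: float   # wall-clock time taken for the copy

    @property
    def bandwidth(self) -> float:
        """Average bandwidth of this leg in MB/s."""
        if self.seconds <= 0:
            raise ValueError("transfer time must be positive")
        return self.size_mb / self.seconds


def grade(bandwidth_mb_s: float, target: float = TARGET_MB_PER_S) -> str:
    """Map a bandwidth to a colour.

    Only the green threshold comes from the article; the amber band
    (at least half the target) is an assumed intermediate step.
    """
    if bandwidth_mb_s >= target:
        return "green"
    if bandwidth_mb_s >= 0.5 * target:
        return "amber"
    return "red"
```

For example, a hypothetical 1000 MB file copied from the Tier-1 to "site-A" in 40 seconds gives `Leg("tier1", "site-A", 1000, 40).bandwidth` of 25 MB/s, which `grade` reports as green.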
There are now three versions of the original ATLAS tests: running a simple Athena (the ATLAS software framework) "Hello World" straight from the ATLAS release libraries; building a custom "Hello World" from source; and analysing 100 pre-generated Z0 to e+e- events to calculate the Z0 mass. These tests pre-dated properly instrumented experiment 'dashboards' and gave sites useful diagnostic information for debugging, especially for misconfigured Worker Nodes in the early days, which were hard to spot with the other tests available at the time.
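To give a feel for what the third test computes, here is a minimal Python sketch of the standard two-particle invariant mass, m = sqrt((E1 + E2)^2 - |p1 + p2|^2), in units where c = 1. It is not the Athena analysis code. The four-vector layout (E, px, py, pz) in GeV and the function names are assumptions chosen for illustration.

```python
import math
from typing import Iterable, Sequence, Tuple

# Assumed layout: (E, px, py, pz), all in GeV, natural units (c = 1).
FourVector = Sequence[float]


def invariant_mass(p1: FourVector, p2: FourVector) -> float:
    """Invariant mass of a two-particle system."""
    e = p1[0] + p2[0]
    px = p1[1] + p2[1]
    py = p1[2] + p2[2]
    pz = p1[3] + p2[3]
    m_squared = e * e - (px * px + py * py + pz * pz)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(m_squared, 0.0))


def mean_mass(events: Iterable[Tuple[FourVector, FourVector]]) -> float:
    """Average invariant mass over a collection of (e+, e-) pairs."""
    masses = [invariant_mass(a, b) for a, b in events]
    if not masses:
        raise ValueError("no events supplied")
    return sum(masses) / len(masses)
```

As a sanity check, an electron and a positron travelling back to back along the z axis with 45.6 GeV each (treated as massless) reconstruct to a mass of about 91.2 GeV.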
Because the ATLAS tests are sent to each site's Computing Element (CE) individually, regardless of its status, they do not give a good indication of the overall performance seen by a user: a user's jobs would not be sent to a site that was broken or down. Hence there is now a variation on the ATLAS tests (called the UK tests) in which the data analysis job is sent to any suitable, unspecified UK CE, which gives a much better indication of overall UK efficiency.
As the Grid matured, the general testing infrastructure became more widespread and reliable, and the SAM (Service Availability Monitoring) tests became an important indicator of site performance. Although GridPP does not run its own SAM tests, the "Steve's Tests" framework is used to poll the SAM database regularly and store the results for display and historical reporting. Initially only the OPS (Operator) Critical Tests were monitored, but recently this has been extended to individual tests for ATLAS, CMS and LHCb as well.
For further information on the NGS software stack, please refer to the NGS Site Level Services reference document.