Name: Jean-Alain Grunchec
Institution: University of Edinburgh
Research: Quantitative Genetic Analyses
Quantitative genetics is a discipline that aims to explain variation between individuals in a population in terms of their genetic architecture. Differences between individuals' DNA sequences in some genes can sometimes be related to quantitative observable differences such as size. The chromosomal regions that explain these quantitatively observable differences are called Quantitative Trait Loci (QTL). Selective breeding of domestic animal species can be assisted by QTL mapping. QTL have also been related to the risk of developing diseases such as cancer or diabetes.
QTL Express was developed 8 years ago as a web site that allows geneticists to run QTL analyses. It applies powerful statistical methods to animal populations bred through specific experimental mating programs. QTL Express distributes calculations across a set of 6 local dedicated computers, and hundreds of scientists have used the service since its launch. However, scientists' computational demands have often exceeded the capacity of the local cluster, resulting in a decreased quality of service.
More complex QTL mapping methods have been developed recently. Linkage Disequilibrium Linkage Analyses (LDLA) can be applied to populations that were not experimentally bred, and therefore to a much larger set of populations. Epistasis analyses are new methods that attempt to locate interacting genes. They can identify many more QTL than the corresponding older QTL mapping methods. However, a single LDLA analysis can take hundreds of CPU hours per trait, and epistasis analyses are thousands of times more computationally demanding than older types of analyses and can also require hundreds of CPU hours per dataset.
This is where GridQTL comes in. It provides web services that extend the capacity of QTL Express and allow scientists to run the newer QTL mapping methods on the NGS. The Globus Toolkit is used to submit jobs to several NGS sites. Thanks to the combined use of the SWARM meta-scheduler, an AJAX-driven GridSphere interface and the NGS resources, users experience an automated, robust, fast, dynamic, transparent and free level of service.
In an 18-month period, more than 80 scientists used the GridQTL portal to perform 11,000 analyses, which took 2.5 CPU years in total. Jean-Alain Grunchec (GridQTL research engineer) explained that using the NGS has given scientists clear advantages over the use of a single local cluster: “Computational resources provided by the NGS are large and have different capacities that can fit different analyses. Some clusters have a maximum job execution time of up to 3 weeks and can tackle very long jobs, whereas other clusters will give priority to shorter jobs. Several clusters have compute nodes with 32 GB of RAM and are therefore well suited for tasks requiring a lot of memory. We have large storage capacities available locally to store the results temporarily”.
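To give a sense of scale, the usage figures above can be turned into per-analysis and per-day averages. The following is only a rough sketch: it assumes a 365-day year and 30-day months, and the function names are illustrative rather than part of GridQTL.

```python
# Rough averages from the reported usage figures.
# Assumptions (not from the case study): 365-day year, 30-day months.
HOURS_PER_YEAR = 365 * 24
DAYS_PER_MONTH = 30


def mean_cpu_hours(cpu_years: float, analyses: int) -> float:
    """Average CPU hours consumed per analysis."""
    if analyses <= 0:
        raise ValueError("analyses must be positive")
    return cpu_years * HOURS_PER_YEAR / analyses


def mean_analyses_per_day(analyses: int, months: int) -> float:
    """Average number of analyses submitted per day."""
    if months <= 0:
        raise ValueError("months must be positive")
    return analyses / (months * DAYS_PER_MONTH)


avg_hours = mean_cpu_hours(2.5, 11_000)
avg_per_day = mean_analyses_per_day(11_000, 18)
print(f"About {avg_hours:.1f} CPU hours per analysis")
print(f"About {avg_per_day:.0f} analyses per day")
```

Under these assumptions the average analysis used roughly 2 CPU hours, with around 20 analyses submitted per day. The averages hide a wide spread, since individual LDLA and epistasis runs can take hundreds of CPU hours.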
In terms of performance, large analyses can be sped up by a factor of 147. Some LDLA analyses that would have taken 30 hours on single-core computers can be done in 12 minutes thanks to the NGS. This allows some users to run many of these analyses per day. The epistasis analyses can also be run faster: an analysis that previously took 26 days can now be done in 15 hours. As Jean-Alain said, “To put it simply, thanks to the NGS, our users can do in days some research that would require months of computations on their own desktop PC”.
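The speed-up implied by the two timing examples above is simply the ratio of the old run time to the new one. A minimal sketch of that arithmetic follows; the function name is illustrative, and the durations are the rounded values quoted in the text rather than measured timings.

```python
def speedup(before_minutes: float, after_minutes: float) -> float:
    """Return how many times faster the new run is than the old one."""
    if before_minutes <= 0 or after_minutes <= 0:
        raise ValueError("durations must be positive")
    return before_minutes / after_minutes


# LDLA example: 30 hours on a single core versus 12 minutes on the NGS.
ldla = speedup(30 * 60, 12)

# Epistasis example: 26 days versus 15 hours.
epistasis = speedup(26 * 24 * 60, 15 * 60)

print(f"LDLA: about {ldla:.0f} times faster")
print(f"Epistasis: about {epistasis:.1f} times faster")
```

With these rounded durations the LDLA example works out to a factor of about 150, close to the quoted 147, and the epistasis example to a factor of about 42.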
More information is available about the aims and scope of the GridQTL project.
Project funding - Biotechnology and Biological Sciences Research Council of UK (BBS/B/1695X to GridQTL project); Research Councils UK (GR/T27983/01 to S.K.)
PI - Dr Sarah Knott