Name: Dr Andy Turner
Institute: University of Leeds
Research: Geodemographic modelling
2001 saw the most recent UK population census, with the next due in 2011. The census dates back over 200 years, and over that time it has become more detailed and more widely used in activities such as public service planning.
In planning services such as transport, health, schools and housing, it is useful to have information about the population, where people live and what they do. The census provides important statistics, including estimates of how many people live in an area, how old they are, their current occupations, their health and whether they provide care to others.
Andy Turner and colleagues at the University of Leeds are aiming to take the use of the census and other population data in public service planning a step further.
The Modelling and Simulation for e-Social Science (MoSeS) project began in 2005 as part of the National Centre for e-Social Science (NCeSS) research programme. It aims to develop national demographic simulations and to support their use by social scientists and policy planners in answering specific questions and analysing scenarios. Examples include estimating demand, in specific areas, for services such as transport schemes or for access to health and social care.
The demographic modelling is based on data from the 2001 census. Various census datasets are used to generate individual-level and household-level datasets for 2001. These 2001 data are then projected forward at the individual and household level for small areas, year by year, up to 2031. It is hoped that a scenario-based forecasting approach will allow social scientists to examine the possible effects of environmental change and to plan better for the likely effects of an ageing population.
Modelling a population the size of the UK's is a large task. Simulating births, deaths, marriages and population movements year by year involves a considerable amount of data and computation.
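As a rough illustration of what one simulated year involves, the Java sketch below advances a list of individuals by one year: it removes deaths, adds births and ages the survivors. It is not the MoSeS code. The `YearlyStep` and `Person` names, the single constant death and birth probabilities, and the simplification that anyone aged 15 to 44 can have a child are all hypothetical, and marriages, population movements and households are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of one yearly update in an individual-level
// (microsimulation) population model. Not the MoSeS implementation.
class YearlyStep {

    // Minimal immutable individual record: age in whole years.
    static final class Person {
        final int age;
        Person(int age) { this.age = age; }
    }

    // Advance the population by one year. deathProb and birthProb are
    // hypothetical constant annual probabilities; a real model would use
    // rates that vary by age, sex and area.
    static List<Person> step(List<Person> population, Random rng,
                             double deathProb, double birthProb) {
        List<Person> next = new ArrayList<>();
        for (Person p : population) {
            if (rng.nextDouble() < deathProb) {
                continue; // died during the year: not carried forward
            }
            // Simplification: anyone aged 15-44 may have a child.
            if (p.age >= 15 && p.age <= 44 && rng.nextDouble() < birthProb) {
                next.add(new Person(0)); // newborn enters at age 0
            }
            next.add(new Person(p.age + 1)); // survivor ages by one year
        }
        return next;
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed so runs are repeatable
        List<Person> pop = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            pop.add(new Person(i % 90)); // toy initial population, ages 0-89
        }
        // 30 yearly steps take a 2001 population to 2031.
        for (int year = 2001; year < 2031; year++) {
            pop = step(pop, rng, 0.01, 0.06);
        }
        System.out.println("Simulated population in 2031: " + pop.size());
    }
}
```

In this simplified sketch each person is processed independently within a step, which is the kind of structure that lends itself to dividing work between processors.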
All the modelling code has been written by the project members in Java, and parts of it have been parallelised using the MPJExpress software to take advantage of high-end computers such as those available on the National Grid Service (NGS). NGS staff have helped the project migrate its code from its own 32-node cluster to the larger NGS clusters and submit jobs using the PBS job manager.
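One common way to parallelise a model built from many small areas is to give each process a contiguous block of areas. The sketch below shows such a block decomposition in plain Java; the source does not say how MoSeS divides its work, so the `AreaPartition` class and the area count are hypothetical. In an MPJ Express program the process rank and count would typically come from `MPI.COMM_WORLD.Rank()` and `MPI.COMM_WORLD.Size()` after `MPI.Init(args)`; here they are passed in directly so the example runs without the library.

```java
// Hypothetical block decomposition of small areas across parallel processes.
class AreaPartition {

    // Returns {start, end} (end exclusive) of the area indices assigned to
    // process `rank` out of `size` processes. The first (numAreas % size)
    // processes each receive one extra area, so block sizes differ by at most one.
    static int[] range(int numAreas, int size, int rank) {
        int base = numAreas / size;
        int remainder = numAreas % size;
        int start = rank * base + Math.min(rank, remainder);
        int end = start + base + (rank < remainder ? 1 : 0);
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        int numAreas = 1000; // hypothetical number of small areas
        int size = 32;       // e.g. one process per node on a 32-node cluster
        for (int rank = 0; rank < size; rank++) {
            int[] r = range(numAreas, size, rank);
            System.out.println("rank " + rank + ": areas " + r[0] + " to " + (r[1] - 1));
        }
    }
}
```

Contiguous blocks keep each process's areas together, which suits models where most interactions happen within an area; if areas differed greatly in population, a decomposition balanced on population rather than area count would spread the load more evenly.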
Andy only started using the NGS for MoSeS in November 2007, but the project has already used a large amount of CPU time and has created and archived large volumes of useful data.
“One population initialisation for the UK requires around 20,000 CPU hours using about 1.2GB of fast access memory and generating a dataset of around 2GB in size,” explains Andy. “The data and computational requirements for a dynamic simulation of the UK are many times higher.”
A dynamic simulation of the UK has yet to be done, but it would require approximately 2.4TB of data, and that is for only one scenario. To study trends and scenario extremes, thousands of runs would be needed. Andy says: “One simulation would need 100 attributes, each one updated once a year for 30 years. If we had 100 million people and 8 bytes per attribute that is over 2TB of data per run. To start with I’ll be satisfied with running through the entire thing twice, the second time just taking a few button clicks and a bit of patience!”
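Andy's estimate is a straightforward product of population size, number of attributes, number of yearly updates and bytes per attribute: 100,000,000 × 100 × 30 × 8 = 2.4 × 10^12 bytes, or about 2.4TB. The short Java sketch below reproduces the calculation; the `DataVolume` class is purely illustrative and, like the quote, assumes every attribute value is stored uncompressed for every person in every year.

```java
// Illustrative reproduction of the per-run storage estimate quoted above.
class DataVolume {

    // Total bytes when every attribute of every person is stored for every year.
    static long bytesPerRun(long people, long attributes, long years, long bytesPerAttribute) {
        return people * attributes * years * bytesPerAttribute;
    }

    public static void main(String[] args) {
        long bytes = bytesPerRun(100_000_000L, 100, 30, 8);
        // Decimal terabytes (10^12 bytes).
        System.out.println(bytes + " bytes = " + (bytes / 1e12) + " TB");
        // prints: 2400000000000 bytes = 2.4 TB
    }
}
```

On the same assumptions, a study of 1,000 scenario runs would need about 2.4 petabytes, which shows why storage, not just CPU time, constrains running thousands of scenarios.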
This research was funded by the ESRC as part of the National Centre for e-Social Science.