Name: Helen Xiang
Institution: University of Portsmouth
Research: Astronomy databases
Since 2000, the Sloan Digital Sky Survey (SDSS) has been taking multi-colour digital images of the northern sky. To date, nearly 300 million celestial objects have been detected, and the data are used to support research in all areas of astronomy and cosmology, from asteroids to the large-scale structure of the universe.
Relational databases in both the UK and the US hold over 100 parameters for each object, such as brightness, colour and shape. The databases are used by both the astronomy community and the public. One recent public project using the SDSS data is GalaxyZoo, in which over 100,000 volunteers from around the world visually classify the galaxies imaged by the SDSS.
All this data, however, presents difficulties with both storage and access for users. As part of her PhD research, Helen Xiang from the University of Portsmouth has been looking at using Oracle databases hosted on the NGS to store the data. Recently she succeeded in transferring almost 2 terabytes of SDSS data to the NGS Oracle database in Manchester. A separate Microsoft SQL Server database at Portsmouth holds another 2 terabytes of similar data. Joint queries across the two databases have been run successfully.
Easy access to distributed, diverse astronomy databases is a key requirement of future astronomy projects. The next generation of astronomy surveys includes the Large Synoptic Survey Telescope (LSST), which will produce petabytes of raw image data.
With such diverse resources, the team at the University of Portsmouth has been experimenting with grid data management software such as OGSA-DAI and OGSA-DQP, exploring how well these tools perform when jointly accessing SDSS data at both the NGS and Portsmouth.
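To make the idea of a joint query concrete, here is a minimal sketch in Python. It does not use OGSA-DAI or OGSA-DQP and is not the team's code. Two in-memory SQLite databases stand in for the Oracle database in Manchester and the SQL Server database at Portsmouth, and the table names, column names and values are all hypothetical. The sketch filters rows at one site and then matches them against rows fetched from the other, which is roughly the kind of work a distributed query processor has to plan and carry out.

```python
import sqlite3


def make_source(table, columns, rows):
    """Create an in-memory database holding one table (a stand-in for a remote site)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} (objid INTEGER PRIMARY KEY, {columns})")
    placeholders = ", ".join("?" for _ in rows[0])
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return conn


# Hypothetical photometry table held at one site.
site_a = make_source("photo", "r_mag REAL", [(1, 17.2), (2, 19.8), (3, 16.5)])
# Hypothetical shape table held at the other site.
site_b = make_source("shape", "radius REAL", [(1, 3.1), (3, 7.4), (4, 2.0)])


def joint_query(max_mag):
    """Join the two sites on objid, keeping objects brighter than max_mag.

    The filter runs at site A so that only matching rows are transferred;
    the join itself is a simple hash join done on the client.
    """
    bright = dict(site_a.execute(
        "SELECT objid, r_mag FROM photo WHERE r_mag < ?", (max_mag,)))
    result = []
    for objid, radius in site_b.execute(
            "SELECT objid, radius FROM shape ORDER BY objid"):
        if objid in bright:
            result.append((objid, bright[objid], radius))
    return result


rows = joint_query(18.0)  # objects present at both sites with r_mag < 18
```

Pushing the brightness filter down to the first database before joining keeps the amount of data moved between sites small, which matters far more at terabyte scale than in this toy example.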
“Astronomers have embraced the use of relational databases in their research but keep filling them up,” comments Prof Bob Nichol of the Institute of Cosmology and Gravitation, Portsmouth. “Today we have terabytes, tomorrow it will be petabytes like particle physicists. The difference, however, is that astronomers usually have quite complex queries which push the database capabilities.”
Helen Xiang adds, “We are using OGSA-DQP now to implement astronomy queries on distributed SDSS data on NGS and in Portsmouth. We’ve had to change the code quite a bit to get it to work.” It is hoped that Helen’s research will provide important lessons on how massive distributed data systems should be designed.