Big future for big data
Bigger is not always better, but big data could hold the key to success for an initiative that brings together disciplines across the University of Edinburgh, looking at everything under the Sun, from blogs to Higgs bosons...
“Eighteen months ago, I was a sceptic,” says Professor Richard Kenway, Tait Professor of Mathematical Physics at the University of Edinburgh. “Maybe it was only hype – another next big thing.”
But even though Kenway's office is only a stone's throw away from that of Nobel Prize-winning physicist, Emeritus Professor Peter Higgs, he's not referring to the multi-billion-dollar quest for the elusive elementary particle named after his illustrious neighbour, but to a new kind of science that is taking up more of his time every day: data science, or what is commonly known as “big data.”
As the head of a project to bring together every department in Edinburgh to exploit the advantages of data science, Kenway has become a leading advocate for this exciting new branch of science and believes it could transform all kinds of research, as well as teaching. According to Kenway, Edinburgh Data Science will facilitate collaboration between different disciplines, encourage links with industry as well as public services, and also have a long-term impact on education, with many students learning data science as an integral part of their courses.
Kenway's experience with computational science and supercomputers goes back many years, using the most powerful systems available to pursue his own research in physics, exploring “theories of elementary particles using computer simulation of lattice gauge theories, particularly the strong interactions of quarks and gluons described by quantum chromodynamics (QCD).”
In the early 1980s, says Kenway, Edinburgh researchers in particle physics were keen to get their hands on the most powerful, affordable computers they could find, so they could compete with international researchers in simulating QCD. The researchers also played a useful role in the development of supercomputers because their “esoteric” calculations were a technological challenge for companies such as the UK's ICL, who manufactured some of the early machines. “We were pushing the trend,” says Kenway, “in the drive to be more innovative by exploiting the new parallel architectures for computing.” When the Edinburgh Parallel Computing Centre (EPCC) opened in 1990, this accelerated the activities and provided bigger and better machines for researchers in different departments, as well as building bridges with industrial partners. It was a “symbiotic” relationship with industry, creating mutual benefits for everyone, and the ability to simulate extremely complex processes was a huge advance for engineers and scientists, including physicists, biologists and medical researchers.
Thank you, Benedict Cumberbatch
With such a strong background in physics as well as computational science (see sidebar), Kenway is not only in charge of the Edinburgh Data Science project but is also the trailblazer for the university's involvement in the Alan Turing Institute for Data Science, which will be headed by the universities of Edinburgh, UCL, Warwick, Oxford and Cambridge. The new institute, with £42 million in Government funding, will be headquartered at the British Library in London. Its aim is to “promote the development and use of advanced mathematics, computer science, algorithms and big data for human benefit, in an environment that brings together theory and practical application.” It will operate as a not-for-profit company in partnership with the EPSRC (the Engineering and Physical Sciences Research Council).
Edinburgh was selected as one of the five university partners because of its UK-leading position in informatics research, and Kenway is excited by the prospect of getting involved. He is happy that “the Turing brand” has been hugely helped by the appearance of the popular actor Benedict Cumberbatch in the recent film about Turing, and by the recognition now being given to one of Britain’s greatest unsung scientists. The project aims to be greater than the sum of its parts, not just focusing on pure research but also working with business on translational research, taking advantage of its good links with both public and private sectors.
“If we get it right,” says Kenway, “it will have a huge impact. It's a totally new kind of research institute which could put the UK at the forefront of technology – developing new algorithms to extract useful information from the digital data that we are now generating in all walks of life.”
High-performance computing, says Kenway, will play an increasingly critical role in big data, for the simple reason that “you can only go so far” in getting useful information from the masses of data generated by big corporations such as Google and Facebook. The challenge is how to use that data to predict what will happen in the future, and to do that will require new kinds of algorithms and models which capture what is going on in the data so that they can be simulated. And that is why data science has such a bright future.
Kenway has seen fashions in computing come and go. He has also seen efforts at predicting the behaviour of complex systems in the physical sciences come to fruition; but now the time has come for data science to extend this approach to commerce, health and social sciences – with Edinburgh and its partners in the Alan Turing Institute leading the way.
Edinburgh Data Science
To explain how data science has evolved, Kenway cites the early censuses, beginning with the Domesday Book, which soon became impractical and drove the development of statistics in the 1900s to understand important social trends by analysing samples of the population. Nowadays, however, we have complete digital data for many aspects of our daily lives, and the challenge has become not getting a good sample but working out what to do with all the data – and this is where new algorithms, neural networks and machine learning can provide a solution by detecting patterns in the “eye-watering” amounts of data in circulation. Making computers work faster while using less power is only part of the answer, says Kenway, adding: “We need entirely new algorithms.”
In the past, the physical sciences made the most use of high-performance computers, but that situation is changing, says Kenway. Interpreting the huge amounts of data generated by the search for the elusive Higgs boson is the kind of application which supercomputers are known to be best at. Extracting useful information from social media, including blogs and other “messy” sources, has a similar “needle in a haystack” character, but is much more challenging because we don’t know what we’re looking for.
The difference, Kenway explains, is that in physics, for example, we usually start with a theory, then do an experiment and simulation of the theory to prove or disprove it. The starting point for social scientists, however, is often lots of unstructured data and we have no idea whether this is governed by an underlying theory, or how changing external factors might affect what we extract. But if scientists could manage to develop new ways to make sense of this social data “chaos,” the results could be dramatic – enabling government and business to plan for the future with much more precision. The problem sometimes feels like trying to develop new tools to discover “what we don't know we are looking for to start with,” says Kenway. “The promise of data science is to move in the opposite direction to conventional science,” he adds, “from data to theory.”
Healthcare will be a major focus of data science, helping understand not only the physical aspects of medicine but also taking many other factors into account such as diet and lifestyle, as well as our genes. Particular drugs and new treatments may work with some individuals but not with everyone, and data science may reveal the reasons for this, delivering the dream of personalised healthcare.
“The great promise of data science,” says Kenway, “is to bring together many different branches of science to produce clean information from the messy and diverse data that is out there, to help us make better predictions.”
This enthusiasm for the benefits of data science is beginning to spread through the campus, and Kenway expects it to affect research in every discipline. The physical sciences may produce more structured, arguably much simpler data, but people in the arts and humanities also see how data science can help. Data science may even change the way students are taught, says Kenway, by analysing data collected from students to “optimise human behaviour” – e.g., recommending better learning techniques to suit individuals – or what is now called “personalised learning.”
Universities today are run like many other businesses, and Kenway thinks that data science offers the same kind of benefits to their operations as are already being exploited by big corporations, as long as the universities ensure their methodology is ethically sound and publicly acceptable, and protects confidentiality. Physicists may view the Universe through theories and algorithms, and data science may not be an “exact science” like many others, says Kenway, but mathematicians and physicists can bring a lot to the party.
The challenge is to manage the complexity, diversity and scale of digital data, but Kenway is an optimist about this new “enabling” branch of science. “Edinburgh Data Science is an exciting and innovative project,” he says. “We recognise the huge amount of disparate activity and different people involved, and their different motivations. In data science, one size will never fit all, but we are planning to connect up the whole University, creating a truly multi-disciplinary culture which, we hope, will lead to very practical results, not only for mathematicians and physicists, informatics and medicine, but also for public services and industry, and in so doing build a ‘smart’ university.”
Professor Richard Kenway OBE
As Vice-Principal for High-Performance Computing at the University of Edinburgh, Professor Kenway is responsible for UK High-Performance Computing Services and for promoting advanced computing technology to benefit academia and industry. For ten years, until it closed in 2011, this included the National e-Science Centre. From 2008 to 2011, he was also Head of the School of Physics and Astronomy.
As a specialist in quantum chromodynamics (QCD), Professor Kenway led UK participation in an international project to build three 10-teraflop/s computers to simulate QCD, which ran from 2004 to 2011, and he is the principal investigator on a follow-on project with IBM and Columbia University, which built and is currently exploiting a 1-petaflop/s prototype BlueGene/Q computer.
In 2002, he also initiated the International Lattice Data Grid project, which provides a global infrastructure for sharing simulation data.