Can supercomputers get any more super?
Drawing on his background as the former Director of one of the most impressive supercomputing facilities in Europe, Professor Arthur Trew describes the latest trends in computational science – and sounds a note of caution on what we should expect from the next generation of giant number-crunching machines...
In 50 years, the world's fastest supercomputer has gone from three megaflops (three million floating-point operations per second) to over 30 petaflops – roughly ten billion times faster. And such mind-blowing progress continues, with plans in the pipeline for supercomputers at least ten times faster within the next couple of years, followed by a giant machine which breaks the exaflop barrier – 1,000 petaflops, or one quintillion floating-point operations per second.
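The scale of that leap is easy to sanity-check with back-of-the-envelope arithmetic (the figures below are the ones quoted above; the snippet itself is purely illustrative):

```python
# Back-of-the-envelope check of the speedup quoted above (illustrative only).
first_supercomputer = 3e6   # CDC 6600: ~3 megaflops
todays_fastest = 30e15      # ~30 petaflops

print(todays_fastest / first_supercomputer)  # -> 1e10: ten billion times faster

# The prefix ladder behind the exaflop target:
# 1 exaflop = 1,000 petaflops = 10**18 floating-point operations per second
assert 1000 * 1e15 == 1e18
```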
While exaflop computers will enable us to tackle new computational problems, they also bring a range of problems we have never seen before, says Professor Arthur Trew. For example, even using today's most power-efficient technologies, such a monster computer would need its own power station, the size of Longannet (the largest in Scotland), just to run.
To see this “progress” from a different angle, Trew says that his mobile phone uses a faster processor and has a bigger memory than the first supercomputer he used 30 years ago.
Trew, who is now Head of the School of Physics and Astronomy and Assistant Principal in Computational Science at the University of Edinburgh, advises the UK Research Councils on the operation of the facilities at EPCC (the Edinburgh Parallel Computing Centre) – in particular ARCHER, the UK's flagship supercomputer, and the UK Research Data Facility (UKRDF). He therefore maintains a strong interest in the performance of EPCC, as well as in trends in supercomputing, with a perspective dating back to before EPCC was established, when he was a Research Fellow in the Department of Physics and Astronomy working on many-body problems in cosmology and other complex systems.
Hardware developers face enormous challenges in building the next generation of supercomputers, but Trew himself is more concerned with how to use them and squeeze the most out of the hardware. Increasingly, that means bringing together data from multiple sources. Number-crunching capabilities may grab the headlines, but it is often easier to do the calculations than cope with the mountains of data which grow so dramatically year after year.
When HECToR, the predecessor to ARCHER, was installed at EPCC in 2007 at a cost of £113 million, it was hailed as a breakthrough for supercomputing in Scotland, capable of 60 million million calculations a second. But when it was unplugged last year, hardly anyone noticed, says Trew. Not only was this because ARCHER was three to four times faster than the upgraded version of HECToR, but also because the UKRDF was used as a data backbone so that the work of both facilities could transfer seamlessly. "We have to recognise there's been a fundamental change in computing," says Trew, "in terms of strategic and structural issues. Computers come and go but the data persists."
For example, he continues, a supercomputer such as ARCHER may generate the data, but you can use other systems for the visualisation. “There are repercussions, however,” he adds, “because data has what we call weight – for the large quantities used in many simulations, it’s quicker to move it around on a truck than it is to transmit it over the Internet. There’s an imbalance between our ability to store data and to transmit it. So the idea of a data backbone is a technical solution to a fundamental problem, not just for academic researchers but also for large organisations such as banks and retailers. Data should be the first thing on people's minds.”
Another big concern for Trew is how to get the skills we need to develop the new tools and techniques to turn the data into useful information. “If there is no co-operation between those who develop the analysis tools and those who use them, then we could be developing tools in the dark. That is why we must bring groups together, to demonstrate requirements and the ability to solve them.” It’s critical to bridge the gap between the people with the need for applications and those who can deliver the technology to address them – what Trew describes as a “virtuous circle” which enables different strands of research to make progress together.
Despite his close involvement with supercomputers, Trew is keen to emphasise that he is much more interested in the science enabled by supercomputers: “The computers are a stepping stone – just like test tubes or other laboratory apparatus.”
Trew has always been clear that the real challenge is to make computational models more realistic, and this raises the difficult issue of verification – ensuring you get the "right" results from a simulation. Mathematicians can develop very clever algorithms, says Trew, but when you compare real data with the simulated data, you don't always get the same answer. This can be because the algorithm is wrong or because the model does not capture all of the processes. For example, you can simulate the air flow over a new design of aeroplane wing with calculations that are close to correct, but when you then go on to analyse the stresses on the structure, and deformations, then add the engines and the fuselage, the model becomes highly complex and coupled. Evolutionary biology can also present very challenging problems because it spans the range from chemical models, through simple biological systems, to organs, individuals and populations – a spiralling complexity which will long challenge the most powerful of computers.
What interests Trew is the idea that “while there is no end in sight to the problems amenable to simulation, and Moore’s Law* still has some way to run, it is not obvious that today’s digital approach is the only one.” Over the last 50 years, the hardware has got faster as more and more transistors have been squeezed onto chips, then parallel computing made the chips more efficient by dividing the task into smaller components, but this has been at the expense of ease of use.
*Moore's Law is the prediction that the number of transistors in integrated circuits (or computer chips) will double approximately every two years.
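The compounding in that footnote is worth seeing numerically. A minimal sketch, assuming an idealised clean two-year doubling (real fabrication schedules only approximate this):

```python
# Idealised Moore's Law projection: transistor count doubles every two years.
def transistor_count(start_count, years, doubling_period=2.0):
    return start_count * 2 ** (years / doubling_period)

# A chip with one billion transistors, projected ten years ahead:
print(transistor_count(1e9, 10))  # -> 3.2e10, a 32-fold increase
```

Five doublings in a decade is why a modern phone can outrun a decades-old supercomputer.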
“Moreover,” says Trew, “the more we reduce the size of the transistors on a microprocessor, the more they will act like quantum devices, not knowing whether the answer is a one or a zero.” But Trew still believes we need to aim for exaflop computers, adding: “I am not clear if we'll get there before 2020, but I'm clear we will usefully get there.”
Another interesting approach is to consider new methods of computation. Trew illustrates this by describing his reaction to the launch of the world's biggest supercomputer, IBM's BlueGene/P, in 2007. Filling a room the size of two tennis courts, the new machine could simulate the brain of a fly, at a tenth of the speed. Clearly, nature was much more efficient, thought Trew at the time, but how could we replicate that approach? And could it be generalised to address other problems?
For Trew, there are also methodological questions to answer: “There is a feeling that what comes out of computers must be right, but that isn’t true. We’ve had theoretical science for 3,000 years and done experiments for 500 years, and have developed robust techniques for verifying whether our theories and experiments are correct. By contrast, we are still at the early stages of simulation – it is not obvious we know how to verify models. We need more computational rigour.”
Some systems are "inherently incalculable," says Trew, citing the examples of the climate and weather: "Because the weather can be influenced by the smallest perturbations, we have to have different approaches to understanding how confident we can be in the simulations." Faster computers have enabled dramatic improvements in forecasts because they allow more complete and detailed models, but forecasters also use ensemble modelling to estimate the likelihood of a prediction being correct.
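The ensemble idea can be sketched with a toy chaotic system – here the logistic map stands in for a real weather code, and all names and parameters are illustrative:

```python
# Toy ensemble forecast: run the same chaotic model many times from slightly
# perturbed initial conditions and report the spread of outcomes.
import random

def model(x, steps=50, r=3.9):
    # Logistic map in its chaotic regime: a stand-in for a weather model.
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

def ensemble(x0=0.5, members=100, noise=1e-6):
    # Perturb the starting state by a tiny random amount for each member.
    runs = [model(x0 + random.uniform(-noise, noise)) for _ in range(members)]
    return min(runs), max(runs)

lo, hi = ensemble()
print(f"outcomes span [{lo:.3f}, {hi:.3f}] despite near-identical starts")
```

A wide spread signals low confidence in any single run; a tight cluster signals a trustworthy forecast.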
Supercomputers today are also becoming much better at helping with subjects that once were considered beyond the capabilities of any computer, such as social sciences. In the early 1990s, Trew was involved in a project which set out to analyse housing, but he quickly discovered that they simply did not have the power to deal with the dynamic, multi-faceted data involved.
Today, says Trew, there are still many challenges in supercomputing, including the problem of power consumption. But rather than depending on hardware and software to solve all their problems, some companies will have to change their business models if they want to survive in the age of big data.
Supercomputing in action
One extreme to another
The European Centre for Medium-Range Weather Forecasts (ECMWF) is using ARCHER to improve the performance of its forecasting model.
The ARCHER National UK Supercomputing Service has enabled the Aircraft Research Association to perform its largest-ever computational fluid dynamics (CFD) simulation, validating the results of its wind-tunnel data and paving the way for increased use of CFD in landing-gear assembly design and improved, more environmentally friendly designs.
Albatern develops innovative offshore marine renewable energy devices. High-performance computing (HPC) enabled the company to simulate a large-scale Wavenet array (100 or more devices) and to "parallelise" its energy devices, building on Albatern's in-house modelling expertise. Computer visualisation and power prediction of large-scale arrays are also vital to Albatern's efforts to attract continued investment.
Code_Saturne is a multi-purpose CFD application used for nuclear reactors. Scientists from EDF Energy and STFC Daresbury Laboratory recently tested the performance of Code_Saturne on ARCHER as part of an ongoing CFD code improvement effort and in preparation for more extensive production runs.
The rise of the supercomputer
> The first supercomputer, the Control Data Corporation (CDC) 6600, had only one CPU. Launched in 1964, the CDC 6600 cost US $8 million and ran at up to 40 MHz, performing three million floating-point operations per second.
> IBM's $120 million Roadrunner, which was the first supercomputer to achieve one petaflop (one quadrillion floating-point operations per second), was the fastest in the world in 2009. It was shut down only five years after its launch.
> China is currently top of the supercomputer league table, with its Tianhe-2 rated at about 34 petaflops.
> US officials have announced plans to develop a new supercomputer which may eventually be able to operate at up to 300 petaflops, and the next target is to build exaflop supercomputers (one exaflop = 1,000 petaflops).
> In 1964, the CDC 6600 cost the equivalent of US $8.3 trillion per Gflop (one billion floating-point operations per second). The going rate today is eight US cents per Gflop.
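The raw 1964-dollar arithmetic behind that comparison is easy to reproduce (the headline $8.3 trillion figure presumably also folds in inflation and other adjustments, which are not attempted here):

```python
# Cost per Gflop of the CDC 6600, in unadjusted 1964 dollars.
price_usd = 8e6             # purchase price
gflops = 3e6 / 1e9          # 3 megaflops expressed in Gflops
cost_per_gflop = price_usd / gflops
print(cost_per_gflop)       # about $2.7 billion per Gflop (1964 dollars)

# Against today's quoted rate of eight US cents per Gflop:
print(cost_per_gflop / 0.08)  # a factor of roughly 3e10, before inflation
```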