Supercomputers are built to solve very large, difficult problems and do it quickly. Instead of relying on a single processor, supercomputers like El Capitan at Lawrence Livermore National Laboratory and Frontier at Oak Ridge National Laboratory use a large number of processors working together simultaneously. That makes them especially useful for jobs like climate modeling, genetic research, nuclear simulations, artificial intelligence, and identifying flaws in jet engine design.
We’re not talking about quantum computers here, though. A supercomputer is still a classical computer: it uses ordinary bits, which are either 0 or 1, and it solves problems by doing massive numbers of conventional calculations very quickly. A quantum computer works differently by using quantum bits, or qubits. Quantum computing is still largely in the experimental and early developmental stage. Right now, the real work is being done by classical supercomputers, helping scientists explore problems that would take ordinary computers far too long to solve. Some of today’s fastest machines can perform more than a quintillion — a billion billion — calculations per second.
Even so, supercomputers are not all-powerful. Their biggest limitations usually come down to four things: workload scaling, data transfer issues, power consumption, and reliability. Engineers are making progress on all four, but none of these problems has disappeared.
Supercomputers work best when they can break tasks into chunks
One of the biggest limitations is that supercomputers are only useful for certain kinds of tasks. They are best at problems that can be broken into many smaller pieces and worked on concurrently. This is known as parallel processing; for example, a climate model can split the atmosphere and oceans into many sections and calculate each one in parallel. But some problems do not work that way. Some tasks have steps that must happen sequentially. When that happens, a supercomputer cannot speed things up very much. If part of a job has to wait for another task to be finished, the whole system slows down. The answer here often isn’t to add more hardware. Instead, it’s to redesign the software so more of the work can happen simultaneously.
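This ceiling on parallel speedup is often estimated with Amdahl's law. The short Python sketch below uses illustrative numbers (a job that is 95% parallelizable) to show how even a small sequential portion limits what extra processors can buy:

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Amdahl's law: the overall speedup when only part of a job
    can be split across processors; the rest must run sequentially."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)

# Illustrative numbers: 95% of the work can run in parallel.
for n in (10, 100, 10_000):
    print(f"{n:>6} processors -> {amdahl_speedup(0.95, n):.1f}x speedup")
```

With these numbers, even 10,000 processors yield only about a 20x speedup, because the 5% of the job that must run in order dominates the total time.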
Another major limitation involves the process of moving data around. A supercomputer may be able to calculate incredibly quickly, but it still needs to fetch information from memory. In many cases, the machine is not limited by calculation speed, but by the time it takes to move data from one place to another. To mitigate this challenge, designers place memory physically closer to the processors so data travels a shorter distance. Researchers are also redesigning programs to reuse data more effectively instead of constantly fetching it.
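One common way programs are restructured to reuse data is "blocking" or "tiling": working on small chunks that fit in fast memory so each chunk is used many times once fetched. The toy Python sketch below shows the loop structure; real HPC codes do this in C or Fortran with tile sizes tuned to the hardware.

```python
def matmul_tiled(a, b, n, tile=2):
    """Toy tiled matrix multiply: operate on small blocks so each
    fetched block is reused many times before moving to the next."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                # All the work below touches only one small tile
                # of each matrix, which can stay in fast memory.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        s = c[i][j]
                        for k in range(kk, min(kk + tile, n)):
                            s += a[i][k] * b[k][j]
                        c[i][j] = s
    return c
```

The arithmetic is identical to a plain triple loop; only the order of the work changes, which is exactly the kind of software redesign the paragraph above describes.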
Supercomputers use a lot of power and have a lot of parts that can go wrong
Power use is also a huge limitation. The fastest supercomputers use enormous amounts of electricity. They also need advanced cooling systems to prevent overheating. This creates two problems. First, it makes supercomputers very expensive to run. Second, it raises environmental concerns, especially as people push back on the large data centers needed to house them. Building better supercomputers will depend not only on making them more powerful, but also on making them more energy-efficient.
Another problem is reliability. A supercomputer contains an enormous number of parts: processors, memory units, cables, storage systems, cooling equipment, and more. The more parts a machine has, the more chances there are for something to go wrong. A loose cable, faulty memory chip, or cooling issue can interrupt a major calculation. This matters because some scientific jobs run for hours or days. If something fails midway through, that work may need to be restarted or recovered from a saved checkpoint. Engineers employ tools like the Lawrence Livermore National Laboratory’s Scalable Checkpoint/Restart (SCR) to minimize the amount of work lost when an issue occurs, but there’s no way to fully prevent hardware issues from occurring. After all, building a massive machine also means there are a massive number of things that can break.
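The basic checkpoint/restart pattern is simple, even though production tools like SCR add much more (SCR itself is a C library, and the file name and structure below are made up for illustration). A minimal Python sketch of the idea:

```python
import json
import os

CHECKPOINT = "checkpoint.json"  # hypothetical file name for this sketch

def run_simulation(total_steps):
    # Resume from the last saved state if a checkpoint exists.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "value": 0.0}

    while state["step"] < total_steps:
        state["value"] += 1.0   # stand-in for one step of real computation
        state["step"] += 1

        # Periodically save progress so a failure loses at most
        # the steps since the last checkpoint.
        if state["step"] % 100 == 0:
            tmp = CHECKPOINT + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, CHECKPOINT)  # atomic rename: no half-written file

    return state["value"]
```

If the machine fails mid-run, rerunning the job picks up from the last checkpoint instead of step zero, which is the behavior tools like SCR provide at supercomputer scale.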