How computing bottlenecks could throttle the UK’s generative AI sector

It’s hard to escape ChatGPT’s prevalence in the AI computing world. While much has been written about its capabilities, there has been much less insight into its technical makeup and whether the UK has the technological means to exploit the new technology and support the expected growth requirements.

To aid this growth, the UK government’s recent £900m investment into supercomputing is of course good news and a positive step in making sure the UK can keep up with the current ChatGPT revolution. But the current top three supercomputers are from the US, Japan and Europe. The UK needs to be front and centre when it comes to computing power. Ultimately, that is what drives significant scientific and technological breakthroughs.

However, just building a machine for ‘BritGPT’ is not enough. AI models are evolving very quickly, and the compute power they require is increasing far faster than traditional computing methods can keep up with. This has been widely recognised by the high-performance computing community. The UK government needs to be more visionary and provide more support for post-Moore solutions if it is to lead the next computing revolution.

The rise of generative AI has been so pronounced that the government has also recently invested £100m into a new expert taskforce, which has “responsibility for accelerating the UK’s capability”. It’s a positive move towards making the country a world leader in the next wave of AI, but the journey won’t be smooth.

Some of the roadblocks to the UK fulfilling this ambition are the financial power and advancing technologies present in other countries. Across the pond, PwC US has just announced plans to invest $1bn into AI, following on from significant investments from the likes of Microsoft and Alphabet. It’s a fierce global market to contend with, and in 2023, valuations for generative AI startups are already much higher than they have ever been.

The UK has by far the most generative AI startups in Europe and is primed for success. When it comes to considering whether the industry has the capacity to maintain this momentum, the major limitation is hardware bottlenecks, and the UK is no exception.

Growing demand for computing power

At its core, ChatGPT, together with its successor GPT-4, is a large language model (LLM) built on the ‘transformer’ architecture.

These models are growing far faster than digital electronics can keep up with. Over the past few years, LLM size has been increasing by a factor of 240 every two years. Training the current version of ChatGPT takes more than 150 ‘GPU years’, equivalent to running roughly 54,000 GPUs for a day. In addition, Microsoft has invested hundreds of millions to build the supercomputer for ChatGPT. What if we need 10 times more, or 100 times more, computing power?
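The ‘GPU years’ figure can be checked with simple arithmetic. The sketch below is illustrative only: the 150 GPU-year and 240x figures come from the text above, while the function names, the 365-day year and the assumption of perfect parallel scaling are ours.

```python
import math

GPU_YEARS_QUOTED = 150       # training cost quoted in the article
DAYS_PER_YEAR = 365          # assumption: non-leap year
GROWTH_PER_TWO_YEARS = 240   # LLM size growth quoted in the article


def gpus_for_deadline(gpu_years: float, days: float) -> float:
    """GPUs needed to finish in `days`, assuming perfect parallel scaling."""
    return gpu_years * DAYS_PER_YEAR / days


def annual_growth(factor_per_two_years: float) -> float:
    """Per-year growth factor equivalent to a given two-year growth factor."""
    return math.sqrt(factor_per_two_years)


if __name__ == "__main__":
    # The article's closing question: what if we need 10x or 100x more?
    for scale in (1, 10, 100):
        gpus = gpus_for_deadline(GPU_YEARS_QUOTED * scale, days=1)
        print(f"{scale:>3}x compute: about {gpus:,.0f} GPUs for one day")
    yearly = annual_growth(GROWTH_PER_TWO_YEARS)
    print(f"240x every two years is about {yearly:.1f}x per year")
```

At the quoted figure this gives 54,750 GPUs for one day, in line with the rounded 54,000 above. Real training runs do not scale perfectly across GPUs, so the true hardware requirement would be higher.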

This surging increase in LLM size and costs is bringing key hardware bottlenecks to the fore.

Electronic bottlenecks

There are three major generative AI bottlenecks facing current computing hardware: computing speed, power consumption and memory bandwidth.

When it comes to computing speed, for half a century the world has been able to produce ever more powerful computers by following Moore’s law, which predicted that the density of transistors (the semiconductor devices that switch and amplify electrical signals) on integrated circuits would double roughly every two years, a period often quoted as 18 months. However, this empirical law has started to break down as transistors approach their physical limits and problems such as current leakage and ‘quantum tunnelling’ arise.
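To see how sensitive this exponential growth is to the doubling period, here is a minimal sketch comparing the 18-month period with the two-year period that is also commonly cited. The function name and the 50-year horizon (‘half a century’) are illustrative assumptions.

```python
def density_growth(years: float, doubling_period_years: float) -> float:
    """Factor by which transistor density grows if it doubles every period."""
    return 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    horizon = 50  # "half a century", taken from the text
    for period in (1.5, 2.0):
        factor = density_growth(horizon, period)
        print(f"Doubling every {period} years for {horizon} years: {factor:.2e}x")
```

A small change in the doubling period compounds into a gap of orders of magnitude over decades, which is why even a modest slowdown in Moore’s law matters.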

Power consumption is a more severe limitation of digital electronics. As of today, more than 2% of the world’s electrical energy is used by data centres, and this number keeps growing. This is because advanced processors typically consume about 1 picojoule (pJ) per operation, and today’s computation tasks often require several hundred trillion operations or more. For example, training ChatGPT requires about 3 × 10^23 operations, consuming over 300 gigajoules (GJ). What’s worse, external DRAM memory consumes about 100 pJ per 32-bit data access, 100 times more than the operation itself. Huge effort must therefore also go into designing computing architectures that minimise DRAM access as much as possible.
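The energy figures above follow from a short calculation. The sketch below uses only the numbers quoted in the text (1 pJ per operation, 100 pJ per 32-bit DRAM access, 3 × 10^23 operations). The function name and the `dram_accesses_per_op` parameter are hypothetical, introduced to show how quickly memory traffic comes to dominate.

```python
PJ = 1e-12  # joules per picojoule

ENERGY_PER_OP_J = 1 * PJ        # per operation, as quoted in the article
ENERGY_PER_DRAM_J = 100 * PJ    # per 32-bit DRAM access, as quoted
TRAINING_OPS = 3e23             # operations to train ChatGPT, as quoted


def training_energy_joules(ops: float, dram_accesses_per_op: float = 0.0) -> float:
    """Total energy: compute plus DRAM traffic.

    `dram_accesses_per_op` is a hypothetical knob: the average number of
    32-bit DRAM accesses incurred per arithmetic operation.
    """
    compute = ops * ENERGY_PER_OP_J
    memory = ops * dram_accesses_per_op * ENERGY_PER_DRAM_J
    return compute + memory


if __name__ == "__main__":
    for ratio in (0.0, 0.01, 0.1):
        gigajoules = training_energy_joules(TRAINING_OPS, ratio) / 1e9
        print(f"{ratio:>5} DRAM accesses per op: about {gigajoules:,.0f} GJ")
```

With no DRAM traffic the compute alone comes to 300 GJ. Under this simple model, a single DRAM access for every 100 operations is enough to double the total, which is why minimising DRAM access matters so much.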

A third major limitation of electronics is memory bandwidth. Most modern processors adopt the von Neumann architecture, where data is stored in memory and computation takes place in a separate computing core. Data therefore needs to be moved back and forth between the two.

This worked well in the past. Yet as computing speed has increased substantially, data movement has become a significant bandwidth bottleneck: more time is spent waiting for data to be fetched than performing the computation. By comparison, while hardware computing speed increased by a factor of 90,000 over the past 20 years, memory bandwidth increased by a factor of only 30.
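The two growth figures can be put on a common footing by annualising them. This is a rough sketch using only the numbers quoted above; the function names are illustrative, and it assumes smooth compound growth over the 20 years.

```python
COMPUTE_GROWTH = 90_000   # compute speed-up over 20 years, as quoted
BANDWIDTH_GROWTH = 30     # memory bandwidth growth over 20 years, as quoted
YEARS = 20


def annualised(factor: float, years: float) -> float:
    """Constant yearly growth factor that compounds to `factor` over `years`."""
    return factor ** (1 / years)


def widening_gap(compute: float, bandwidth: float) -> float:
    """How much further compute has pulled ahead of bandwidth."""
    return compute / bandwidth


if __name__ == "__main__":
    print(f"Compute:   {annualised(COMPUTE_GROWTH, YEARS):.2f}x per year")
    print(f"Bandwidth: {annualised(BANDWIDTH_GROWTH, YEARS):.2f}x per year")
    print(f"Gap after {YEARS} years: {widening_gap(COMPUTE_GROWTH, BANDWIDTH_GROWTH):,.0f}x")
```

Compute has grown by roughly 1.8x a year against roughly 1.2x for bandwidth, leaving processors about 3,000 times further ahead of the memory that feeds them than they were two decades ago.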

Overall, these hardware bottlenecks are limiting the progress of AI and especially generative models, or LLMs. We are in dire need of a new computing paradigm to support the next generation of AI.

The solutions to generative AI bottlenecks

New technologies are taking on these hardware bottlenecks. Parallel storage systems, for example, address the bandwidth bottleneck by supplying the machine with a continuous stream of data, something regular storage systems struggle with. Their key asset is that they not only forward large amounts of data rapidly, but also feed it through several channels so that it can reach different parts of the system simultaneously.

Then there is automated machine learning, which uses AI to train models and fine-tune their algorithms. This makes models easier to deploy in the real world and helps with each of the three bottlenecks. For example, the algorithms can be auto-trained to be power efficient, and pruned to reduce memory requirements and data access. Over the next few years, the automation of techniques for AI optimisation is expected “to achieve 15-30x performance improvements”.

And finally, there’s optical computing. Optics have already transformed our lives with the internet, communicating vast amounts of data globally using optical fibre networks. What if we now take another step and build an optical computer?

Using light to perform the bulk of AI computation offers high computing speeds thanks to two distinct advantages: a much faster clock, and the potential for almost unlimited parallelism. As academic research groups have already demonstrated, optical computing can be performed with less than one photon per operation, making the optical energy almost negligible.

As technology continues to advance and the size of the matrix underlying the computation increases, the energy efficiency of the optical system increases too. Optical computing could therefore revolutionise the future of AI with speed-of-light computation and high energy efficiency.

As LLMs grow and are trained with bigger and better datasets, their performance improves steadily. We have every reason to expect greater intelligence to emerge from even larger models.

If the UK can use its new investment in AI to unblock these bottlenecks, it can not only keep pace with the growth of generative AI, but help lead it.

Xianxin Guo is the co-founder and head of research at Lumai.