Silicon Valley sits on a geological fault line, so earthquakes are not unusual. But a recent shockwave originated all the way from China when DeepSeek announced R1, their new Large Language Model (LLM).
Although LLM announcements are now common, this one was different and triggered a major sell-off in technology stocks. To understand why, it helps to know a little about LLMs: how they are created and what DeepSeek did differently.
Language models are a core part of everyday technologies such as predictive text and smart speakers. There is a good chance that you’ve already used a language model today.
Although most people have only heard about them recently, language models are not new: their roots go all the way back to the 1950s. They are built by analysing example documents. In the early days, progress was limited by the small number of documents available in electronic format and by computers far less powerful than today’s.
But this has all changed now that huge amounts of text are available online and computers have become far more powerful. Another key step was the development of a more efficient type of model architecture, known as the “transformer”, which allows information from across large amounts of text to be combined efficiently. All of this has led to rapid advances in the techniques used to create language models and to the rise of the so-called “Large Language Models”.
But creating modern LLMs is an expensive business, with the cost of training some models running to hundreds of millions of US dollars. Why are they so expensive to create? One reason is simply that LLMs examine huge amounts of documents, and that takes a lot of computing time. Another significant expense arises from the way the models are taught to respond to questions: people are asked to look at the model’s output and provide feedback about what they like and what they don’t. But paying people to review this output is expensive, and waiting for their answers takes time. All of this meant that LLMs could only be developed by large organisations with deep pockets.
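The human-feedback step described above can be pictured with a toy sketch in Python. The candidate answers and the annotator’s choice below are invented purely for illustration; real systems collect millions of such comparisons and use them to train the model, which is where the expense comes in.

```python
# Toy human-feedback step: an annotator compares two candidate answers
# and the preferred one is recorded as a training signal.
# (Illustrative only -- real systems aggregate huge numbers of these
# comparisons, each of which costs paid human time.)

candidate_a = "The sky is blue because sunlight scatters off air molecules."
candidate_b = "The sky is blue because it reflects the ocean."

def collect_preference(a, b, human_choice):
    # human_choice is "a" or "b", supplied by a paid annotator
    if human_choice == "a":
        return {"chosen": a, "rejected": b}
    return {"chosen": b, "rejected": a}

feedback = collect_preference(candidate_a, candidate_b, "a")
print(feedback["chosen"])
```

Every record like `feedback` requires a person’s time, which is why replacing this step (as discussed below) saves so much money.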
Until, that is, DeepSeek announced their R1 model, which they claimed could be trained for a fraction of the cost of competitors’ models without compromising performance. And that is what sent shockwaves through the existing LLM producers: if LLMs could be created more cheaply than before, established producers would face more competition and, presumably, lower profits. Investors were also surprised that such a good model could be produced within China, where the high-performance computer chips that existing LLM producers rely on are unavailable due to trade restrictions.
So what did DeepSeek do differently? Their approach was to apply multiple optimisation techniques to the training process, which combine to produce substantial gains. One is to reduce the number of digits used to represent values when full precision is not needed. Another is to base R1 on a set of subsystems, each designed to carry out a specific task. The advantage of this approach, known as “mixture of experts”, is that it is often quicker to train and apply than a single large system. Also, R1 does not rely on human judgements but instead replaces them with other language models, avoiding the time and expense of employing people.
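Two of these ideas can be sketched in a few lines of Python. The first shows how storing a number with fewer digits loses a little accuracy but saves memory and computation; the second shows a “mixture of experts” where a gate routes each input to one small specialist rather than running one giant system. The experts and the keyword gate are toy stand-ins, not DeepSeek’s actual components (real systems use learned neural gates and networks).

```python
import struct

# 1) Reduced precision: round-trip a value through 16-bit floating point
#    (struct's "e" format). The copy is less accurate but takes a quarter
#    of the space of a standard 64-bit number.
full = 3.14159265358979
half = struct.unpack("e", struct.pack("e", full))[0]
print(full, "->", half)  # the 16-bit copy keeps only ~3 decimal digits

# 2) Toy "mixture of experts": a gate routes each input to one small
#    expert, so only a fraction of the whole system does work per input.
def math_expert(text):
    return "math answer for: " + text

def code_expert(text):
    return "code answer for: " + text

EXPERTS = {"math": math_expert, "code": code_expert}

def gate(text):
    # A trivial keyword gate; real systems learn this routing.
    return "code" if "python" in text.lower() else "math"

def mixture_of_experts(text):
    chosen = gate(text)           # pick one expert...
    return EXPERTS[chosen](text)  # ...and run only that expert

print(mixture_of_experts("integrate x squared"))   # handled by the maths expert
print(mixture_of_experts("write a Python loop"))   # handled by the code expert
```

The saving in both cases is the same in spirit: do less arithmetic per input without noticeably hurting the final answer.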
However, there have been reports that R1 might not be as cheap to train as claimed. OpenAI, the creator of the popular GPT family of LLMs, has also claimed that DeepSeek extracted knowledge from its models by repeatedly questioning them. This approach, known as “knowledge distillation”, is quite common in the development of LLMs, but it raises questions about how novel DeepSeek’s models really are compared with existing ones.
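Knowledge distillation can be illustrated with a toy sketch: a “student” repeatedly queries a “teacher” and learns from the answers it gets back. The teacher here is just a lookup table and the student simply memorises, both invented for illustration; real distillation fits a new neural network to the outputs of an existing one.

```python
# Toy "knowledge distillation": a student is built by repeatedly querying
# a teacher model and training on the (question, answer) pairs.
# (Illustrative only -- the "training" here is bare memorisation.)

def teacher(question):
    answers = {"capital of france?": "Paris", "2 + 2?": "4"}
    return answers.get(question.lower(), "I don't know")

def distil(questions):
    # Query the teacher and collect its answers as training data.
    return {q: teacher(q) for q in questions}

student = distil(["Capital of France?", "2 + 2?"])

def student_answer(question):
    return student.get(question, "I don't know")

print(student_answer("Capital of France?"))  # the student repeats what it learnt
```

The point of contention is visible even in this sketch: everything the student knows came from the teacher, which is why repeated questioning of another company’s model raises questions of novelty.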
It will probably take time for definitive answers to these questions to emerge but, whatever the answers, DeepSeek has shaken the LLM world and advanced the goal of reducing the cost of training LLMs.
Dr Mark Stevenson is a senior lecturer in Computer Science at The University of Sheffield, UK.
The opinions expressed in this article are those of the author and do not purport to reflect the opinions or views of THE WEEK.