DeepSeek: What lies under the bonnet of the new AI chatbot?

A woman looks at a phone with the DeepSeek logo in the background (Credit: Getty Images)

Tumbling stock market values and bold claims have followed the release of a new AI chatbot by a small Chinese company. What makes it so different?

The launch of China's new DeepSeek AI-powered chatbot app has shaken the tech industry. It quickly surpassed OpenAI's ChatGPT to become the most-downloaded free iOS app in the US and led to chip-making company Nvidia losing nearly $600 billion (£483 billion) in market value in a single day—a new record for the US stock market.

What's causing all this excitement? The "large language model" (LLM) powering the app has reasoning abilities similar to US models like OpenAI's, but reportedly costs much less to train and operate. DeepSeek claims it achieved this by using several technical strategies that reduced both the computation time needed to train its model (called R1) and the memory required to store it. This reduction in overheads led to significant cost savings, according to DeepSeek. R1's base model, V3, reportedly took 2.788 million GPU-hours to train – that is, the total running time summed across the many graphics processing units (GPUs) working in parallel – at an estimated cost of under $6 million (£4.8 million), compared with the more than $100 million (£80 million) that OpenAI's CEO Sam Altman says was needed to train GPT-4.
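As a rough sanity check, those two figures are consistent with each other. Here is a minimal back-of-the-envelope sketch, assuming the roughly $2 per GPU-hour rental rate that DeepSeek's own technical report uses; the numbers are illustrative, not an audited accounting of the training run:

```python
# Back-of-the-envelope check of DeepSeek's reported training cost.
# Assumption: ~$2 per H800 GPU-hour, the rental rate cited in
# DeepSeek's V3 technical report. Figures are illustrative only.

GPU_HOURS = 2_788_000        # reported total GPU-hours for V3
RATE_USD_PER_GPU_HOUR = 2.0  # assumed rental price per GPU-hour

cost = GPU_HOURS * RATE_USD_PER_GPU_HOUR
print(f"Estimated training cost: ${cost / 1e6:.2f} million")
# -> Estimated training cost: $5.58 million, under the $6m claim
```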

Despite the hit taken to Nvidia's market value, the DeepSeek models were trained on about 2,000 Nvidia H800 GPUs, according to a research paper released by the company. These chips are a modified version of the widely used H100 chip, built to comply with export rules for the Chinese market. They were likely stockpiled before the Biden administration tightened the restrictions further in October 2023, effectively banning Nvidia from exporting the H800s to China. Working within these constraints, DeepSeek likely had to find innovative ways to make the most effective use of the resources at its disposal.
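Those two reported figures, the GPU-hour total and the cluster size, also give a feel for how long the training run would have taken in wall-clock terms. A minimal sketch, assuming the full cluster ran in parallel with no idle time (2,048 is the count given in DeepSeek's paper; treat the result as an order-of-magnitude estimate):

```python
# Rough wall-clock estimate: total GPU-hours divided by GPU count.
# Assumption: all ~2,000 GPUs (2,048 per DeepSeek's paper) run in
# parallel for the whole job, with no idle time. Illustrative only.

GPU_HOURS = 2_788_000   # reported total GPU-hours for V3
NUM_GPUS = 2_048        # assumed cluster size

hours = GPU_HOURS / NUM_GPUS
print(f"~{hours:,.0f} hours, i.e. about {hours / 24:.0f} days")
# -> ~1,361 hours, i.e. about 57 days of continuous training
```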

Reducing the computational cost of training and running models can also help address concerns about AI's environmental impact. The data centres that run these models consume a lot of electricity and water, largely to keep the servers from overheating. Most tech companies do not disclose the carbon footprint of their models, but a recent estimate puts ChatGPT's emissions at over 260 tonnes of carbon dioxide per month, the equivalent of 260 flights from London to New York. Making AI models more efficient would therefore be a welcome step for the industry from an environmental point of view.
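The flight comparison is simple division, and it is worth seeing what per-flight figure it implies. A minimal sketch, assuming the estimate and the equivalence are taken at face value:

```python
# What per-flight figure does the "260 flights" comparison imply?
# Assumption: both cited numbers are taken at face value.

MONTHLY_EMISSIONS_TONNES = 260  # cited estimate for ChatGPT
FLIGHTS = 260                   # flights quoted as equivalent

per_flight = MONTHLY_EMISSIONS_TONNES / FLIGHTS
print(f"Implied emissions per flight: {per_flight:.1f} tonnes CO2")
# -> 1.0 tonne per flight, broadly in line with common
#    per-passenger estimates for a one-way London-New York trip.
```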