China has released a cheap, open-source rival to OpenAI’s ChatGPT, and it has some scientists excited and Silicon Valley worried.
DeepSeek, the Chinese artificial intelligence (AI) lab behind the innovation, unveiled its free large language model (LLM) DeepSeek-V3 in late December 2024 and claims it was built in two months for just $5.58 million — a fraction of the time and cost required by its Silicon Valley competitors.
Following hot on its heels is an even newer model called DeepSeek-R1, released Monday (Jan. 20). In third-party benchmark tests, DeepSeek-V3 matched the capabilities of OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet while outperforming others, such as Meta’s Llama 3.1 and Alibaba’s Qwen2.5, in tasks that included problem-solving, coding and math.
Now, R1 has also surpassed OpenAI’s latest o1 model in many of the same tests. This impressive performance at a fraction of the cost of other models, its semi-open-source nature, and its training on significantly fewer graphics processing units (GPUs) have wowed AI experts and raised the specter of China’s AI models surpassing their U.S. counterparts.
“We should take the developments out of China very, very seriously,” Satya Nadella, the CEO of Microsoft, a strategic partner of OpenAI, said at the World Economic Forum in Davos, Switzerland, on Jan. 22.
AI systems learn using training data taken from human input, which enables them to generate output based on the probabilities of different patterns cropping up in that training dataset.
For large language models, these data are text. For instance, OpenAI’s GPT-3.5, which was released in 2022, was trained on roughly 570 GB of text data from the repository Common Crawl (amounting to roughly 300 billion words) taken from books, online articles, Wikipedia and other webpages.
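In miniature, that principle looks something like the toy sketch below, which simply counts which words follow which in a scrap of text and turns those counts into next-word probabilities. It is purely illustrative; real LLMs learn far richer patterns over hundreds of billions of words, but the underlying idea of predicting what comes next from observed frequencies is the same.

```python
# Toy illustration (not any real LLM): a bigram "language model" that predicts
# the next word purely from how often word pairs appear in a tiny training text.
from collections import Counter, defaultdict

training_text = (
    "the cat sat on the mat the dog sat on the rug the cat chased the dog"
).split()

# Count how often each word follows each other word.
follow_counts = defaultdict(Counter)
for current_word, next_word in zip(training_text, training_text[1:]):
    follow_counts[current_word][next_word] += 1

def next_word_probabilities(word):
    """Return the probability of each word that ever followed `word`."""
    counts = follow_counts[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probabilities("the"))
# e.g. {'cat': 0.33, 'mat': 0.17, 'dog': 0.33, 'rug': 0.17} for this toy corpus
```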
Reasoning models, such as R1 and o1, are an upgraded version of standard LLMs that use a method called “chain of thought” to backtrack and reevaluate their logic, which enables them to tackle more complex tasks with greater accuracy.
This has made reasoning models popular among scientists and engineers who are looking to integrate AI into their work.
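In practice, querying a reasoning model looks much like querying any other chatbot, except that the step-by-step reasoning comes back alongside the final answer. The sketch below assumes DeepSeek’s OpenAI-compatible API; the endpoint URL, model name and the `reasoning_content` field reflect the company’s public documentation at the time of writing and should be treated as details that may change.

```python
# Sketch: calling the DeepSeek-R1 reasoning model through an OpenAI-compatible API.
# Assumption: base_url, model name and the `reasoning_content` field follow
# DeepSeek's published API docs and may change over time.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder credential
    base_url="https://api.deepseek.com",  # assumed DeepSeek endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed identifier for the R1 reasoning model
    messages=[{
        "role": "user",
        "content": "A train leaves at 3:40 pm and the trip takes 2 hours "
                   "35 minutes. When does it arrive?",
    }],
)

message = response.choices[0].message
print(message.reasoning_content)  # the model's chain-of-thought trace (R1-specific field)
print(message.content)            # the final answer
```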
But unlike OpenAI’s o1, DeepSeek’s R1 is an “open-weight” model that (although its training data remains proprietary) enables scientists to peer inside and modify its algorithm. Just as important is its reduced price for users: roughly 27 times cheaper to use than o1.
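Because the weights are public, researchers can also download the model and run it on their own hardware rather than going through an API. A minimal sketch, assuming one of the smaller distilled R1 checkpoints that DeepSeek published on Hugging Face and the widely used `transformers` library:

```python
# Sketch: running an open-weight DeepSeek model locally with Hugging Face transformers.
# Assumption: the model ID below refers to one of the smaller distilled R1 checkpoints;
# the full R1 model is far larger and needs substantial GPU memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed published checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain why the sky is blue in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```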
Besides its performance, the hype around DeepSeek comes from its cost efficiency; the model’s shoestring budget is minuscule compared with the tens of millions to hundreds of millions of dollars that rival companies have spent training comparable models.
In addition, U.S. export controls, which limit Chinese companies’ access to the best AI computing chips, forced R1’s developers to build smarter, more energy-efficient algorithms to compensate for their lack of computing power. ChatGPT reportedly needed 10,000 Nvidia GPUs to process its training data; DeepSeek engineers say they achieved similar results with just 2,000.
How much this will translate into useful scientific and technical applications, or whether DeepSeek has simply trained its model to ace benchmark tests, remains to be seen — but scientists and AI investors are watching closely.