As social media platforms and stock markets buzz with the popularity of the new AI company DeepSeek, a report by Bernstein stated that DeepSeek's models look fantastic but are not a miracle, and were not built for USD 5 million.
The report addressed the buzz around DeepSeek’s models, particularly the idea that the company built something comparable to OpenAI for just USD 5 million. According to the report, this claim is misleading and doesn’t reflect the full picture.
It stated, “We believe that DeepSeek DID NOT ‘build OpenAI for USD 5M’; the models look fantastic but we don’t think they are miracles; and the resulting Twitter-verse panic over the weekend seems overblown.”
The Bernstein report stated that DeepSeek has developed two main families of AI models: ‘DeepSeek-V3’ and ‘DeepSeek-R1’. The V3 model is a large language model that uses a Mixture-of-Experts (MoE) architecture.
This approach splits the model into many smaller ‘expert’ networks and activates only a few of them for each input, delivering high performance while using significantly fewer computing resources than comparable dense models. The V3 model has 671 billion parameters in total, with 37 billion active for any given token.
It also incorporates techniques such as Multi-Head Latent Attention (MLA), which reduces memory usage, and mixed-precision training using FP8 computation, which improves efficiency.
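For readers unfamiliar with the idea, the sketch below is a toy illustration of MoE routing, not DeepSeek's actual code; the expert count, dimensions and top-k value are arbitrary assumptions chosen only to show how a token activates just a small slice of the model's total parameters.

```python
# Toy illustration only: a simplified Mixture-of-Experts (MoE) routing step.
# This is NOT DeepSeek's implementation; expert count, sizes and top-k are
# arbitrary assumptions used to show why only a fraction of parameters is
# active for any given token.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8    # toy value; production MoE models use far more experts
TOP_K = 2          # each token is routed to only TOP_K experts
HIDDEN_DIM = 16    # toy hidden size

# Each "expert" is a small feed-forward weight matrix (stand-in for a real FFN).
experts = [rng.standard_normal((HIDDEN_DIM, HIDDEN_DIM)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((HIDDEN_DIM, NUM_EXPERTS))  # router weights (random here)

def moe_layer(token):
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = token @ router                  # score every expert for this token
    top = np.argsort(logits)[-TOP_K:]        # keep only the TOP_K best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts only
    # Only TOP_K of NUM_EXPERTS experts actually run for this token.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top))

print(moe_layer(rng.standard_normal(HIDDEN_DIM)).shape)  # (16,)
```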
To train the V3 model, DeepSeek used a cluster of 2,048 NVIDIA H800 GPUs for about two months, totalling approximately 2.7 million GPU hours for pre-training and 2.8 million GPU hours including post-training.
While some have estimated the cost of this training at around USD 5 million based on a USD 2 per GPU hour rental rate, the report points out that this figure doesn’t account for the extensive research, experimentation, and other costs involved in developing the model.
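As a rough sanity check, the arithmetic behind that estimate can be reproduced in a few lines; the GPU-hour counts and the USD 2 per hour rate come from the report, and everything else below is purely illustrative.

```python
# Back-of-envelope reproduction of the widely cited compute-cost estimate,
# using only the figures cited in the report (GPU hours and a USD 2/hour rental rate).
gpu_hours_pretrain = 2.7e6        # pre-training GPU hours, per the report
gpu_hours_total = 2.8e6           # including post-training, per the report
rate_usd_per_gpu_hour = 2.0       # assumed H800 rental rate cited in the report

print(f"Pre-training only:   ~USD {gpu_hours_pretrain * rate_usd_per_gpu_hour / 1e6:.1f} million")
print(f"Incl. post-training: ~USD {gpu_hours_total * rate_usd_per_gpu_hour / 1e6:.1f} million")
# Roughly USD 5.4-5.6 million: this covers compute rental alone and, as the report
# notes, excludes research, experimentation, data, staff and infrastructure costs.
```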
The second model, ‘DeepSeek-R1’, builds on the V3 foundation but uses Reinforcement Learning (RL) and other techniques to significantly improve reasoning capabilities. The R1 model has been particularly impressive, performing competitively against OpenAI’s models in reasoning tasks.
However, the report noted that the additional resources required to develop R1 were likely substantial, though not quantified in the company’s research paper.
Even as it pushed back on the hype, the report emphasized that DeepSeek’s models are genuinely impressive. The V3 model, for instance, performs as well as or better than other large models on language, coding, and math benchmarks while using only a fraction of the computing resources.
For example, pre-training V3 required about 2.7 million GPU hours, which is just 9 per cent of the compute resources needed to train some other leading models.
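Taken at face value, that 9 per cent figure implies the comparison model needed on the order of 30 million GPU hours; a quick check under that assumption is shown below.

```python
# Quick check of the "9 per cent" comparison, using only the article's figures.
deepseek_pretrain_hours = 2.7e6   # DeepSeek-V3 pre-training GPU hours, per the article
stated_fraction = 0.09            # "just 9 per cent", per the article

implied_baseline_hours = deepseek_pretrain_hours / stated_fraction
print(f"Implied compute for the comparison model: ~{implied_baseline_hours / 1e6:.0f} million GPU hours")
# About 30 million GPU hours, i.e. roughly 11x DeepSeek-V3's reported pre-training compute.
```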
The report concluded that while DeepSeek’s achievements are remarkable, the panic and the exaggerated claims about building an OpenAI competitor for USD 5 million are overblown.
(Except for the headline, this story has not been edited by The Hindkeshari staff and is published from a syndicated feed.)