Chinese AI startup DeepSeek has released what appears to be one of the most powerful open-source language models to date, trained at a reported cost of just $5.5 million using Nvidia H800 GPUs, the reduced-capability chips subject to U.S. export restrictions on sales to China.
The 671-billion-parameter DeepSeek V3, released this week under a permissive commercial license, outperformed both open- and closed-source AI models on DeepSeek's internal benchmarks, including Meta's Llama 3.1 and OpenAI's GPT-4 on coding tasks.
The model was trained on 14.8 trillion tokens of data over roughly two months. At about 1.6 times the size of Meta's Llama 3.1 405B, DeepSeek V3 requires substantial computing power to run at reasonable speeds.
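The size comparison follows directly from the parameter counts; a minimal sanity-check sketch in Python, using only the figures reported above, looks like this:

```python
# Back-of-the-envelope check of the size comparison reported above.
deepseek_v3_params = 671e9   # DeepSeek V3: 671 billion parameters
llama_3_1_params = 405e9     # Llama 3.1 405B: 405 billion parameters

ratio = deepseek_v3_params / llama_3_1_params
print(f"DeepSeek V3 is ~{ratio:.2f}x the size of Llama 3.1 405B")  # ~1.66x, the "1.6 times" in the text
```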
Andrej Karpathy, a founding member of OpenAI and former director of AI at Tesla, commented: "For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being brought up today are more around 100K GPUs. E.g. Llama 3 405B used 30.8M GPU-hours, while DeepSeek-V3 looks to be a stronger model at only 2.8M GPU-hours (~11X less compute). If the model also passes vibe checks (e.g. LLM arena rankings are ongoing, my few quick tests went well so far) it will be a highly impressive display of research and engineering under resource constraints."
"Does this mean you don't need large GPU clusters for frontier LLMs?" he added. "No, but you have to ensure that you're not wasteful with what you have, and this looks like a nice demonstration that there's still a lot to get through with both data and algorithms."
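Karpathy's compute comparison, and the $5.5 million figure from the opening paragraph, can be checked with the same back-of-the-envelope arithmetic. The sketch below is illustrative only: the GPU-hour figures come from the quote, while the $2-per-GPU-hour rental rate is an assumption drawn from DeepSeek's own technical report, not from the quote.

```python
# Checking the compute comparison in Karpathy's quote against the reported figures.
llama_3_405b_gpu_hours = 30.8e6   # GPU-hours reported for Llama 3 405B
deepseek_v3_gpu_hours = 2.8e6     # GPU-hours reported for DeepSeek-V3

print(f"Compute ratio: ~{llama_3_405b_gpu_hours / deepseek_v3_gpu_hours:.0f}x")  # ~11x less compute

# Assumed rental rate of $2 per H800 GPU-hour (the rate cited in DeepSeek's technical report).
est_cost = deepseek_v3_gpu_hours * 2
print(f"Estimated training cost: ${est_cost / 1e6:.1f}M")  # ~$5.6M, in line with the ~$5.5M figure above
```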