Behind Li Fei-Fei's team's attempt to reduce model costs, breakthroughs in open-source, data, and technology are key elements.
Recently, it was reported that researchers from the team of Chinese-American scientist and "AI Godmother" Li Fei-Fei trained a reasoning model, s1, comparable to DeepSeek-R1 for less than $50 in compute costs. According to sources, the s1 model was not trained from scratch but was produced by supervised fine-tuning of Alibaba Cloud's Qwen model.
Li Fei-Fei's team's publicly available paper shows that after supervised fine-tuning of the Qwen2.5-32B-Instruct language model, the resulting s1-32B model outperformed o1-preview by up to 27% on competition math problems (MATH and AIME24), achieving performance comparable to top-tier reasoning models such as OpenAI's o1 and DeepSeek's R1 in mathematics and coding. In this process, the team relied primarily on s1K, a small dataset of 1,000 questions paired with their reasoning trajectories, and developed a "budget forcing" technique to extend the model's thinking, building a high-quality model at ultra-low cost.
How can the simplest method scale test-time compute (i.e., let an AI model think longer before answering) and still achieve strong reasoning performance?
How Was It Achieved?
From a technical perspective, Li Fei-Fei's team demonstrated that high-quality data samples and simple test-time scaling can significantly enhance model training efficiency.
According to the public paper, the research team first constructed s1K, a dataset of 1,000 carefully selected questions, each paired with a reasoning trace and answer distilled from Gemini Thinking Experimental. They then performed supervised fine-tuning of the Qwen2.5-32B-Instruct language model on this dataset, completing training in just 26 minutes on 16 H100 GPUs.
In fact, this 1,000-question dataset is far smaller than the training datasets typically used for large models in the industry. Li Fei-Fei's team demonstrated that data that is high-quality, challenging, and diverse can deliver outsized gains. Researchers first collected 59,029 questions from 16 different sources, including existing math problem datasets along with self-created probability questions and brain teasers, then filtered out poorly formatted samples and kept questions with longer reasoning chains, arriving at a final dataset spanning 50 different domains.
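The curation idea described above, filtering by format quality, using reasoning-chain length as a difficulty proxy, and spreading selections across domains, can be sketched roughly as follows. The field names, thresholds, and selection rule here are illustrative assumptions, not the team's actual pipeline.

```python
# Hypothetical sketch of three-stage data curation: quality, difficulty,
# diversity. Field names ("question", "reasoning", "domain") and the
# per-domain cap are assumptions for illustration.
from collections import defaultdict

def curate(pool, per_domain, min_chain_len=1):
    # Stage 1 (quality): drop poorly formatted items, here proxied by
    # missing or empty fields.
    ok = [q for q in pool if q.get("question") and q.get("reasoning")]
    # Stage 2 (difficulty): keep items with longer reasoning chains,
    # hardest first.
    ok = [q for q in ok if len(q["reasoning"]) >= min_chain_len]
    ok.sort(key=lambda q: len(q["reasoning"]), reverse=True)
    # Stage 3 (diversity): cap how many items any one domain contributes.
    picked, per = [], defaultdict(int)
    for q in ok:
        if per[q["domain"]] < per_domain:
            picked.append(q)
            per[q["domain"]] += 1
    return picked
```

A real pipeline would add model-based difficulty checks and deduplication, but the shape, a large pool narrowed by successive filters, is the same.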
In 2024, Li Fei-Fei refuted the notion that "AI models are exhausting the data available for training" during a media interview. She argued that there is no shortage of AI training data; instead, vast amounts of differentiated data remain untapped. She emphasized that high-quality data has become unprecedentedly important, and creating high-quality datasets is core to AI research.
On the other hand, Li Fei-Fei's team also developed a "budget forcing" technique during the training of the s1 model to control the compute spent at test time, thereby influencing the model's reasoning depth and final answers.
Simply put, budget forcing covers two cases. If the model generates more reasoning tokens than a set limit, the reasoning process is cut off and an end-of-thinking token is appended, prompting the model to move to the answer-generation phase. If more test-time compute is desired, generation of the end-of-thinking token is suppressed and "Wait" is appended to the reasoning trace, encouraging deeper exploration. The research team found that this method also leads the model to re-examine its answers, often correcting erroneous reasoning steps and improving reasoning performance.
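The two cases above can be sketched as a decode loop. This is an illustrative simulation, not the team's actual implementation: the `</think>` delimiter, the `generate_step` callable, and the stub model below are all assumptions standing in for a real language model and tokenizer.

```python
# Hypothetical sketch of budget forcing at decode time.
END_OF_THINKING = "</think>"  # assumed delimiter between reasoning and answer

def budget_forced_decode(generate_step, prompt, max_think_tokens, extensions=0):
    """Decode with a cap (and optional extensions) on reasoning length.

    generate_step(tokens) -> next token; a stub stands in for a real LM.
    """
    tokens = list(prompt)
    think_tokens = 0
    used_extensions = 0
    while True:
        nxt = generate_step(tokens)
        if nxt == END_OF_THINKING:
            if used_extensions < extensions:
                # Case 2: suppress end-of-thinking and append "Wait"
                # to nudge the model into deeper reasoning.
                tokens.append("Wait")
                used_extensions += 1
                think_tokens += 1
                continue
            tokens.append(END_OF_THINKING)
            break
        if think_tokens >= max_think_tokens:
            # Case 1: budget exhausted, force the answer phase.
            tokens.append(END_OF_THINKING)
            break
        tokens.append(nxt)
        think_tokens += 1
    return tokens

def stub_model(tokens):
    """Toy stand-in: 'thinks' for five tokens, then tries to stop."""
    generated = [t for t in tokens if t != "Q"]
    return "step" if len(generated) < 5 else END_OF_THINKING
```

With `extensions=0` and a small budget, the loop truncates reasoning and forces the end token; with `extensions=1`, the first attempted stop is replaced by "Wait" and reasoning continues for one more round.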
Currently, the s1 model, along with its training data and code, has been open-sourced on GitHub. The research team hopes to inspire future research on simple reasoning.
The Rise of Open-Source Large Models
As the era of "burning money" for large models cools down, how to train high-performance models at lower costs is becoming one of the industry's focal points.
Unlike Li Fei-Fei's team's supervised fine-tuning approach, when DeepSeek released DeepSeek-R1 it also distilled six smaller models from R1's outputs and open-sourced them to the community. DeepSeek stated that the models distilled onto Qwen-32B and Llama-70B bases achieved performance comparable to OpenAI's o1-mini across multiple capabilities.
An industry insider told reporters that both approaches yield high-performance new models at lower cost: Li Fei-Fei's team distilled essential data for supervised fine-tuning on Qwen, while DeepSeek used distillation, with DeepSeek-R1 as the teacher model and Qwen as the student, transferring the teacher's capabilities to the student. They are two different technical routes, but both reduce the cost of training high-performance models.
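In its simplest output-level form, the distillation route described above amounts to collecting teacher generations as ordinary supervised fine-tuning targets for the student. Here is a minimal sketch under that assumption: the `teacher` callable stands in for sampling from a model like DeepSeek-R1, and `keep` is a hypothetical quality filter (e.g., an answer-correctness check), neither of which reflects DeepSeek's actual pipeline.

```python
# Illustrative sketch: teacher reasoning traces become (prompt, target)
# pairs that the student is then fine-tuned on with a standard SFT recipe.
def build_distillation_set(prompts, teacher, keep=lambda trace: True):
    """Collect teacher traces as supervised targets for the student.

    teacher(prompt) -> generated trace; keep(trace) -> bool quality filter.
    """
    pairs = []
    for p in prompts:
        trace = teacher(p)
        if keep(trace):
            pairs.append({"prompt": p, "target": trace})
    return pairs
```

The design point is that the student never sees the teacher's weights or logits, only its text outputs, which is why a strong open teacher can cheaply lift a smaller open base model.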
With the rise of DeepSeek and the low-cost training of the s1 model based on Qwen, the impact of open-source large models on the industry landscape is deepening. According to statistics from the open-source community HuggingFace, the number of derivative models based on Qwen in open-source communities worldwide has exceeded 90,000. In 2024 alone, the global downloads of the visual understanding models Qwen-VL and Qwen2-VL surpassed 32 million. The open-source ecosystem for large models is rapidly developing.
Amid the current focus on the "cost-effectiveness" of model training, open-source large models are continuously challenging closed-source ones. GF Securities Research pointed out that DeepSeek has topped global download charts, fully open-sourced R1, and priced its API service far below OpenAI's. Overseas markets generally believe that falling training and inference costs may bring faster innovation, wider model adoption, and greater inference demand. At the same time, the narrative around computing power will be affected, and the narrowing performance gap between open-source and closed-source models may challenge closed-source foundation-model companies, because cheaper open-source options could erode market demand.
As more open-source large models develop and improvements in training techniques and data quality continue, more players in the industry will be affected. GF Securities also noted that improvements in the cost and efficiency of large models may benefit AI application companies: these companies are seeking opportunities to build products on LLMs (large language models) and new models, so cost and efficiency gains could boost their returns on capital. Additionally, competition among cloud providers is accelerating, as they build ecosystem services around open-source large models like DeepSeek and compete for the computing demand those models generate.
In this multi-path race of large-model "democratization" and technological upgrading, the industry expects more stories like those of DeepSeek and s1, bringing faster iteration and greater competitive pressure to practitioners.