Alibaba Group’s newly released large language model Qwen3 has demonstrated stronger mathematical reasoning and coding abilities than its predecessors and some American peers, putting it at the top of several benchmark charts.
Qwen3 comprises two mixture-of-experts (MoE) models (Qwen3-235B-A22B and Qwen3-30B-A3B) and six dense models.
An MoE model, an architecture reportedly also used in OpenAI’s and Anthropic’s flagship models, routes each query to a small set of specialized “expert” sub-networks suited to the topic, so only a fraction of its parameters is activated at a time. A dense model, by contrast, activates all of its parameters for every input, learning complex patterns in data across a wide range of tasks.
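For a concrete picture of how that routing works, below is a minimal, hypothetical sketch in Python. The expert count, dimensions, and top-2 routing are illustrative assumptions, not Qwen3’s actual configuration.

```python
import numpy as np

# Minimal sketch of top-k expert routing, the core idea behind an MoE layer.
# All sizes here are made up for illustration; real models learn these weights.
rng = np.random.default_rng(0)
num_experts, top_k, d_model = 8, 2, 16
router = rng.normal(size=(d_model, num_experts))                 # routing matrix
experts = [rng.normal(size=(d_model, d_model)) for _ in range(num_experts)]

def moe_layer(token: np.ndarray) -> np.ndarray:
    scores = token @ router                 # score every expert for this token
    top = np.argsort(scores)[-top_k:]       # keep only the top-k experts
    gates = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax weights
    # Only the chosen experts run; the other experts' parameters stay idle,
    # which is why an MoE activates a fraction of its total parameter count.
    return sum(g * (token @ experts[i]) for g, i in zip(gates, top))

print(moe_layer(rng.normal(size=d_model)).shape)  # (16,)
```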
Alibaba, a Hangzhou-based company, trained Qwen3 on 36 trillion tokens, double the number used for the Qwen2.5 model. DeepSeek, another Hangzhou-based firm, used 14.8 trillion tokens to train its R1 model. In general, the more tokens a model is trained on, the broader its knowledge.
At the same time, Qwen3 has a lower deployment threshold than DeepSeek V3, meaning users can run it at lower operating cost and with less energy consumption.
Qwen3-235B-A22B has 235 billion parameters but activates only 22 billion per token. DeepSeek R1 has 671 billion parameters and activates 37 billion. Fewer activated parameters mean lower inference costs.
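As a rough illustration of the arithmetic, the snippet below (a back-of-envelope sketch, not a benchmark) compares the share of parameters each model activates per token, using the figures cited above.

```python
# Back-of-envelope view of the figures above: per-token compute in an MoE
# model scales with activated parameters, not the headline total.
models = {"Qwen3-235B-A22B": (235e9, 22e9), "DeepSeek R1": (671e9, 37e9)}
for name, (total, active) in models.items():
    print(f"{name}: {active / 1e9:.0f}B of {total / 1e9:.0f}B parameters "
          f"active per token ({active / total:.1%})")
# Qwen3-235B-A22B: 22B of 235B parameters active per token (9.4%)
# DeepSeek R1: 37B of 671B parameters active per token (5.5%)
```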
The US stock market slumped after DeepSeek launched its R1 model on January 20. AI stock investors were shocked by DeepSeek R1’s high performance and low training costs.
Media reports said DeepSeek will unveil its R2 model in May. Some AI fans expected DeepSeek R2 to have greater reasoning ability than R1 and the ability to catch up with OpenAI o4-mini.
‘Nonsensical benchmark hacking’
Since Alibaba released Qwen3 early on the morning of April 29, AI fans have performed various tests to check its performance.
The Yangtze Evening News reported that Qwen3 scored 70.7 on LiveCodeBench v5, which tests AI models’ code-writing ability. This beat DeepSeek R1 (64.3), OpenAI o3-mini (66.3), Gemini 2.5 Pro (70.4), and Grok 3 Beta (70.6).
On AIME’24, which tests AI models’ competition-level mathematical reasoning, Qwen3 scored 85.7, better than DeepSeek R1 (79.8), OpenAI o3-mini (79.6), and Grok 3 Beta (83.9). However, it lagged behind Gemini 2.5 Pro, which scored 92.
The newspaper’s reporter found that Qwen3 sometimes fails at complex reasoning tasks and lacks knowledge in some areas, producing “hallucinations,” a common failure mode in which an AI model presents false information as fact.
“We asked Qwen3 to write some stories in Chinese. The stories feel more polished and fluent than those written by earlier AI models, but their plots and scene transitions are illogical,” the reporter said. “The model seems to be stitching everything together without thinking.”
In terms of scientific reasoning, Qwen3 scored 70%, lagging behind Gemini 2.5 Pro (84%), OpenAI o3-mini (83%), Grok 3 mini (79%), and DeepSeek R1 (71%), according to Artificial Analysis, an independent AI benchmarking and analysis company.
On Humanity’s Last Exam, which tests expert-level reasoning and knowledge, Qwen3 scored 11.7%, beating Grok 3 mini (11.1%), Claude 3.7 (10.3%), and DeepSeek R1 (9.3%). However, it still lagged behind OpenAI o3-mini (20%) and Gemini 2.5 Pro (17.1%).
In February of this year, Microsoft Chief Executive Satya Nadella said that focusing on self-proclaimed milestones, such as achieving artificial general intelligence (AGI), is only a form of “nonsensical benchmark hacking.”
He said the industry could declare victory only when AI helps drive 10% annual growth in gross domestic product.
Chip shortage
While Chinese AI firms need more time to catch up with American players, they face a new challenge – a shortage of AI chips.
In early April, Chinese media reported that ByteDance, Alibaba, and Tencent had ordered more than 100,000 H20 chips from Nvidia for 16 billion yuan (US$2.2 billion).
On April 15, Nvidia said the US government had informed it that the company would need a license to ship its H20 AI chips to China. The government cited the risk that Chinese firms would use the H20 chips in supercomputers.
The Information reported on May 2 that Nvidia had told some of its biggest Chinese customers that it is tweaking the design of its AI chips so it can keep shipping them to China. A sample of the new chip could be available as early as June.
Nvidia has already tailored AI chips for the Chinese market several times. After Washington restricted the export of A100 and H100 chips to China in October 2022, Nvidia designed the A800 and H800 chips. However, the US government extended its export controls to cover them in October 2023. Then, Nvidia unveiled the H20.
Although the H20 delivers only about 15% of the H100’s performance, Chinese firms are still rushing to buy it instead of Huawei’s Ascend 910B chip, whose supply is limited by low production yields.
A Chinese IT columnist said that while the Ascend 910B is faster than the H20 in raw compute, the H20’s memory bandwidth is ten times the 910B’s. Higher bandwidth in an AI chip, he said, is like a better gearbox in a sports car: it delivers more consistent performance.
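The columnist’s gearbox analogy reflects a standard back-of-envelope model of LLM serving: to generate each token, the chip must stream all active weights from memory, so decoding speed is roughly bandwidth divided by bytes read per token. The sketch below uses invented bandwidth figures purely for illustration; they are not the H20’s or the 910B’s actual specifications.

```python
# Rough memory-bound decoding estimate: generating one token requires
# streaming every active parameter from memory at least once.
def decode_tokens_per_sec(bandwidth_tb_s: float, active_params_b: float,
                          bytes_per_param: int = 2) -> float:
    bytes_per_token = active_params_b * 1e9 * bytes_per_param  # FP16 weights
    return bandwidth_tb_s * 1e12 / bytes_per_token

# Hypothetical chips: same 22B-active model, 10x difference in bandwidth.
print(f"{decode_tokens_per_sec(4.0, 22):.0f} tokens/s")   # ~91 tokens/s
print(f"{decode_tokens_per_sec(0.4, 22):.0f} tokens/s")   # ~9 tokens/s
```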
The Application of Electronic Technique, a Chinese scientific journal, said China’s AI firms could try to use homegrown chips, such as Cambricon Technologies’ Siyuan 590, Hygon Information Technology’s DCU series, Moore Threads’ MTT S80, Biren Technology’s BR104, or Huawei’s upcoming Ascend 910C.
Read: After DeepSeek: China’s Manus – the hot new AI under the spotlight