
Alibaba this week released Qwen3.7-Plus, its latest large language model that can process text, video, and imagery at a cost of $0.40 per million input tokens and $1.60 per million output tokens. The model marks a sharp break from the company’s previous open-source strategy — it is only available under a closed commercial license through proprietary APIs and Alibaba’s Qwen Chat platform.
That shift disappointed many enterprises, including U.S. companies like Airbnb, that had relied on earlier open-weight Qwen models. Previous releases under the Qwen family were distributed under permissive licenses such as Apache 2.0, allowing organizations to download and host the weights locally. Qwen3.7-Plus removes that option entirely.
Still, the model’s low price and multimodal capabilities make it worth a close look. It is among the cheapest powerful AI models available, priced just above Chinese rival MiniMax-M3‘s limited-time discount. Unlike the text-only Qwen3.7-Max released weeks earlier, this version can analyze video, interpret screenshots, and generate enterprise-grade visuals.
For developers building autonomous agents, the main bottleneck has rarely been raw intelligence. Instead, it is state decay — the tendency of an agent framework to lose its analytical trajectory over long, multi-step tasks. It addresses that vulnerability through a combined approach to context management and reasoning state preservation.
It comes with a 1-million token context window and allocates up to 256,000 tokens specifically for internal chain-of-thought processing. To put that in perspective, an automated cloud migration agent could ingest an entire codebase, map dependencies, and spend thousands of tokens evaluating edge cases before executing a single command.
Alibaba exposes a parameter called preserve_thinking that keeps internal thinking blocks intact across conversational turns. The feature was introduced with the prior Qwen3.6 generation and is now available across both open-weight and proprietary models. It prevents the model from dropping context or recomputing cached history midway through an operation.
This idea isn’t unique to Alibaba.
Anthropic calls it “Extended Thinking” for models like Claude Opus 4.8. OpenAI uses an encrypted reasoning pass-back mechanism for GPT-5.5. The underlying concept has become table stakes for modern multi-turn reasoning.
Related: Microsoft Unveils AI Dev Box for Locals
On Terminal Bench 2.0-Terminus, which measures safe iterative code execution, this system scored 70.3, beating DeepSeek-V4-Pro Max (67.9) and Gemini-3.1 Pro (63.5).
On the ScreenSpot Pro computer vision benchmark, this system hit 79.0.
It outpaced GPT-5.4 (67.4) and Claude-Opus-4.6 (49.5).
Despite these gains, this system still falls below leading U.S. proprietary models such as Anthropic’s Claude Opus 4.6 and OpenAI’s GPT-5.4 on several raw capability metrics.
It is designed as a cost-efficient alternative.
It is not a universal replacement for frontier models.
For enterprise architects, the key question is what this model replaces in an existing tech stack. It is meant for high-frequency developer tasks, robotic process automation, and data engineering pipelines. Rather than deploying an expensive flagship model for repetitive operations, teams can route those tasks here.
The company structured its API to be fully OpenAI-compatible.
So swapping dependencies requires minimal infrastructure changes. Engineers can run the model directly through local terminal setups by altering base environment targets.
Related: Query logs help AI agents fix SQL errors
The cost advantage becomes clearer with caching. Standard input processing is $0.40 per million tokens, but if an agent reads from an explicitly created cache — such as a static code repository or enterprise UI kit — the price drops to $0.04 per million tokens for subsequent reads. That makes high-frequency, multi-turn agent iterations economically practical at scale.
But the licensing terms create compliance concerns.
Organizations cannot download or sandbox the model weights. All data must pass through Alibaba Cloud’s international endpoints, such as the Singapore instance highlighted in developer documentation. Companies under strict data-sovereignty rules — healthcare providers subject to HIPAA, defense contractors — must evaluate whether external API routing meets their obligations.
On the other hand, a managed API removes the internal burden of provisioning and maintaining multi-GPU clusters. For many teams, that trade-off is acceptable.
Prominent venture capitalist @Boxmining noted the strategic cost advantage.
“Qwen 3.7 Plus being 40% cheaper than Max changes the conversation. The output is close enough for most coding and much stronger for visual workflows, so you only need Max for heavy terminal-only jobs.”
Dunjie Lu, a research intern at Alibaba Qwen, said it shows “clear gains over Qwen3.6-Plus in computer-use capabilities, with stronger generalization beyond general desktop tasks into professional workflows such as data engineering and scientific research.”
For enterprise buyers planning their next infrastructure roadmap, this system presents a practical alternative. If the goal is building resilient, visual-capable autonomous loops that interact directly with developer environments and cloud consoles — without blowing out the inference budget — it offers a compelling reason to shift execution away from more expensive frontier options.
Leave a Reply