MiniMax M3 outperforms rivals at fraction of cost

Chinese AI startup MiniMax released its highly anticipated M3 large language model on Sunday evening Eastern time, pairing frontier-tier coding and agentic performance with a 1-million-token context window and native multimodality at a fraction of the cost of leading proprietary models. Pricing starts at $20 per month under the company’s new subscription token plans.

For the next week, the model is available via the MiniMax API at a discounted rate of $0.30 per million input tokens and $1.20 per million output tokens on fresh cache. At full price, $0.60 per million input tokens and $2.40 per million output tokens, MiniMax-M3 still costs just 8% to 20% of what leading U.S. models from Google, OpenAI, and Anthropic charge.
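To make the per-token rates concrete, here is a minimal sketch that prices a workload at both tiers. The rates are the ones quoted above; the function name api_cost and the 50-million-input, 10-million-output workload are illustrative assumptions, and cache-related pricing is not modeled.

```python
# Prices in USD per million tokens, as quoted in the article.
PRICES = {
    "discount": {"input": 0.30, "output": 1.20},
    "full": {"input": 0.60, "output": 2.40},
}


def api_cost(input_tokens: int, output_tokens: int, tier: str = "full") -> float:
    """Return the USD cost of one workload at the given price tier.

    Hypothetical helper: it ignores caching and any other billing rules.
    """
    price = PRICES[tier]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 50M input tokens and 10M output tokens.
full = api_cost(50_000_000, 10_000_000, "full")
discount = api_cost(50_000_000, 10_000_000, "discount")
print(f"full price: ${full:.2f}, discounted: ${discount:.2f}")
```

For this assumed workload the bill comes to about $54 at full price and about $27 during the discount week.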

Large language model development has long forced a choice between top-tier closed-source intelligence behind restrictive APIs and cheaper open models that struggle with multi-step reasoning and dense coding tasks. MiniMax-M3 aims to collapse that trade-off.

Sparse attention drives efficiency gains

At the core of the model’s efficiency is an architectural departure from the standard Transformer. Standard attention scales quadratically with sequence length, so compute costs climb steeply as inputs grow. To avoid that, the engineering team implemented a technique called MiniMax Sparse Attention, or MSA.

Think of traditional full attention as an editor reading an entire library from scratch to verify one sentence. MSA acts like an intelligent indexing clerk, using a pre-filtering phase to partition Key-Value matrices into precise blocks. The system reads each block exactly once, keeping memory access contiguous, which boosts hardware utilization.
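The filter-then-attend idea can be sketched in a few lines. This is a generic block-sparse attention illustration in plain Python, not MiniMax’s MSA implementation: the block scoring rule (the query dotted with each block’s mean key), the function name block_sparse_attention, and the block_size and top_k parameters are all assumptions made for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(query, keys, values, block_size=2, top_k=1):
    """Attend over only the top_k key/value blocks for a single query vector.

    Returns the attention output and the indices of the selected blocks.
    """
    # Partition key/value rows into contiguous blocks.
    blocks = [
        (keys[i:i + block_size], values[i:i + block_size])
        for i in range(0, len(keys), block_size)
    ]

    # Pre-filter (assumed rule): score a block by query . mean(block keys).
    def block_score(index):
        block_keys = blocks[index][0]
        mean_key = [sum(col) / len(block_keys) for col in zip(*block_keys)]
        return dot(query, mean_key)

    ranked = sorted(range(len(blocks)), key=block_score, reverse=True)
    selected = sorted(ranked[:top_k])  # original order, so each block is read once

    # Ordinary softmax attention, restricted to the selected blocks.
    scale = 1.0 / math.sqrt(len(query))
    sel_keys = [k for i in selected for k in blocks[i][0]]
    sel_values = [v for i in selected for v in blocks[i][1]]
    logits = [dot(query, k) * scale for k in sel_keys]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    output = [
        sum(w * v[d] for w, v in zip(weights, sel_values)) / total
        for d in range(len(sel_values[0]))
    ]
    return output, selected


keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
output, selected = block_sparse_attention([0.0, 5.0], keys, values)
print(selected, output)
```

With top_k set to the total number of blocks, the function reduces to full attention; the savings come from leaving low-scoring blocks unread.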

In internal trials, MSA runs more than four times faster than alternative open-source solutions like Flash-Sparse-Attention. With a full 1-million-token context, per-token compute drops to 1/20th of the previous generation’s, which translates to a 9x speedup in the prefilling stage and a 15x speedup during decoding.

Benchmark scores against top models

MiniMax built M3 as a natively multimodal system from the start, blending text, images, and visual components in a single pretraining corpus exceeding 100 trillion tokens. That alignment allows the model to translate complex visual geometries — programming charts or coordinate maps — into structural code.

On standardized assessments, M3 scores 59.0% on SWE-Bench Pro, a benchmark for autonomous software-engineering agents, placing it ahead of closed models like GPT-5.5 and Gemini 3.1 Pro. It achieves 66.0% on Terminal-Bench 2.1, 74.2% on MCP Atlas, and 83.5% on BrowseComp, beating Claude Opus 4.7’s 79.3% in autonomous browsing.

However, when compared with Anthropic’s newer Claude Opus 4.8, released last week, M3’s efficient sparse-attention footprint shows its limits. On SWE-Bench Pro, M3’s 59.0% falls behind Opus 4.8’s 69.2%. On Terminal-Bench 2.1, M3’s 66.0% roughly matches Opus 4.7’s 66.1% but trails Opus 4.8’s 74.6%. On the OSWorld-Verified sandbox for GUI interaction, M3 scores 70.0% against Opus 4.8’s 83.4%.

These results point to a trade-off: closed-source systems keep a clear lead on the most complex reasoning tasks, but M3 delivers a highly capable baseline without the compounding cost of proprietary API fees.

Against fellow open-weights model DeepSeek-V4 Pro Max, M3 holds its ground. On SWE-Bench Pro, M3’s 59.0% edges past DeepSeek’s 55.4%. On Terminal-Bench 2.1, DeepSeek pulls slightly ahead with 67.9% versus M3’s 66.0%. On BrowseComp the two reach virtual parity, at 83.5% for M3 versus 83.4% for DeepSeek. On MCP Atlas, M3 leads 74.2% to 73.6%.

These near-identical scores suggest that block-filtered sparse attention can stay competitive without the sheer parameter scale of DeepSeek’s 1.6-trillion-parameter model.

Product suite and subscription pricing

MiniMax translates these architectural gains into an updated product suite. The flagship is MiniMax Code, an AI agent that uses M3’s multi-step capabilities. It runs a “Producer + Verifier” adversarial loop — one agent generates code while a second tests and reflects on outputs, allowing the system to self-correct and operate autonomously for days.
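The generate-then-check pattern described here can be sketched as a simple loop. This is a minimal illustration of a producer and verifier exchanging feedback, not MiniMax Code’s actual design: producer_verifier_loop, toy_producer, and toy_verifier are hypothetical names, and the toy producer stands in for a model call.

```python
from typing import Callable, Optional, Tuple


def producer_verifier_loop(
    produce: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Tuple[bool, str]],
    task: str,
    max_rounds: int = 5,
) -> Tuple[Optional[str], int]:
    """Alternate between producing a candidate and verifying it.

    Returns (accepted candidate or None, number of rounds used).
    """
    feedback = None
    for round_number in range(1, max_rounds + 1):
        candidate = produce(task, feedback)
        passed, feedback = verify(candidate)
        if passed:
            return candidate, round_number
    return None, max_rounds


def toy_producer(task: str, feedback: Optional[str]) -> str:
    # Stand-in for a model: the first draft has a bug, the retry fixes it.
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def toy_verifier(candidate: str) -> Tuple[bool, str]:
    # Stand-in for a test-running agent: execute the code and check one case.
    namespace = {}
    exec(candidate, namespace)
    result = namespace["add"](2, 3)
    if result == 5:
        return True, "ok"
    return False, f"add(2, 3) returned {result}, expected 5"


code, rounds = producer_verifier_loop(toy_producer, toy_verifier, "write add(a, b)")
print(f"accepted after {rounds} round(s)")
```

The max_rounds cap is a design choice worth keeping in any real loop of this kind, since a verifier that never passes would otherwise run indefinitely.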

Because M3 has native visual grounding, MiniMax Code supports direct computer use. A developer could issue a cross-application voice prompt from a phone to open an ERP client and batch-populate data tables from an Excel spreadsheet.

For developers, the API introduces a toggleable thinking mode. When enabled, M3 reasons at length before answering; when disabled, it runs at minimal latency for quick completions. The companion Token Plan offers three tiers under annual billing:

Plus ($20/month): ~1.7 billion tokens per month, handles 3–4 concurrent agents.
Max ($50/month): ~5.1 billion tokens, 4–5 concurrent agents, plus 3 automated video clips per day via Hailuo 2.3.
Ultra ($120/month): ~9.8 billion tokens, 6–7 concurrent agents, 5 daily video clips.
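As a rough comparison, the tiers can be reduced to an effective price per million tokens. The sketch below uses the approximate allowances listed above and assumes that the Max and Ultra figures are also monthly; the PLANS table layout and the usd_per_million_tokens helper are illustrative.

```python
# Monthly price and approximate monthly token allowance per tier.
# Assumption: the Max and Ultra allowances are per month, like Plus.
PLANS = {
    "Plus": {"usd_per_month": 20, "tokens_per_month": 1.7e9},
    "Max": {"usd_per_month": 50, "tokens_per_month": 5.1e9},
    "Ultra": {"usd_per_month": 120, "tokens_per_month": 9.8e9},
}


def usd_per_million_tokens(plan: str) -> float:
    """Effective USD price per million tokens if the allowance is fully used."""
    details = PLANS[plan]
    return details["usd_per_month"] / (details["tokens_per_month"] / 1_000_000)


for name in PLANS:
    print(f"{name}: ${usd_per_million_tokens(name):.4f} per million tokens")
```

On these approximate figures every tier works out to roughly one cent per million tokens, with Max the lowest. The comparison ignores concurrency limits and the bundled video clips.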

Open-weights release and strategic implications

MiniMax pledged to release M3 under an open-weights license within 10 days, with weights and documentation on Hugging Face and GitHub. The company’s leadership said the release will include “open weights,” allowing enterprises to download and customize the model free of charge.

But it remains unclear exactly which license will apply: a permissive option like MIT or Apache 2.0, or the new OpenMDW license. If the license permits commercial use, the calculus for enterprise infrastructure managers shifts significantly: they can run a model that competes with GPT-5.5 and Gemini 3.1 Pro on key benchmarks at a fraction of the cost, without API lock-in.

The decision to open the weights carries strategic weight, but the competitive ceiling against models like Claude Opus 4.8 shows that closed-source systems still hold advantages on the hardest reasoning tasks. For now, MiniMax-M3 offers a middle ground: frontier capability at a price that undercuts the market leaders by as much as an order of magnitude.
