No Claude Fable 5? No problem: Sakana achieves frontier performance with new Fugu multi-model, auto synthesis system

Last night, the increasingly enterprise-focused AI startup Sakana launched Fugu, a multi-agent orchestration system that delivers frontier-level AI performance through a single, OpenAI-compatible API.
Designed for developers, enterprises, and nations seeking resilience against vendor lock-in and geopolitical export controls, Fugu (Japanese for "pufferfish"), bypasses the traditional monolithic model structure by dynamically routing queries to a swappable pool of specialized AI agents.
Sakana CEO and co-founder David Ha, formerly of Google Brain, positioned Fugu as a more reliable option for enterprise workflows than any single AI model provider in the wake of Anthropic's move on June 12 to revoke public access to its most powerful models, Claude Mythos 5 and Claude Fable 5, in the wake of a U.S. government export control order. As Ha wrote in a post today on X:
"Fugu dynamically orchestrates the world’s best models to tackle complex tasks. We are proving that a well-orchestrated pool of swappable agents can match restricted frontier models like Fable and Mythos. But Fugu is about more than just performance. I believe that Orchestration Models are the next frontier, beyond bigger models. Relying on a single company’s model for national infrastructure is a massive risk. As recent export controls have shown, access to top models can disappear overnight. Collective intelligence is the practical hedge against this concentration of power. Fugu simply routes around vendor restrictions by relying on an entirely swappable agent pool."
Sakana AI explicitly states that the specific models Fugu selects and how it coordinates them are proprietary, meaning this routing information is hidden from the user by design. The documentation only refers generally to a "diverse pool of powerful models," "multiple LLMs," or "specialized models" without providing a specific count.
By acting as a sophisticated coordinator rather than a standalone foundation model, Fugu matches the output quality of top-tier models like Fable and Mythos on third-party benchmarks of agentic tasks, while fundamentally altering how developers deploy critical AI infrastructure.
How Sakana Fugu works and where it beats Anthropic's Claude Fable 5
At its core, Sakana Fugu operates like a master general contractor. When presented with a complex request, Fugu does not attempt to execute every step itself.
Instead, it breaks the problem down, delegates sub-tasks to a pool of expert foundation models, verifies their work, and synthesizes the final output.
"Fugu is itself an LLM, trained to call various LLMs in an agent pool, including instances of itself recursively," the Sakana AI team noted in their technical release.
Grounded in two of Sakana's 2026 research papers, TRINITY and the Conductor, the system autonomously manages the entire lifecycle of model selection and verification using learned coordination strategies rather than hand-designed workflows. To the end user, this multi-agent swarm is entirely abstracted behind a standard API endpoint.
Sakana AI is offering two variants of the system to cater to different operational workloads:
Fugu: A high-speed, low-latency model optimized for everyday tasks. It is designed to act as the default engine for interactive chatbots and integrates directly into coding environments like Codex.
Fugu Ultra: The flagship tier engineered for complex, high-stakes tasks such as AI research, cybersecurity analysis, and multi-step patent investigations. According to Sakana, Fugu Ultra coordinates a deeper pool of experts and matches industry-leading monolithic models across rigorous scientific and reasoning benchmarks.
Additionally, on the pay-as-you-go plan, standard Fugu charges a dynamic rate based on the specific underlying models activated, whereas Fugu Ultra utilizes a fixed pricing structure starting at $5 per million input tokens and $30 per million output tokens.
As indicated by benchmark charts shared by Sakana, Fugu actually exceeds the performance of Anthropic's Claude Fable 5 on LiveCodeBench, an open source benchmark testing coding performance on regularly refreshed, software problem-solving tasks (Fugu Ultra: 93.2, Fugu: 92.9, Fable: 89.8), and beats the prior Claude Mythos Preview model on GPQA-D (Diamond) , a test of 198 graduate-level multiple-choice questions in biology, physics, and chemistry (Fugu Ultra: 95.5, Fugu: 95.5, Mythos Preview: 94.6).
By orchestrating multiple models from different providers, Fugu essentially builds native redundancy into the AI stack. If one provider suffers an outage or faces sudden regulatory restrictions, Fugu routes around the disruption to maintain uptime.
Licensing and availability
Fugu is offered as a commercial, proprietary API service, not an open-source framework.
Because Sakana’s core intellectual property lies in its non-obvious collaboration patterns, the specific routing information—meaning exactly which underlying models Fugu selects for a given query—remains proprietary and is intentionally hidden from the user.
However, Sakana offers critical controls for enterprise data compliance. Developers can explicitly opt specific models or providers out of their Fugu routing pool to maintain strict corporate privacy standards.
Additionally, users can opt out of having their prompts used for future training data. Geographically, Fugu is restricted from operating within the European Union (EU) and European Economic Area (EEA) while Sakana works to align its black-box data routing architecture with GDPR regulations.
Pricing is fairly steep
Fugu is available immediately in most regions—with the temporary exception of the EU and EEA—at subscription tiers and pay-as-you-go pricing.
Teams can opt for monthly subscription allowances designed for individual or hands-on use: a Standard tier at $20/month for lightweight workflows, a Pro tier at $100/month providing 10x standard usage, and a Max tier at $200/month offering 20x usage for continuous, long-running tasks. I wasn't able to find the actual amount of tokens covered under these plans, but I've reached out to Ha on X for more information.
As part of the initial rollout, Sakana is offering a free second month for users who subscribe to any tier by July 31, 2026.
For enterprise scaling and production deployments, Sakana offers an elastic pay-as-you-go plan. Crucially for high-stakes environments, requests made under this consumption-based model are served at a higher priority than those from monthly subscription plans.
Under this framework, the standard Fugu engine charges the single rate of the highest-tier underlying model involved in a query, without ever stacking multi-agent fees. The flagship Fugu Ultra tier (fugu-ultra-20260615) utilizes a fixed pricing structure per one million tokens: $5 for input, $30 for output, and $0.50 for cached input. These rates increase to $10, $45, and $1.00 respectively for extreme workloads utilizing context windows above 272K tokens. That puts it among the more expensive options compared to single AI models via provider APIs:
VentureBeat Frontier AI Model API Pricing Snapshot
Model | Input | Output | Total Cost | Source |
MiMo-V2.5 Flash | $0.10 | $0.30 | $0.40 | Xiaomi MiMo |
deepseek-v4-flash | $0.14 | $0.28 | $0.42 | DeepSeek |
deepseek-v4-pro | $0.435 | $0.87 | $1.305 | DeepSeek |
MiniMax-M3 | $0.30 | $1.20 | $1.50 | MiniMax |
Gemini 3.1 Flash-Lite | $0.25 | $1.50 | $1.75 | |
Qwen3.7-Plus | $0.40 | $1.60 | $2.00 | Alibaba Cloud |
MiMo-V2.5 | $0.40 | $2.00 | $2.40 | Xiaomi MiMo |
Grok 4.3 (low context) | $1.25 | $2.50 | $3.75 | xAI |
MiMo-V2.5 Pro (≤256K) | $1.00 | $3.00 | $4.00 | Xiaomi MiMo |
Kimi-K2.6 | $0.95 | $4.00 | $4.95 | Moonshot |
GLM-5.2 | $1.40 | $4.40 | $5.80 | Z.ai |
Grok 4.3 (high context) | $2.50 | $5.00 | $7.50 | xAI |
MiMo-V2.5 Pro (>256K) | $2.00 | $6.00 | $8.00 | Xiaomi MiMo |
Qwen3.7-Max | $2.50 | $7.50 | $10.00 | Alibaba Cloud |
Gemini 3.5 Flash | $1.50 | $9.00 | $10.50 | |
Gemini 3.1 Pro Preview (≤200K) | $2.00 | $12.00 | $14.00 | |
GPT-5.4 | $2.50 | $15.00 | $17.50 | OpenAI |
Gemini 3.1 Pro Preview (>200K) | $4.00 | $18.00 | $22.00 | |
Claude Opus 4.8 | $5.00 | $25.00 | $30.00 | Anthropic |
GPT-5.5 | $5.00 | $30.00 | $35.00 | OpenAI |
Sakana Fugu Ultra | $5.00 | $30.00 | $35.00 | Sakana AI |
Claude Fable 5 / Claude Mythos 5 | $10.00 | $50.00 | $60.00 | Anthropic |
Developers modeling operational costs should also note a significant architectural caveat in how Fugu bills for its multi-agent capabilities. According to the developer documentation, Fugu Ultra’s API responses include detailed usage fields that separate user-visible token generation from internal orchestration work. The background tokens consumed and generated when Fugu delegates sub-tasks, verifies code, or routes between underlying agents are not absorbed by the provider; they represent real token usage and are counted toward the final price of the request at standard rates.
The Orchestration landscape: Fugu vs. The Field and notable benchmark performance
To understand Fugu’s position in the mid-2026 AI ecosystem, it is critical to distinguish between model routing and multi-agent orchestration.
Over the past year, enterprise adoption of standard routing platforms—such as Not Diamond, Martian, and the open-source RouteLLM framework—has skyrocketed. These systems act as intelligent air traffic controllers; using semantic classifiers or meta-models, they analyze an incoming prompt and predict which single foundation model will yield the highest quality or most cost-effective response, dispatching the query accordingly.
Fugu operates on a fundamentally different paradigm. Rather than making a one-shot routing decision, Fugu aligns more closely with complex multi-round systems like Router-R1 (a framework introduced at NeurIPS 2025). It breaks a query down, interleaves reasoning with delegation, and dynamically assigns sub-tasks to multiple models in parallel or sequence before synthesizing a final output.
While frameworks like LangGraph, CrewAI, and Microsoft AutoGen offer developers the tools to build similar multi-agent systems, they require immense manual configuration—defining roles, setting up conditional edges, and managing state across long-running loops.
Fugu abstracts this operational overhead entirely. It is essentially a LangGraph-style workflow packaged as a single, black-box API endpoint.
An orchestration system is ultimately bounded by the raw capabilities of the underlying models in its pool, a reality reflected in Sakana’s own benchmark testing against standalone frontier models.
On rigorous coding and agentic tasks, collective intelligence shows a distinct advantage over standard models. Fugu Ultra posted a 73.7 on SWE-Bench Pro, significantly outperforming Anthropic's Claude Opus 4.8 (69.2) and OpenAI's GPT-5.5 (58.6).
However, Fugu is not a silver bullet, and its performance is not a clean sweep across the board. When compared to highly specialized or restricted-access monolithic models, Fugu occasionally trails:
SWE-Bench Pro: While Fugu Ultra (73.7) beat most accessible models, it was comfortably eclipsed by Anthropic’s limited-access Fable 5 (80.0), which is currently absent from Fugu's swappable pool due to the U.S. government's export control order and Anthropic's subsequent response to remove the model entirely from global usage.
Humanity's Last Exam: Fugu Ultra (50.0) narrowly edged out Opus 4.8 (49.8), but again fell short of Fable 5 (53.3).
Long-Context and Security: On the MRCRv2 long-context-recall test, OpenAI's GPT-5.5 maintained the lead (94.8 vs Fugu Ultra's 93.6), and Opus 4.8 remained the top performer on the CTI-REALM cybersecurity benchmark (69.6 vs Fugu Ultra's 69.4).
The quantitative data points to a clear conclusion: Fugu is highly effective at boosting performance on messy, multi-step tasks (like writing a complex HTML5 game from scratch) by leaning on the combined strengths of multiple mid-tier and high-tier models.
However, for sheer brute-force reasoning within a single, highly constrained domain, the industry's largest standalone models still hold the edge—provided an enterprise can maintain uninterrupted access to them.
Background on Sakana's formation and noteworthy achievements to date
Sakana AI was formed in Tokyo in 2023 by Llion Jones, a co-author of Google’s foundational 2017 "Attention Is All You Need" paper, and David Ha, the former head of research at Stability AI.
Disillusioned by large tech company bureaucracy and the industry's hyper-fixation on scaling single, massive foundational models, the founders built Sakana around principles of biomimicry and evolutionary computing.
The company's name, derived from the Japanese word for fish, reflects its core technical thesis: utilizing collective "swarm" intelligence rather than brute-force compute. Following a $2.6 billion Series B valuation in late 2025 and the recent June 2026 launch of Marlin—an autonomous, eight-hour research agent for the B2B sector—Fugu represents the commercialization of Sakana's multi-agent routing technology for everyday developers.
A mixed reception among the broader AI community online
The developer community has responded to Fugu by rigorously testing its practical tradeoffs, weighing its routing efficiencies against the sheer power of monolithic foundation models.
AI observer, developer and influencer Chris (@ChrissGPT on X) highlighted the specific utility of Fugu over raw foundational AI.
"For a single clean prompt, you probably would [use Fable 5, Mythos, or GPT-5.5 directly]," he noted, but argued that Fugu's true value emerges in messy, multi-step environments. "...whether it involves delegation, verification, synthesis, code review, research loops, security analysis... the more it would make sense to use this," he wrote.
Chris also pointed out the strategic geopolitical advantage of Fugu's architecture, noting that if frontier AI access is abruptly revoked due to regulation or export controls, an orchestrator can dynamically swap models to prevent a total system failure.
Creative agency owner Mark Santos (@markksantos) of Mark Studios provided a direct, real-world comparison by tasking both Fugu Ultra and Claude Opus 4.8 with building a "Crossy Road" game clone using Three.js. The results underscored the operational differences between an orchestrator and a monolithic giant:
Sakana Fugu Ultra: Completed the task in 22 minutes using ~89,000 tokens for roughly $7.32. However, the final game suffered from minor logic errors, such as inverted directional turns and wonky camera angles.
Claude Opus 4.8: Took 79 minutes, burned ~940,000 tokens for nearly $37.85, and got stuck in a retry loop requiring human intervention. Despite the inefficiency, it ultimately produced superior application design and functionality.
Santos concluded the experiment by stating, "In terms of application functionality, quality, and design, Opus won. In terms of model speed and performance, Fugu... won".
Elie Bakouch, a research engineer at cloud-based, open AI infrastructure and systems provider Prime Intellect, pointed out on X that "to be clear, this is a closed source orchestrator on top of closed source models. if before you didn't control the models, now you don't even control which ones are used or how much. this is not 'AI sovereignty'..."
These early tests and reactions mirror the sentiment summarized by Reddit user GreedyWorking1499 in initial platform discussions: "Until proven otherwise, this is just a highly advanced router/wrapper, not a fundamental not a fundamental leap in intelligence like Mythos/Fable was."
Yet, as enterprises increasingly demand fail-safes against single-vendor reliance, Sakana is proving that packaging collective intelligence into a single API endpoint is a highly viable commercial path.
Related Markets
All MarketsMarket data may be delayed. Not financial advice.
💡 AI analysis provides alternative perspectives on current events