Japanese AI lab Sakana AI has introduced a new technique that allows multiple large language models (LLMs) to cooperate on a single task, effectively forming a “dream team” of AI agents. The method, called Multi-LLM AB-MCTS, enables models to perform trial-and-error and combine their unique strengths to solve problems that are too complex for any individual model.
For enterprises, this approach offers a way to develop more robust and capable AI systems. Instead of being locked into a single provider or model, businesses can dynamically leverage the best aspects of different frontier models, assigning the right AI to the right part of a task to achieve superior results.
The power of collective intelligence
Frontier AI models are evolving rapidly. However, each model has its own distinct strengths and weaknesses, derived from its unique training data and architecture. One may excel at coding, while another excels at creative writing. The researchers at Sakana AI argue that these differences are not a bug, but a feature.
“We see these biases and varied aptitudes not as limitations, but as precious resources for creating collective intelligence,” the researchers write in their blog post. They believe that just as humanity’s greatest achievements come from diverse teams, AI systems can also achieve more by working together. “By pooling their intelligence, AI systems can solve problems that are insurmountable for any single model.”
Scaling at inference time
Sakana AI’s new algorithm is an “inference-time scaling” technique (also known as “test-time scaling”), an area of research that has become very popular in the past year. While most of the attention in AI has been on “training-time scaling” (making models bigger and training them on larger datasets), inference-time scaling improves performance by allocating more computational resources after a model has already been trained.
One common approach involves using reinforcement learning to prompt models to generate longer, more detailed chain-of-thought (CoT) sequences, as seen in popular models such as OpenAI o3 and DeepSeek-R1. Another, simpler method is repeated sampling, where the model is given the same prompt multiple times to generate a variety of potential solutions, similar to a brainstorming session. Sakana AI’s work combines and advances these ideas.
“Our framework offers a smarter, more strategic version of Best-of-N (aka repeated sampling),” Takuya Akiba, research scientist at Sakana AI, told VentureBeat. “It complements reasoning techniques like long CoT through RL. By dynamically selecting the search strategy and the appropriate LLM, this approach maximizes performance within a limited number of LLM calls, delivering better results on complex tasks.”
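To make the simpler baseline concrete, here is a minimal sketch of repeated sampling (Best-of-N): the same prompt is sent to a model several times and the highest-scoring candidate is kept. The `call_llm` and `score` functions below are placeholders for illustration, not part of Sakana AI’s code.

```python
import random

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (e.g., an OpenAI or Gemini client)."""
    return f"candidate answer {random.randint(0, 1000)}"

def score(answer: str) -> float:
    """Placeholder for a task-specific scorer, e.g., unit tests or a verifier model."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # Repeated sampling: generate n independent candidates from the same prompt
    # and keep the highest-scoring one.
    candidates = [call_llm(prompt) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("Solve the puzzle described above.", n=8))
```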
How adaptive branching search works
At the core of the new method is an algorithm called Adaptive Branching Monte Carlo Tree Search (AB-MCTS). It enables an LLM to perform trial-and-error effectively by intelligently balancing two different search strategies: “searching deeper” and “searching wider.” Searching deeper involves taking a promising answer and repeatedly refining it, while searching wider means generating completely new solutions from scratch. AB-MCTS combines these approaches, allowing the system to improve a good idea but also to pivot and try something new if it hits a dead end or discovers another promising direction.
To accomplish this, the system uses Monte Carlo Tree Search (MCTS), a decision-making algorithm famously used by DeepMind’s AlphaGo. At each step, AB-MCTS uses probability models to decide whether it is more strategic to refine an existing solution or generate a new one.
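The following is a simplified sketch of this deepen-vs-widen decision, not Sakana AI’s implementation: it keeps a tree of candidate solutions and uses the scores observed so far to choose between refining the best existing node and branching out with a fresh attempt. The coin-flip heuristic here is a crude stand-in for the paper’s probability models, and `generate`/`refine` are placeholder LLM calls.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    solution: str
    score: float
    children: list = field(default_factory=list)

def generate(prompt: str) -> Node:
    """Placeholder: ask an LLM for a brand-new solution ('search wider')."""
    return Node(f"new attempt {random.randint(0, 999)}", random.random())

def refine(node: Node, prompt: str) -> Node:
    """Placeholder: ask an LLM to improve an existing solution ('search deeper')."""
    return Node(node.solution + " (refined)", min(1.0, node.score + random.uniform(-0.1, 0.3)))

def ab_mcts_sketch(prompt: str, budget: int = 16) -> Node:
    root = Node("root", 0.0)
    visited = []
    for _ in range(budget):
        if visited and random.random() < max(n.score for n in visited):
            # Search deeper: refine the most promising solution found so far.
            best = max(visited, key=lambda n: n.score)
            child = refine(best, prompt)
            best.children.append(child)
        else:
            # Search wider: generate a completely new solution from scratch.
            child = generate(prompt)
            root.children.append(child)
        visited.append(child)
    return max(visited, key=lambda n: n.score)

print(ab_mcts_sketch("Solve the puzzle described above.").solution)
```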

The researchers took this a step further with Multi-LLM AB-MCTS, which not only decides “what” to do (refine vs. generate) but also “which” LLM should do it. At the start of a task, the system does not know which model is best suited to the problem. It begins by trying a balanced mix of the available LLMs and, as it progresses, learns which models are more effective, allocating more of the workload to them over time.
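One rough way to picture this model-selection behavior is a multi-armed-bandit loop: each LLM’s track record is updated as it is used, and models that produce higher-scoring solutions get sampled more often. The Thompson-sampling sketch below is an illustrative assumption, not the paper’s exact probability model, and the model names are just labels.

```python
import random

# Beta(successes, failures) pseudo-counts per model, starting uninformed.
models = {"model_a": [1.0, 1.0], "model_b": [1.0, 1.0], "model_c": [1.0, 1.0]}

def call_model(name: str, prompt: str) -> float:
    """Placeholder: call the named LLM on the task and return the solution's score in [0, 1]."""
    return random.random()

def pick_model() -> str:
    # Thompson sampling: draw a plausible success rate for each model from its
    # Beta posterior and pick the model with the highest draw.
    draws = {name: random.betavariate(a, b) for name, (a, b) in models.items()}
    return max(draws, key=draws.get)

for step in range(30):
    chosen = pick_model()
    result = call_model(chosen, "Solve the puzzle described above.")
    # Update the chosen model's record: good results make it more likely to be picked again.
    models[chosen][0] += result
    models[chosen][1] += 1.0 - result
```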
Putting the AI ‘dream team’ to the test
The researchers tested their Multi-LLM AB-MCTS system on the ARC-AGI-2 benchmark. ARC (Abstraction and Reasoning Corpus) is designed to test a human-like ability to solve novel visual reasoning problems, making it notoriously difficult for AI.
The team used a combination of frontier models, including o4-mini, Gemini 2.5 Pro and DeepSeek-R1.
The collective of models was able to find correct solutions for over 30% of the 120 test problems, a score that significantly exceeded what any of the models achieved working alone. The system also demonstrated the ability to dynamically assign the best model for a given problem. On tasks where a clear path to a solution existed, the algorithm quickly identified the most effective LLM and used it more frequently.

More impressively, the team observed cases where the models solved problems that were previously impossible for any single one of them. In one case, a solution generated by the o4-mini model was incorrect. However, the system passed this flawed attempt to DeepSeek-R1 and Gemini 2.5 Pro, which were able to analyze the error, correct it and ultimately produce the right answer.
“This demonstrates that Multi-LLM AB-MCTS can flexibly combine frontier models to solve previously unsolvable problems, pushing the limits of what is achievable by using LLMs as a collective intelligence,” the researchers write.

“In addition to the individual pros and cons of each model, the tendency to hallucinate can differ significantly among them,” Akiba said. “By creating an ensemble with a model that is less likely to hallucinate, it could be possible to achieve the best of both worlds: powerful logical capabilities and strong groundedness. Since hallucination is a major issue in a business context, this approach could be valuable for its mitigation.”
From research to real-world applications
To help developers and businesses apply this technique, Sakana AI has released the underlying algorithm as an open-source framework called TreeQuest, available under an Apache 2.0 license (usable for commercial purposes). TreeQuest provides a flexible API, allowing users to apply Multi-LLM AB-MCTS to their own tasks with custom scoring and generation logic.
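As a rough illustration of what plugging custom generation and scoring logic into such a framework might look like, here is a hypothetical sketch; the names below are invented for illustration and do not reflect TreeQuest’s actual API, which is documented in the project’s repository.

```python
# Hypothetical sketch only: these names are invented and do NOT reflect TreeQuest's real API.
from typing import Optional, Tuple

def generate_or_refine(parent: Optional[str]) -> Tuple[str, float]:
    """User-supplied callback: produce a new solution when parent is None, or refine
    an existing one, and return (solution, score). The score could come from unit
    tests, a verifier model, or any other task-specific metric."""
    if parent is None:
        solution = "first attempt"             # in practice: prompt an LLM with the task
    else:
        solution = parent + " (improved)"      # in practice: ask an LLM to fix or improve it
    return solution, len(solution) / 100.0     # toy score for demonstration

# A framework driving this callback would decide when to widen (parent=None),
# when to deepen (pass back the best solution so far), and which LLM to route to.
best, best_score = None, -1.0
for step in range(5):
    candidate, candidate_score = generate_or_refine(None if step % 2 == 0 else best)
    if candidate_score > best_score:
        best, best_score = candidate, candidate_score
print(best, best_score)
```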
“While we are in the early stages of applying AB-MCTS to specific business-oriented problems, our research reveals significant potential in several areas,” Akiba said.
Beyond the ARC-AGI-2 benchmark, the team was able to successfully apply AB-MCTS to tasks such as complex algorithmic coding and improving the accuracy of machine learning models.
“AB-MCTS could also be highly effective for problems that require iterative trial-and-error, such as optimizing performance metrics of existing software,” Akiba said. “For example, it could be used to automatically find ways to improve the response latency of a web service.”
The release of a practical, open-source tool can pave the way for a new class of more powerful and reliable enterprise AI applications.