Rarely a day goes by without an influential new robot platform emerging from academic laboratories and commercial startups worldwide. Humanoid robots look particularly capable of helping us in factories, and eventually in homes and hospitals. Nevertheless, to be truly useful, these machines need a sophisticated 'brain' to control their robot body. Traditionally, programming robots has meant experts spending countless hours scripting behaviors and hand-tuning parameters, such as controller gains or motion-planning weights, to achieve the desired performance. While machine learning (ML) offers a promising alternative, robots that need to learn new, complex behaviors still require substantial human supervision and re-engineering. At Google DeepMind, we asked: what if robots could instead improve themselves?
This question has been a motivating force behind our robotics research. We are searching for paradigms of autonomous self-improvement, such as two robotic agents playing against each other, that move beyond systems pre-programmed with only certain skills or narrowly tailored ML models, toward systems that can learn a wide range of skills on the job. Building on our previous ML work with systems like AlphaGo and AlphaFold, we turned our attention to a game as our testbed: table tennis.
We chose table tennis deliberately because it packs some of the hardest challenges in robotics into a constrained yet highly dynamic environment. Table tennis requires a robot to master a confluence of difficult skills: beyond fast perception, it demands exceptionally precise control to intercept the ball at the right angle and velocity, and it involves strategic decision-making to outplay an opponent. These elements make it an ideal domain for developing and evaluating learning algorithms that can handle real-time interaction, complex physics, high-level reasoning, and adaptive strategies, capabilities that transfer directly to applications such as manufacturing and, potentially, unstructured home settings.
The self-improvement challenge
Standard machine learning approaches often fall short when it comes to enabling continuous, autonomous learning. Imitation learning, where a robot learns by mimicking an expert, typically requires us to provide a huge number of human demonstrations for every skill or variation; this dependence on expert data collection becomes a major bottleneck if we want robots to learn new tasks continuously or refine their performance over time. Similarly, reinforcement learning, which trains agents through trial and error guided by rewards and penalties, usually requires human designers to carefully craft reward functions that capture the desired behavior for complex, multifaceted tasks, and then adjust them as the robot needs to acquire new skills. In short, both of these well-established methods traditionally involve substantial human participation, especially if robots are to keep improving beyond their initial programming. We therefore set our team a direct challenge: can robots learn and improve their skills with minimal or no human intervention in the learning and improvement loop?
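To make the reward-engineering burden concrete, here is a minimal sketch of the kind of hand-shaped reward a designer might write for a single table tennis stroke. The terms and weights below are entirely hypothetical illustrations, not any reward actually used in our system; the point is that every weight is a knob a human must tune, and re-tune, whenever the task changes.

```python
# Hypothetical hand-shaped reward for one stroke. Each weight
# (w_hit, w_land, w_dist, w_effort) must be tuned by a human,
# which is exactly the bottleneck described above.

def shaped_reward(ball_hit, ball_landed_on_table,
                  paddle_ball_dist, paddle_speed,
                  w_hit=1.0, w_land=2.0, w_dist=-0.5, w_effort=-0.01):
    reward = 0.0
    if ball_hit:
        reward += w_hit                       # bonus for making contact
    if ball_landed_on_table:
        reward += w_land                      # bonus for a legal return
    reward += w_dist * paddle_ball_dist      # encourage approaching the ball
    reward += w_effort * paddle_speed ** 2   # penalize jerky, fast motion
    return reward
```

A perfect stroke (`shaped_reward(True, True, 0.0, 0.0)`) scores 3.0 under these weights; change the task, and all four knobs may need revisiting.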
Learning through competition: robot vs robot
One innovative approach we explored borrows a strategy used for AlphaGo: have agents learn by competing against themselves. We set up two robot arms to play table tennis against each other, an idea that is simple yet powerful: as one robot discovers a better strategy, its opponent is forced to adapt and improve, creating a cycle of ever-increasing skill.
https://www.youtube.com/watch?v=b9_oytzkwv8
To enable the extensive training these paradigms require, we engineered a fully autonomous table tennis environment. The setup supports continuous operation, featuring automated ball collection alongside remote monitoring and control, allowing training to run for extended periods without direct human involvement. As a first step, we successfully trained a robot agent (replicated independently on both robots) with reinforcement learning in simulation to play cooperative rallies. We then fine-tuned the agent for a few hours in the real-world robot-vs-robot setup, resulting in a policy capable of sustaining long rallies. Next, we turned to competitive robot-vs-robot play.
Out of the box, the cooperative agent did not perform well in competitive play. This was expected: in cooperative play, rallies settle into a narrow region of the table that the agent can reliably return, limiting the distribution of balls it sees. Our hypothesis was that continued training in a competitive setting would gradually widen this distribution, since each robot is rewarded for beating its opponent. While promising, training competitive self-play systems in the real world presented significant obstacles: the growing breadth of shots proved hard to master given the constraints of a limited model size. Essentially, it was difficult for the model to learn to handle new shots without forgetting old ones, and we quickly hit a local minimum in training, where after a short rally one robot would hit an easy winner that the other could not return.
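The competitive objective itself can be stated very simply. The sketch below is an illustrative zero-sum reward, not our actual training code: each point won by one robot is a point lost by the other, so any strategy improvement by one side automatically creates learning pressure on the other.

```python
# Illustrative zero-sum reward for competitive self-play.
# The rewards of the two robots always sum to zero, which is
# what drives the adapt-and-improve cycle described above.

def competitive_rewards(point_winner):
    """point_winner is 'A' or 'B', naming the robot that won the rally."""
    if point_winner == 'A':
        return {'A': 1.0, 'B': -1.0}
    return {'A': -1.0, 'B': 1.0}
```

Note that this structure says nothing about *how* to win a point; that is precisely why, with limited model capacity, the policies drifted toward a degenerate equilibrium of short rallies ended by unreturnable winners.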
While robot-vs-robot competitive play proved a hard nut to crack, our team also investigated how to play competitively against humans. In the early stages of training, humans did a much better job of keeping the ball in play, thereby enriching the distribution of shots the robot could learn from. We still had to develop a novel policy architecture, pairing a high-level controller that selects among low-level skill policies, together with techniques for zero-shot sim-to-real transfer and for adapting to unseen opponents in real time. In a user study, while the robot lost all of its matches against the most advanced players, it won all of its matches against beginners and about half of its matches against intermediate players, demonstrating solidly amateur human-level performance. Equipped with these innovations, and with a far better starting point than cooperative play alone provided, we are now in a strong position to return to robot-vs-robot competitive training and continue scaling rapidly.
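The layered design described above can be sketched roughly as follows. The skill names and the hand-written selection rule are invented for illustration; in the real system the high-level controller is learned and the low-level skills are trained control policies rather than stub functions.

```python
# Illustrative two-level policy: a high-level controller (HLC)
# picks a low-level skill for each incoming ball, and the chosen
# skill produces the actual motion command. All names are hypothetical.

def high_level_controller(ball_state, skills):
    """Pick a skill name based on a simple hand-written heuristic."""
    if ball_state['spin'] == 'topspin':
        name = 'forehand_drive' if ball_state['side'] == 'right' else 'backhand_drive'
    else:
        name = 'forehand_push' if ball_state['side'] == 'right' else 'backhand_push'
    return name, skills[name]

# Stub low-level skills; real ones would be learned control policies.
skills = {
    'forehand_drive': lambda state: 'swing_fast_right',
    'backhand_drive': lambda state: 'swing_fast_left',
    'forehand_push':  lambda state: 'swing_soft_right',
    'backhand_push':  lambda state: 'swing_soft_left',
}

ball = {'spin': 'topspin', 'side': 'right'}
name, skill = high_level_controller(ball, skills)
command = skill(ball)   # the selected skill emits the motor command
```

Separating "which stroke to play" from "how to execute it" lets each low-level skill be trained and reused independently, which is one reason this kind of decomposition helps with transfer to new opponents.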
https://www.youtube.com/watch?v=eqqqqqQL-JQXTOE
The AI coach: VLMs enter the game
Another idea we investigated leverages the power of vision-language models (VLMs) such as Gemini: can a VLM act as a coach, observing a robot player and providing guidance for improvement?
An important insight from this project is that VLMs can be harnessed for explainable robot policy search. Based on this insight, we developed the SAS Prompt. Our approach can be considered an early example of a new family of explainable policy search methods implemented entirely within an LLM. Moreover, there is no hand-crafted reward function: the VLM infers the reward directly from the observations, given the task description. The VLM can thus act as a coach that continuously analyzes the student's performance and provides suggestions for getting better.
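As a rough illustration of this coaching loop, here is a toy sketch. The prompt text, the `call_vlm` stub, and the single `paddle_angle_deg` parameter are all assumptions made for illustration; a real system would query an actual VLM such as Gemini and reason over much richer observations.

```python
# Toy VLM-as-coach loop. `call_vlm` is a stub standing in for a real
# model call; here it simply nudges a hypothetical paddle-angle
# parameter halfway toward 45 degrees on each coaching round.

COACH_PROMPT = """\
You are coaching a table tennis robot. Given the task description and
the robot's recent observations:
1. Summarize what the robot did.
2. Analyze how well it achieved the task (no explicit reward function).
3. Propose new behavior parameters likely to do better.
"""

def call_vlm(prompt, observations):
    # Stub: a real system would send prompt + observations to a VLM.
    current = observations['paddle_angle_deg']
    return {'paddle_angle_deg': current + 0.5 * (45.0 - current)}

def coaching_loop(params, rollout, iterations=5):
    for _ in range(iterations):
        observations = rollout(params)          # robot plays, data is logged
        params = call_vlm(COACH_PROMPT, observations)  # coach proposes update
    return params

# Toy rollout: observations simply echo the current parameters.
final = coaching_loop({'paddle_angle_deg': 5.0}, rollout=lambda p: dict(p))
```

With this stubbed coach, the parameter converges geometrically toward 45 degrees (5.0 becomes 43.75 after five rounds), mirroring how iterated analyze-and-suggest cycles can refine behavior without an explicit reward function.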
Toward truly learning robots: an optimistic outlook
Moving beyond the limits of traditional programming and ML techniques is essential for the future of robotics. Methods that enable autonomous self-improvement, such as those we are developing, reduce dependence on laborious human effort. Our table tennis projects chart a route toward robots that can acquire and refine complex skills more autonomously. Significant challenges persist (stabilizing robot-vs-robot learning and scaling VLM-based coaching are formidable tasks), but these approaches offer a unique opportunity. We are optimistic that continued research in this direction will produce more capable, adaptable machines that can learn the diverse skills needed to operate effectively and safely in our unstructured world. The journey is complex, but the potential payoff of truly intelligent and helpful robotic partners makes it worth pursuing.
The authors thank the Google DeepMind Robotics Team, and especially David B. D'Ambrosio, Saminda Abeyruwan, Laura Graesser, Atil Iscen, Alex Bewley, and Krista Reymann, for their invaluable contributions to the development and refinement of this work.
