David Silver and Richard Sutton, two renowned AI scientists, argue in a new paper that artificial intelligence is about to enter a new phase, the "era of experience." This is where AI systems rely less on human-provided data and improve themselves by collecting data from, and interacting with, the world.
While the paper is conceptual and forward-looking, it has direct implications for enterprises that want to build with, and for, future AI agents and systems.
Both Silver and Sutton are seasoned scientists with track records of making accurate predictions about the future of AI. The validity of their predictions can be seen directly in today's most advanced AI systems. In 2019, Sutton, a pioneer of reinforcement learning, wrote the famous essay "The Bitter Lesson," in which he argues that the greatest long-term progress in AI comes from leveraging large-scale computation with general-purpose search and learning methods, rather than from incorporating hand-crafted, human-specific domain knowledge.
David Silver, a senior scientist at DeepMind, was a key contributor to AlphaGo, AlphaZero and AlphaStar, all landmark achievements in deep reinforcement learning. He was also a co-author of a 2021 paper that claimed reinforcement learning with a well-designed reward signal would be sufficient to create very advanced AI systems.
The most advanced large language models (LLMs) take advantage of those two concepts. The wave of LLMs that has conquered the AI scene since GPT-3 has primarily relied on scaling compute and data to internalize knowledge. The most recent wave of reasoning models, such as DeepSeek-R1, has demonstrated that reinforcement learning with a simple reward signal is enough to learn complex reasoning skills.
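To make the "simple reward signal" idea concrete, here is a minimal sketch of a verifiable-correctness reward of the kind used to train reasoning models. The `<answer>` tag convention and the function name are illustrative assumptions, not taken from any specific training pipeline: the point is that a binary check against a known answer, with no human grading of the intermediate reasoning, is enough to drive reinforcement learning.

```python
import re

def correctness_reward(model_output: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the model's final answer matches the
    known-correct answer, 0.0 otherwise. The reasoning chain itself
    is never graded, only the verifiable outcome."""
    # Assume the model is prompted to wrap its final answer in
    # <answer>...</answer> tags (a common convention, used here
    # purely for illustration).
    match = re.search(r"<answer>(.*?)</answer>", model_output, re.DOTALL)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0
```

A trainer would average this reward over sampled outputs and reinforce the reasoning traces that led to correct answers.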
What is the era of experience?
The "era of experience" builds on the same concepts that Sutton and Silver have been discussing for years, adapted to recent advances in AI. The authors argue that "the pace of progress driven solely by supervised learning from human data is demonstrably slowing, signalling the need for a new approach."
And that approach requires a new source of data, one that keeps growing as the agent improves. "This can be achieved by allowing agents to learn continually from their own experience, i.e., data that is generated by the agent interacting with its environment," Sutton and Silver write. They argue that eventually, "experience will become the dominant medium of improvement and ultimately dwarf the scale of human data used in today's systems."
According to the authors, in addition to learning from their own experiential data, future AI systems will "break through the limitations of human-centric AI systems" along four dimensions:
- Streams: Instead of operating in disconnected episodes, AI agents will have "their own stream of experience that progresses, like humans, over a long time-scale." This will allow agents to plan for long-term goals and adapt new behavior patterns over time. We can see glimmers of this in AI systems with very long context windows and memory architectures that are continuously updated through user interactions.
- Actions and observations: Instead of being restricted to human-privileged actions and observations, agents in the era of experience will act autonomously in the real world. Examples are agentic systems that can interact with external applications and resources through tools such as computer use and Model Context Protocol (MCP).
- Rewards: Current reinforcement learning systems mostly depend on human-designed reward functions. In the future, AI agents should be able to design their own dynamic reward functions that adapt over time and match user preferences against real-world signals gathered from the agent's actions and observations. We are seeing early versions of self-designed rewards in systems such as Nvidia's DrEureka.
- Planning and reasoning: Current reasoning models are designed to mimic the human thought process. The authors argue that more efficient mechanisms of thought almost certainly exist, "using non-human languages that may, for example, utilize symbolic, distributed, continuous, or differentiable computations." AI agents should interact with the world, and use the data they observe to validate and update their reasoning process and to develop a world model.
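The dimensions above share a common skeleton: act, observe, receive a reward, update, repeat, over one unbroken stream of experience rather than a fixed dataset. The sketch below illustrates that skeleton with the simplest possible learner, an epsilon-greedy bandit; the environment and all names are hypothetical and stand in for the far richer settings the paper envisions.

```python
import random

def run_experience_stream(env_step, n_actions=2, steps=1000, eps=0.1, seed=0):
    """Learn action values from a single, continual stream of
    interaction: act, observe a reward, update the estimate, repeat."""
    rng = random.Random(seed)
    values = [0.0] * n_actions   # running estimate of each action's reward
    counts = [0] * n_actions
    for _ in range(steps):
        # Epsilon-greedy: mostly exploit current knowledge, sometimes explore.
        if rng.random() < eps:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda i: values[i])
        r = env_step(a, rng)                       # the world responds
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]   # incremental mean update
    return values

# Hypothetical environment: action 1 pays off more often than action 0.
def env_step(action, rng):
    return 1.0 if rng.random() < (0.3 if action == 0 else 0.8) else 0.0
```

After enough steps, the learner's value estimates reflect the environment's true payoffs, knowledge it acquired entirely from its own experience, with no human-labeled data.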
The idea of AI agents that adapt to their environment through reinforcement learning is not new. But previously, these agents were limited to very constrained environments such as board games. Today, agents that can interact with complex environments (e.g., AI computer use), combined with advances in reinforcement learning, will cross those boundaries and bring about the transition to the era of experience.
What does this mean for the enterprise?
Buried in Sutton and Silver's paper is an observation with important implications for real-world applications: agents can use "human-friendly" actions and observations, such as user interfaces, that naturally facilitate communication and collaboration with users, but they can also take "machine-friendly" actions that execute code and call APIs, allowing them to act autonomously in service of their goals.
The era of experience means that developers will have to build their applications not only for humans but also for AI agents. Machine-friendly actions require building safe and accessible APIs that can be reached directly or through interfaces such as MCP. It also means building agents that can be made discoverable through protocols like Google's Agent2Agent. You will also have to design your APIs and agent interfaces to provide access to both actions and observations. This will enable agents to gradually reason about and learn from their interactions with your applications.
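What might an agent-friendly interface look like in practice? The sketch below is a hypothetical in-process tool registry, not tied to any specific SDK: every name (`tool`, `get_order_status`, the stubbed order data) is invented for illustration. The design point it demonstrates is the one from the paragraph above: each action carries a machine-readable schema an agent can discover, and each call returns a structured observation rather than a human-oriented page.

```python
import json

# Hypothetical registry mapping tool names to callables plus schemas.
TOOLS = {}

def tool(name, description, params):
    """Register a function as an agent-callable action with a schema."""
    def decorator(fn):
        TOOLS[name] = {"fn": fn, "description": description, "params": params}
        return fn
    return decorator

@tool("get_order_status",
      description="Look up the status of an order by its ID.",
      params={"order_id": "string"})
def get_order_status(order_id: str) -> dict:
    # Stubbed data store, for illustration only.
    orders = {"A100": "shipped", "A101": "processing"}
    # Return a structured observation the agent can learn from.
    return {"order_id": order_id, "status": orders.get(order_id, "unknown")}

def list_tools() -> str:
    """What an agent sees when it discovers your interface."""
    return json.dumps(
        {name: {"description": t["description"], "params": t["params"]}
         for name, t in TOOLS.items()})

def call_tool(name: str, **kwargs) -> dict:
    """Dispatch an agent's chosen action to the registered function."""
    return TOOLS[name]["fn"](**kwargs)
```

Protocols like MCP formalize exactly this discover-then-call pattern across process and network boundaries.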
If the vision Sutton and Silver present becomes reality, there will soon be billions of agents roaming the web (and the physical world) to complete tasks. Their behaviors and needs will be very different from those of human users and developers, and having an agent-friendly way to interact with your application will improve your ability to leverage future AI systems (and also prevent the harm they can cause).
"By building upon the foundations of RL and adapting its core principles to the challenges of this new era, we can unlock the full potential of autonomous learning and pave the way to truly superhuman intelligence," Sutton and Silver write.
DeepMind declined to provide additional comments for this story.