
Agents are the hottest topic in AI today, and with good reason. AI agents act on behalf of their users, autonomously handling tasks like making online purchases, building software, researching business trends, or booking travel. By taking generative AI out of the protected sandbox of the chat interface and allowing it to act directly on the world, agentic AI represents a leap forward in the power and usefulness of AI.
Agentic AI is advancing rapidly: for example, one of the main building blocks of today's agents, the Model Context Protocol (MCP), is only a year old! Like any fast-moving field, it is full of competing definitions, heated debates, and confusing terminology.
To cut through the noise, I'd like to describe the main components of an agentic AI system and how they fit together. It's actually not as complicated as it may seem. Hopefully, after you've read this post, agents won't seem so mysterious.
agentic ecosystem
There are lots of definitions of the term "agent", but I like a slight variation on British programmer Simon Willison's minimalist definition:

An LLM agent runs tools in a loop to achieve a goal.
The user prompts a large language model (LLM) with a goal: say, to book a table at a restaurant near a specific theater. Along with the goal, the model receives a list of the tools at its disposal, such as a database of restaurant locations or a record of the user's food preferences. The model then plans how to achieve the goal and calls a tool, which provides feedback; the model then calls a new tool. Through repetition, the agent moves toward accomplishing the goal. In some cases, the orchestration and prompting of the model are complemented or enhanced by conventional, handwritten code.
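That loop can be sketched in a few lines of code. This is a minimal illustration, not a real framework: the model and the tools (`fake_llm`, `restaurant_search`, `book_table`) are hypothetical stand-ins, but the control flow — call the model, execute the tool it picks, feed the result back, repeat until the model stops — is the essence of an agent.

```python
# Minimal sketch of the agent loop. The LLM and tools are stand-ins;
# only the loop structure reflects how real agents work.

def fake_llm(goal, history, tools):
    """Stand-in for a real LLM: returns the next tool call, or None when done."""
    if not history:
        return ("restaurant_search", {"near": "the theater"})
    if history[-1][0] == "restaurant_search":
        # Pick the first restaurant from the previous observation.
        return ("book_table", {"name": history[-1][1][0]})
    return None  # goal achieved, stop looping

TOOLS = {
    "restaurant_search": lambda near: ["Pizza Roma", "Taj Mahal"],
    "book_table": lambda name: f"booked at {name}",
}

def run_agent(goal, llm, tools):
    history = []  # (tool_name, result) pairs fed back to the model as feedback
    while True:
        call = llm(goal, history, tools)
        if call is None:              # the model decides the goal is met
            return history
        name, args = call
        result = tools[name](**args)  # execute the tool, observe the feedback
        history.append((name, result))

steps = run_agent("book a table near the theater", fake_llm, TOOLS)
```

Everything interesting in a real system — planning, error handling, choosing among dozens of tools — happens inside the model call, but the outer loop stays this simple.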
But what kind of infrastructure is needed to realize this vision? An agentic system requires some core components:
- A way to create agents. When you deploy an agent, you don't want to code it from scratch, and there are many agent development frameworks to choose from.
- Somewhere to run AI models. An experienced AI developer can download and run an open-source LLM, but doing it properly takes expertise, and it requires expensive hardware that would sit poorly utilized for the average user.
- Somewhere to run the agent code. With a framework, the developer creates code for an agent object with a defined set of functions. Most of those functions involve sending prompts to an AI model, but the code needs to run somewhere. In practice, most agents will run in the cloud, because we want them to keep running when our laptops are off and to keep working away at their tasks.
- A mechanism to translate between the text-based LLM and tool calls.
- Short-term memory to track the content of agentic interactions.
- Long-term memory to keep track of user preferences and affinities across sessions.
- A way to trace system execution, to evaluate agent performance.
Let’s consider each of these components in more detail.
creating an agent
Asking an LLM to explain how it plans to perform a particular task improves its performance on that task. This "chain-of-thought reasoning" is now ubiquitous in AI.
Agentic systems have an analogue, the ReAct (reasoning + action) model, in which the agent has a thought ("I will use the map function to locate nearby restaurants"), takes an action (issuing an API call to the map function), then makes an observation ("There are two pizza places and an Indian restaurant within two blocks of the movie theater").
ReAct isn't the only way to build agents, but it's at the core of most successful agentic systems. Today's agents generally loop through the thought-action-observation sequence.
The tools available to the agent may include local tools and remote resources such as databases, microservices, and software as a service. A tool's specification includes a natural-language explanation of how and when to use it and the syntax of its API calls.
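A tool specification, then, pairs prose for the model with a schema for the machine. Here's a hypothetical example (the tool name, parameters, and validation helper are all illustrative, not from any particular framework):

```python
# Hypothetical tool specification: a natural-language description telling
# the LLM when to use the tool, plus a schema for its API arguments.

map_tool_spec = {
    "name": "find_restaurants",
    "description": (
        "Use this tool to list restaurants near a location. "
        "Call it before recommending or booking a restaurant."
    ),
    "parameters": {
        "location":  {"type": "string", "required": True},
        "radius_mi": {"type": "number", "required": False, "default": 1.0},
        "cuisine":   {"type": "string", "required": False},
    },
}

def validate_call(spec, args):
    """Check a proposed tool call against the spec before executing it."""
    for name, meta in spec["parameters"].items():
        if meta.get("required") and name not in args:
            raise ValueError(f"missing required argument: {name}")
    unknown = set(args) - set(spec["parameters"])
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    return True
```

The description is what the LLM reads when deciding which tool to call; the schema is what the runtime checks before actually issuing the call.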
The developer can also allow the agent to essentially create its own tools on the fly. Say a tool retrieves a table stored as comma-separated text, and to accomplish its goal, the agent needs to sort the table.
Sorting a table by repeatedly sending it through an LLM and evaluating the results would be a huge waste of resources, and it's not even guaranteed to produce correct results. Instead, the developer can simply instruct the agent to generate its own Python code whenever it encounters a simple but repetitive task. These snippets can run locally alongside the agent or in a dedicated, secure code interpreter tool.
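To make this concrete, here is the kind of throwaway snippet an agent might generate for itself — the column name and data are invented for illustration. Standard-library code like this sorts the table deterministically and cheaply, with no LLM round trips:

```python
# The sort-a-CSV-table task from the example above, solved with generated
# code instead of LLM calls. Data and column names are illustrative.
import csv
import io

def sort_csv(text, column):
    """Sort a comma-separated table by a numeric column and return it as CSV."""
    rows = list(csv.DictReader(io.StringIO(text)))
    rows.sort(key=lambda r: float(r[column]))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

table = "name,distance_mi\nTaj Mahal,0.4\nPizza Roma,0.1\n"
sorted_table = sort_csv(table, "distance_mi")
```

Running snippets like this in a sandboxed interpreter, rather than in the agent's own process, keeps generated code from touching anything it shouldn't.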
Tool use is one place where responsibility can be divided between the LLM and the developer. Once the tools available to the agent are specified, the developer can simply instruct the agent to use them as needed. Alternatively, the developer can specify which tools to use for which types of data, and even which data items to use as arguments in function calls.
Similarly, the developer can ask the agent to generate Python code whenever it's needed to automate repetitive tasks or, alternatively, can tell it which algorithms to use for which data types and even supply pseudocode. The approach can vary from agent to agent.
running agents
Historically, there were two main ways to isolate code running on shared servers: containerization, which was efficient but provided weaker security, and virtual machines, which were secure but came with high computational overhead.
In 2018, Amazon Web Services' (AWS) Lambda serverless-computing service deployed Firecracker, a new paradigm in server isolation. Firecracker creates "microVMs", complete with hardware isolation and their own Linux kernels, but with low overhead (as little as a few megabytes) and startup times (as little as a few milliseconds). The low overhead means that each function executed on a Lambda server can have its own microVM.
However, because instantiating an agent requires deploying an LLM along with the memory resources to track the LLM's inputs and outputs, the per-function isolation model is impractical. Instead, with session-based isolation, each session is assigned its own microVM. When the session ends, the LLM's state information is copied to long-term memory, and the microVM is destroyed. This enables the secure and efficient serving of large numbers of agents.
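The session lifecycle described above can be sketched as follows. This is not a real Firecracker or Lambda API — the class and function names are invented to show the sequence of events: boot an isolated VM per session, accumulate session-scoped state, persist it to long-term memory, then tear the VM down.

```python
# Sketch of session-based isolation. MicroVM and run_session are
# illustrative names, not a real Firecracker or AWS API.

long_term_store = {}  # survives microVM teardown

class MicroVM:
    def __init__(self, session_id):
        self.session_id = session_id
        self.state = {"history": []}  # LLM state lives inside the VM
        self.alive = True

    def destroy(self):
        self.alive = False

def run_session(session_id, turns):
    vm = MicroVM(session_id)               # one isolated microVM per session
    for turn in turns:
        vm.state["history"].append(turn)   # session-scoped state accumulates
    long_term_store[session_id] = vm.state # persist state before teardown
    vm.destroy()                           # microVM is gone; memory survives
    return vm

vm = run_session("s1", ["hi", "book a table near the theater"])
```

The key property is that nothing of the session survives except what was deliberately copied out to long-term memory.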
tool calls
Just as there are many development frameworks for agent creation, there are many standards for communication between agents and tools, the most popular of which — currently — is the Model Context Protocol (MCP).
MCP establishes a one-to-one connection between the agent's LLM and a dedicated MCP server that executes tool calls, and it establishes a standard format for passing various types of data back and forth between the LLM and its servers.
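MCP messages are built on JSON-RPC 2.0, so a tool call has roughly the shape below. The tool name and arguments are hypothetical, and this is only the outline of the message format, not a full client:

```python
# Illustrative shape of an MCP-style tool call. MCP uses JSON-RPC 2.0;
# the tool name and arguments here are hypothetical examples.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "find_restaurants",
        "arguments": {"location": "downtown", "radius_mi": 1.0},
    },
}

wire_message = json.dumps(request)   # what the client sends over the transport
decoded = json.loads(wire_message)   # what the MCP server parses and dispatches
```

The value of a standard like this is that any agent framework can call any tool server that speaks the same message shape, without bespoke glue code for each pairing.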
Many platforms use MCP by default but are also configurable, so they can support a growing set of protocols over time.
Sometimes, however, the required tool has no available API. In such cases, the only way to retrieve data or take action is by moving a cursor and clicking on a website. A number of services are available for this kind of "computer use". It makes any website a potential tool for agents, opening up decades' worth of content and valuable services that are not yet directly available through APIs.
authorization
With agents, authorization works in two directions. First, of course, users need authorization to run the agents they create. And because an agent acts on behalf of its user, it will usually need its own authorization to access networked resources.
There are a few different ways to deal with the authorization problem. One is with access delegation algorithms such as OAuth, which essentially pipe the authorization process through the agentic system. The user enters login credentials into OAuth, and the agentic system uses OAuth to log in to protected resources, but the agentic system never has direct access to the user's password.
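The core idea of delegation is that the agent holds only a scoped, expiring token, never the password. A minimal sketch, with an invented token format and scopes (a real OAuth flow involves an authorization server and signed tokens, none of which is modeled here):

```python
# Sketch of access delegation: the agent carries a scoped, expiring token.
# The token format, scopes, and API are illustrative, not real OAuth.
import time

def issue_token(scopes, ttl_s=3600):
    """Stand-in for an authorization server issuing a delegated token."""
    return {"scopes": set(scopes), "expires": time.time() + ttl_s}

def call_protected_api(token, required_scope):
    """A protected resource checks the token, never the user's password."""
    if time.time() >= token["expires"]:
        raise PermissionError("token expired")
    if required_scope not in token["scopes"]:
        raise PermissionError(f"missing scope: {required_scope}")
    return "ok"

# The agent can book tables but cannot, say, read the user's email.
token = issue_token(["restaurants:read", "bookings:write"])
```

Because the token names exactly what the agent may do and expires on its own, a leaked or misbehaving agent does far less damage than one holding the user's actual credentials.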
In the second approach, the user logs in to a secure session on a server, and the server has its own login credentials for the protected resources. A well-designed platform lets the user select from different authorization strategies and the algorithms that implement them.
memory and trace
short-term memory
LLMs are next-word prediction engines. What makes them so surprisingly versatile is that their predictions are based on the long sequences of words they've already seen, known as the context. The context is a type of memory in itself, but it's not the only type an agentic system needs.
Suppose, again, that an agent is trying to book a restaurant near a movie theater, and that from a map tool it has retrieved a few dozen restaurants within a one-mile radius. It doesn't make sense to dump information about all those restaurants into the LLM's context: all that extraneous information could wreak havoc with next-word prediction.
Instead, the agent can store the full list in short-term memory and retrieve one or two records at a time, based on the user's price and cuisine preferences and proximity to the theater. If none of those restaurants pans out, the agent can dip back into short-term memory rather than executing another tool call.
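A sketch of that pattern, with invented restaurant records: the full tool result lives outside the context window, and a small filter decides which records are worth placing in front of the LLM.

```python
# Sketch of short-term memory: hold the full tool result outside the LLM
# context and surface only matching records. Data is illustrative.

short_term = [
    {"name": "Pizza Roma", "cuisine": "italian", "price": "$",   "blocks": 2},
    {"name": "Taj Mahal",  "cuisine": "indian",  "price": "$$",  "blocks": 2},
    {"name": "Le Cher",    "cuisine": "french",  "price": "$$$", "blocks": 9},
]

def recall(memory, cuisine=None, max_price="$$$", max_blocks=5, limit=2):
    """Return at most `limit` records to place in the LLM's context."""
    hits = [
        r for r in memory
        if (cuisine is None or r["cuisine"] == cuisine)
        and len(r["price"]) <= len(max_price)   # "$" < "$$" < "$$$"
        and r["blocks"] <= max_blocks
    ]
    return hits[:limit]

in_context = recall(short_term, cuisine="indian")
```

If the user rejects these options, the agent calls `recall` again with looser filters instead of re-running the map tool.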
long-term memory
Agents also need to remember their prior interactions with their users. If last week I told the restaurant-booking agent what kind of food I like, I don't want to have to tell it again this week. The same goes for my price tolerance, the kind of atmosphere I'm looking for, and so on.
Long-term memory lets the agent look up what it needs to know about prior conversations with the user. Agents don't usually create long-term memories themselves, though. Instead, after a session is complete, the whole conversation passes to a separate AI model, which creates new long-term memories or updates existing ones.
Memory construction may involve LLM summarization and "chunking", in which documents are split into sections grouped by topic for ease of retrieval during later sessions. A good system lets the user select strategies and algorithms for summarization, chunking, and other information-extraction techniques.
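Here's a toy version of chunking and retrieval. In a real system the topic boundaries would come from a segmentation model and retrieval would use embeddings; here a blank line stands in for a topic boundary and keyword matching stands in for retrieval:

```python
# Toy long-term-memory construction: split a session transcript into
# topic chunks, then retrieve them later. The blank-line boundary and
# keyword matching are stand-ins for real segmentation and retrieval.

def chunk_transcript(transcript):
    """Split on blank lines; each chunk becomes one retrievable memory."""
    return [c.strip() for c in transcript.split("\n\n") if c.strip()]

def retrieve(chunks, keyword):
    """Naive retrieval: return the chunks mentioning the keyword."""
    return [c for c in chunks if keyword.lower() in c.lower()]

transcript = (
    "User likes spicy food and Indian cuisine.\n\n"
    "User's budget is about $30 per person.\n\n"
    "User prefers quiet restaurants."
)
memories = chunk_transcript(transcript)
```

In a later session, the agent retrieves only the chunks relevant to the task at hand — the budget chunk for pricing, the cuisine chunk for restaurant choice — instead of replaying the whole history into the context.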
observability
Agents are a new kind of software system, and they require new ways of thinking about observing, monitoring, and auditing their behavior. Some of the questions we ask will sound familiar: Are agents running fast enough? How much do they cost? How many tool calls are they making? Are users happy? But new questions will arise, too, and we can't necessarily predict what data we'll need to answer them.
Observation and tracing tools can provide an end-to-end view of the execution of a session with an agent, detailing step-by-step which actions were taken and why. For the agent builder, these traces are key to understanding how well agents are working – and providing data to make them work better.
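A trace is, at its simplest, a timestamped log of the thought-action-observation steps. The class below is an invented illustration of what such a trace might capture, not any particular observability product:

```python
# Sketch of session tracing: record each step with a timestamp so the
# full execution can be inspected afterward. Names are illustrative.
import time

class Trace:
    def __init__(self):
        self.steps = []

    def record(self, kind, detail):
        """kind is one of 'thought', 'action', 'observation'."""
        self.steps.append({"t": time.time(), "kind": kind, "detail": detail})

    def tool_calls(self):
        """The actions are the tool calls; counting them answers cost questions."""
        return [s for s in self.steps if s["kind"] == "action"]

trace = Trace()
trace.record("thought", "use the map tool to find restaurants")
trace.record("action", "find_restaurants(location='downtown')")
trace.record("observation", "2 pizza places, 1 Indian restaurant")
```

Because the trace preserves why each action was taken, not just that it happened, it supports both the familiar questions (latency, cost, call counts) and the post-hoc debugging of decisions that looked wrong.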
I hope this explanation has made agentic AI clear enough that you’re willing to try creating your own agents!

