Insights
- Challenges such as memory management and efficient information retrieval must be addressed for advanced AI agents.
- Successful AI agent implementation requires breaking down a user's request into a step-by-step plan.
- AI agents use diverse reasoning models for tasks in IT operations, customer service, and content management.
- Frameworks such as MetaGPT, AutoGen, and CrewAI facilitate diverse collaboration patterns among multiple agents, allowing them to operate in parallel — with or without a designated team leader overseeing the process.
As organizations race to adopt agentic artificial intelligence (AI), successful implementation demands a comprehensive, strategic approach. Rather than focusing solely on the technology's capabilities, businesses must also understand how AI agents develop and execute those capabilities effectively.
AI agents employ diverse reasoning models to accomplish tasks across various domains, including IT operations, customer service, and content management. These approaches range from a single agent accessing multiple information sources via APIs to multi-agent systems collaborating in hierarchical or conversational structures.
However, these approaches come with inherent challenges, including memory management, efficient information retrieval, and defining execution limits to ensure timely responses. Despite these obstacles, they serve as steppingstones toward the next generation of AI agents, which will further enhance reasoning capabilities and deliver more accurate, context-aware outputs.
For agents to successfully automate a process or workflow, they must be able to break down a user's request into a clear, step-by-step plan. For example, solving an algebraic equation requires an agent to follow these five steps:
- Identify the variable that needs to be solved.
- Simplify the expression.
- Isolate the variable by adding or subtracting terms and multiplying or dividing by the coefficient.
- Check the answer by substituting the calculated value of the variable.
- Present the final answer.
This is a simple example of the kind of plan or workflow an agent or large language model (LLM) must come up with to solve a problem, executing each step with an appropriate tool to arrive at the final answer.
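To make the idea concrete, here is a minimal sketch of how such a plan might be represented and executed in code. The step functions are hypothetical stand-ins for the tools an agent would call; this illustrates step-by-step planning rather than any specific framework's API.

```python
# A minimal sketch of the five-step plan applied to solving 2x + 3 = 11,
# represented as an ordered workflow an agent could execute step by step.
# The step functions are hypothetical stand-ins for tool calls.

def identify_variable(state):
    state["variable"] = "x"          # the unknown to solve for
    return state

def simplify(state):
    # 2x + 3 = 11 is already in its simplest form
    return state

def isolate_variable(state):
    # subtract 3 from both sides, then divide by the coefficient 2
    state["value"] = (11 - 3) / 2
    return state

def check_answer(state):
    # substitute the calculated value back into the left-hand side
    assert 2 * state["value"] + 3 == 11
    return state

def present_answer(state):
    print(f"{state['variable']} = {state['value']}")
    return state

plan = [identify_variable, simplify, isolate_variable, check_answer, present_answer]

state = {}
for step in plan:
    state = step(state)   # an agent would pick an appropriate tool for each step
```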
In business scenarios, the plan could describe how to resolve a customer ticket: predicting the resolution, planning the steps to execute (such as calling enterprise APIs), and summarizing the outcomes as a final answer. Challenges arise when agents and LLMs need to understand enterprise- or product-specific nuances, policies, and tools. Because of this, specific agentic patterns are required, whether a single agent, multi-agent, externally aided, or fully autonomous planning agents. Below are some common agentic patterns.
Single agent with task decomposition
These agents were made popular by the paper “ReAct: Synergizing Reasoning and Acting in Language Models” and are based on task decomposition, or breaking a task down into a series of subtasks. This relies on an LLM’s ability to generate a reasoning chain, act by calling one or more tools, and finally process the tool output to accomplish the task.
Let’s take an example in which a user asks about the weather in New York for the next four days. The LLM is given a prompt instructing it to think, choose a tool (such as an API or code), wait for the tool output, and then present its observation. A typical prompt looks like this:
You run in a loop that interleaves thought, action, pause, and observation steps. Use Thought to describe your reasoning about the question you have been asked. Use Tool to run one of the tools available to you, then pause.
Make sure you perform one step at a time and wait for the next step to be called.
Your available tools are:
Weather tool: performs weather prediction for requests such as the weather for the next four days, or the weather on a given day of the week, date, or month.
The weather tool accepts a number of days (1 up to 7), a valid date in month/day/year format, a city code like LN, and a country code like UK.
Examples:
Question: What is the weather for next three days in London?
Thought: I should look up the weather API to find the weather
Tool: weather: inputs LN, UK, 3d
Pause
Observation:
In the example prompt above, describing the tools and providing examples (few-shot learning) guides the agent in performing its tasks. Agents can use more than one tool by calling them sequentially. For example, if a user wants to plan a holiday based on the weather, the LLM can also be supplied with a tool such as Transport for London’s API to plan a mode of transport to the desired location.
Developers can use frameworks like LangChain and LlamaIndex to implement ReAct-based agent solutions quickly while supplying their preferred choice of LLM.
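The loop itself is simple enough to sketch by hand. The following is a minimal, illustrative ReAct loop in Python, assuming a stubbed call_llm function and a fake weather tool; in a real system these would be replaced by an LLM API and an actual weather service, or managed by a framework such as LangChain.

```python
import re

# A minimal, hand-rolled ReAct loop. call_llm and weather_tool are
# hypothetical stand-ins for an LLM API and a real weather service.

def weather_tool(city_code: str, country_code: str, days: str) -> str:
    return f"Sunny for the next {days} in {city_code}, {country_code}"  # stubbed response

TOOLS = {"weather": weather_tool}

def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call: propose a tool first, then answer
    # once an observation is available in the prompt.
    if "Observation:" in prompt:
        return "Answer: it will be sunny in London for the next three days."
    return "Thought: I should look up the weather API\nTool: weather: LN, UK, 3d"

def react_loop(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}"
    for _ in range(max_steps):
        response = call_llm(prompt)
        match = re.search(r"Tool: (\w+): (.*)", response)
        if not match:
            return response                      # the model produced a final answer
        tool_name, raw_args = match.groups()
        args = [a.strip() for a in raw_args.split(",")]
        observation = TOOLS[tool_name](*args)    # act: run the chosen tool
        prompt += f"\n{response}\nObservation: {observation}"
    return "Stopped after reaching the step limit"

print(react_loop("What is the weather for the next three days in London?"))
```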
Agents with an external planner
Engineers in ITOps need detailed, specific knowledge to find the root cause of a given problem or incident, and they execute several steps to resolve it. To come up with a root-cause plan, an agent needs to understand the incident, query the appropriate monitoring systems, interpret the logs and metrics associated with the application in question, and query observability systems such as user experience monitoring tools like Dynatrace and New Relic. Only then can it gather the data points needed to make an appropriate root cause suggestion.
The challenge here is that general-purpose LLMs, such as OpenAI’s models or Claude, lack the enterprise-specific knowledge and workflow understanding needed to determine an appropriate resolution flow for a given task.
To manage these variations in information sources and the sequential tasks an agent needs to perform, it may not be prudent to rely on a single prompt, as it would be difficult to introduce new plans based on debugging information. In such cases, depending on the complexity and scale of the problem, various patterns can be introduced:
Static- or graph-based agents
These are semi-autonomous agents that make dynamic decisions. For example, when a customer sends an order-tracking query, the agent comprehends the email and, based on the order number, product type, and order status, assigns the ticket to the appropriate team, which then responds to the customer or the contact center agent.
Here, an autonomous agent has to understand the request in the email, use a tool to retrieve the order details and, based on the product, inventory, and shipping status, decide whether to inform the product inventory manager or email the logistics company to check the status of the order before getting back to the customer.
Graph-based agents really shine for problems that require concrete steps and LLM-based decisioning (understanding emails and assigning tickets to either the product manager or the logistics company in the above example), or that have to implement specific workflow patterns.
A graph structure, where each node is an agent or task and edges define how these agents depend on one another, helps with parallel execution and reallocation of tasks. In the above example, the inventory agent, shipment agent, and customer support agent can coordinate: if the inventory agent comes back with an ETA that is longer than usual, the customer service agent can either advise the customer of the delay or escalate to a human supervisor.
A complex retrieval-augmented generation (RAG) system, which augments LLMs by retrieving relevant knowledge, rewriting queries based on the conversation history, or dynamically routing to various tools and agents, is where the graph-based pattern really shines. Self-RAG patterns, where LLMs critique their own output to refine the answer, are another example of complex yet well-defined patterns that can be implemented with graph-based agents.
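As a rough illustration of the pattern, the order-tracking flow above can be expressed as a small dependency graph of task nodes. The node names and handler functions below are hypothetical, and Python's standard-library TopologicalSorter stands in for a full orchestration framework.

```python
from graphlib import TopologicalSorter

# A minimal sketch of the order-tracking flow as a graph of task nodes.
# Handlers are illustrative stand-ins for LLM-backed agents and API calls.

def comprehend_email(ctx):   ctx["order_id"] = "SO-1234"
def fetch_order(ctx):        ctx["product"] = "laptop"
def check_inventory(ctx):    ctx["in_stock"] = True
def check_shipment(ctx):     ctx["eta_days"] = 9
def decide_response(ctx):
    # escalate if the ETA is longer than usual, otherwise reply directly
    ctx["action"] = "advise customer of delay" if ctx["eta_days"] > 5 else "confirm delivery date"

NODES = {
    "comprehend_email": comprehend_email,
    "fetch_order": fetch_order,
    "check_inventory": check_inventory,
    "check_shipment": check_shipment,
    "decide_response": decide_response,
}

# Edges express dependencies; check_inventory and check_shipment could run in parallel.
GRAPH = {
    "fetch_order": {"comprehend_email"},
    "check_inventory": {"fetch_order"},
    "check_shipment": {"fetch_order"},
    "decide_response": {"check_inventory", "check_shipment"},
}

ctx = {}
for node in TopologicalSorter(GRAPH).static_order():
    NODES[node](ctx)
print(ctx["action"])
```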
Dynamic external plans
This is a simple set of external instructions, in the form of prompts or structured input, that provides a plan guiding the sequence of steps for the task at hand; the LLM chooses a specific prompt or planning sequence based on the problem statement.
To generate a plan, agents can use a static prompt (one without variables) configured in an external source such as a database or prompt store; using intelligent search, the LLM then retrieves an appropriate plan. The external source could also be a configuration management database, which can be queried together with the observability platform that monitors the health of the whole environment to create a troubleshooting plan for the agent.
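A minimal sketch of this idea follows, assuming a hypothetical in-memory plan store keyed by task type; in practice the store could be a prompt database or a CMDB, and retrieval could use semantic search rather than exact keys.

```python
# A minimal sketch of selecting an external plan. The plan store is a plain
# dictionary here; the task types and plan steps are illustrative only.

PLAN_STORE = {
    "high_cpu_incident": [
        "Query the monitoring system for CPU metrics over the last hour",
        "Pull application logs for the affected host",
        "Check the CMDB for dependent services",
        "Summarize the likely root cause and suggested remediation",
    ],
    "order_status_query": [
        "Extract the order number from the customer message",
        "Retrieve order details via the order API",
        "Check inventory and shipment status",
        "Draft a response to the customer",
    ],
}

def retrieve_plan(task_type: str) -> list[str]:
    # An LLM or an intelligent search step would map the user's request
    # to one of the stored plans; exact-match lookup keeps the sketch simple.
    return PLAN_STORE[task_type]

for step in retrieve_plan("high_cpu_incident"):
    print(step)
```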
Multi-agent or collaborative agents
Several frameworks, including MetaGPT, AutoGen, and CrewAI, enable different patterns of collaboration among multiple agents performing various tasks. These agents can all work in parallel, with or without a team leader sitting on top of them. Although each agent can use a range of models, each has a specific, static role and does only one thing. These patterns enable automating and optimizing part of a process, or an entire process, to improve performance.
For example, a multi-agent coder proposes an approach to developing better code with three agents: a programmer, a test designer, and a test executor. During the code development phase, the programmer and test designer agents can execute in parallel to develop the code and test cases. In later phases, the programmer agent refines the code based on feedback from the test executor agent, and the test designer agent can then regenerate or refine the test cases using the code generated and tested in previous phases.
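A simplified sketch of that refinement loop is shown below. The generate_code, generate_tests, and run_tests functions are illustrative stand-ins for LLM-backed agents and a real test runner, not an actual multi-agent coder implementation.

```python
# A simplified sketch of the programmer / test designer / test executor loop.
# All three functions are hypothetical stand-ins for LLM-backed agents.

def generate_code(spec, feedback=None):
    return f"code for '{spec}'" + (" (revised)" if feedback else "")

def generate_tests(spec):
    return [f"test that {spec} handles valid input",
            f"test that {spec} rejects bad input"]

def run_tests(code, tests):
    # pretend the first attempt fails and the revision passes
    return "revised" in code

spec = "parse an invoice PDF"
tests = generate_tests(spec)          # could run in parallel with code generation
code = generate_code(spec)

for _ in range(3):                    # refinement loop driven by test feedback
    if run_tests(code, tests):
        break
    code = generate_code(spec, feedback="tests failed")

print(code)
```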
To develop these applications, where agents need to interact with each other, the following patterns are used:
Hierarchical pattern
These applications require a hierarchical pattern in which a supervisor agent routes the inquiry to lower-level agents. In the example discussed above, where a customer is querying the status of their order, a customer service agent acts as the supervisor and directs the customer query or issue to the product inventory manager, to the logistics company, or back to the customer, based on the current state of the order. Here the customer service agent is responsible for breaking down the tasks: finding the order, checking the inventory, checking with the logistics company, and forwarding these tasks to the appropriate agents. Based on inputs from each agent, it decides whether to respond or escalate to a human manager.
The graph-based agents described above are a special case of orchestration in which there are no supervisor agents and the entire agent interaction flow is defined using directed acyclic graphs (DAGs).
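As an illustration, the customer service supervisor described above might look roughly like the following sketch, where the worker functions are hypothetical stand-ins for LLM-backed agents.

```python
# A minimal sketch of the hierarchical pattern: a supervisor breaks the
# request into subtasks and routes each one to a worker agent.

def order_lookup_agent(task):  return {"order_id": "SO-1234", "status": "shipped"}
def inventory_agent(task):     return {"in_stock": True}
def logistics_agent(task):     return {"eta_days": 9}

WORKERS = {
    "find_order": order_lookup_agent,
    "check_inventory": inventory_agent,
    "check_shipment": logistics_agent,
}

def supervisor(customer_query):
    results = {}
    for subtask in ["find_order", "check_inventory", "check_shipment"]:
        results[subtask] = WORKERS[subtask](customer_query)   # route to worker agents
    # decide whether to respond directly or escalate to a human manager
    if results["check_shipment"]["eta_days"] > 7:
        return "Escalate to human supervisor: unusually long ETA"
    return "Reply to customer with delivery date"

print(supervisor("Where is my order SO-1234?"))
```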
Conversation pattern
This approach is used for tasks that require a more dynamic pattern of interaction. One example is a knowledge publishing platform producing an article, where author, reviewer, copywriter, and social media influencer agents act simultaneously alongside human authors and reviewers to produce, critique, and run plagiarism checks on the article.
In this pattern, individual agents are allowed to send messages to each other without having a supervisor present to direct the conversation.
The system still requires policies that define the order in which the agents execute. For example, a research agent could produce a summary by browsing the internet, a writer agent could then use the resources found by the research agent to produce the content, and at the same time the social media influencer agent could generate posts for the various social platforms.
A reviewer agent then carries out an overall review of the content, including checking that the social media posts align with it. The reviewer agent either requests a rewrite or approves and posts the content.
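A rough sketch of this supervisor-free message passing is shown below; the agent behaviors and message format are illustrative assumptions, not any particular framework's protocol.

```python
from collections import deque

# A minimal sketch of the conversation pattern: peer agents exchange messages
# on a shared channel without a supervisor routing them.

def research_agent(msg):
    if msg["type"] == "topic":
        return {"type": "research", "content": f"sources on {msg['content']}"}

def writer_agent(msg):
    if msg["type"] == "research":
        return {"type": "draft", "content": f"article based on {msg['content']}"}

def reviewer_agent(msg):
    if msg["type"] == "draft":
        return {"type": "approved", "content": msg["content"]}

AGENTS = [research_agent, writer_agent, reviewer_agent]

channel = deque([{"type": "topic", "content": "agentic AI"}])
while channel:
    msg = channel.popleft()
    if msg["type"] == "approved":
        print("Publishing:", msg["content"])
        break
    for agent in AGENTS:
        reply = agent(msg)
        if reply:
            channel.append(reply)   # any agent may respond; no supervisor directs the flow
```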
Technical challenges in designing multi-agent architectures
Enterprises face several challenges implementing agentic architecture and platforms, including:
Memory management: Memory, or conversation context among the agents, needs to be managed in the form of chat history, specific facts, and agent outcomes. Storing conversations in a vector database or cache, and recalling that history either within the current conversation or over the longer term, requires a purpose-built design.
Context management: Multi-agent chats can become very long, and relying on the full content of these chats can run into the LLM’s token limit or increase cost and response time because of the longer context. To manage the context, techniques such as summarization, overwriting the context, and recalling facts from long-term memory are used.
Deployment: Multi-agent applications can turn into long-running, message-heavy processes, and therefore require distributed execution of agents as well as messaging capabilities that let agents communicate in a low-latency, scalable manner.
Replay and replan based on user inputs (human in the loop): Agents must be able to restart or replan based on user interaction, where an end user might ask the agent to rerun the entire workflow with new facts, or ask a specific agent to repeat its execution from a specific step.
Policy management: When the outcome is unclear, agents can run into loops or take a long time to resolve tasks, which results in a poor customer experience or cost overruns from repeated LLM calls. To tackle this and reduce chatter between agents, limits on agent execution time, the order of orchestration, and interdependencies should be set and managed via clear, visible policies, as in the sketch after this list.
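As a simple illustration of such policies, the sketch below caps both the number of agent turns and the wall-clock time for a task; run_turn is a hypothetical stand-in for one round of agent-to-agent messaging.

```python
import time

# A minimal sketch of policy enforcement on a multi-agent conversation:
# cap the number of turns and the elapsed time so looping agents cannot
# drive up cost or response time.

MAX_TURNS = 10
MAX_SECONDS = 30.0

def run_turn(state):
    state["turns"] += 1
    return state["turns"] >= 4      # pretend the task resolves on the fourth turn

def run_with_policy():
    state = {"turns": 0}
    start = time.monotonic()
    while state["turns"] < MAX_TURNS and time.monotonic() - start < MAX_SECONDS:
        if run_turn(state):
            return f"Resolved in {state['turns']} turns"
    return "Policy limit reached; escalating to a human"

print(run_with_policy())
```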
Advanced techniques form the building blocks of future applications
The reasoning capability of LLMs is at the core of agent frameworks. There are various research initiatives in progress to improve generation of reasoning chains, including these approaches:
Reasoning or search (“thinking”) algorithms
LLMs fundamentally generate chains using either greedy search, which tries to make the locally optimal choice at each stage, or the more complex beam search, which systematically expands the most promising nodes. These are effective algorithms for predicting the next token, but not necessarily for generating reasoning chains where every step matters for producing the final outcome.
To enhance reasoning and planning, we can use search algorithms like Monte Carlo tree search, depth-first search, or breadth-first search. These methods explore potential thought sequences (or chains of thought) and use reward models to guide the search towards better solutions. This iterative process, involving multiple steps during inference, allows for more deliberate and thorough problem-solving, often referred to as “slow thinking.”
These techniques improve agents’ planning and decisioning abilities by:
- Supporting complex task decomposition for agent workflows. For example, in a large code refactoring or upgrade, agents analyze the code and tool outputs to decide precisely which code sections need to be adjusted.
- Improving adaptive behavior by allowing agents to navigate dynamic environments. In the code refactoring example, a refactoring agent analyzes the code using the code profiling tools available to it, identifies code smells (such as cyclomatic complexity or long methods), and refactors small parts of the code. It then runs the test cases and, based on the outcome, either proceeds with the next refactoring or fixes the issue that caused a test to fail.
- Enabling goal-oriented action through reward models, such as allowing an agent to regenerate the code for a given requirement until it compiles and the test cases pass.
This concept was popularized by the paper “Scaling LLM Test-Time Compute Optimally Can Be More Effective Than Scaling Model Parameters” and is one of the most promising and exciting areas of research.
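To illustrate the idea, the sketch below performs a small beam search over candidate reasoning steps guided by a reward model. Both propose_steps and reward_model are hypothetical stand-ins for an LLM sampling next steps and a learned scorer; they are not taken from the paper.

```python
import random

# A minimal sketch of reward-guided search over reasoning steps ("slow thinking"):
# at each depth, sample candidate next steps and keep the highest-scoring
# partial chains according to a reward model.

random.seed(0)

def propose_steps(chain, k=3):
    # an LLM would sample k candidate next reasoning steps given the chain so far
    return [chain + [f"step-{len(chain)}-{i}"] for i in range(k)]

def reward_model(chain):
    # a learned scorer would estimate how promising this partial chain is
    return random.random()

def beam_search(depth=3, beam_width=2):
    beams = [[]]
    for _ in range(depth):
        candidates = [c for chain in beams for c in propose_steps(chain)]
        candidates.sort(key=reward_model, reverse=True)
        beams = candidates[:beam_width]          # keep only the most promising chains
    return beams[0]

print(beam_search())
```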
Ensemble methods
This approach combines multiple models to produce better results, as popularized by LLM-debate, agent forest, and mixture-of-agents methods. These methods work by increasing the number of agents, sampling multiple outputs, and then using simple techniques such as majority voting to pick the final answer, which increases the accuracy of the reasoning chains.
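The core sample-and-vote idea can be sketched in a few lines, assuming a stubbed sample_answer function standing in for repeated LLM calls.

```python
import random
from collections import Counter

# A minimal sketch of the sample-and-vote idea behind ensemble methods.
# sample_answer is a stubbed, noisy stand-in for repeated LLM calls.

def sample_answer(question: str) -> str:
    return random.choice(["42", "42", "42", "41"])

def majority_vote(question: str, n_samples: int = 9) -> str:
    answers = [sample_answer(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]   # most frequent answer wins

print(majority_vote("What is 6 x 7?"))
```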
Process reward models
These models are emerging as a promising approach for supervising the reasoning chains generated by LLMs. The reasoning chains or plans are validated using process reward models trained with step-level labels, unlike traditional reward models, which only include a label for the final step or answer.
However, training process reward models can be challenging for tasks with long sequences of steps, such as application root cause analysis and debugging. Even so, supervising intermediate steps with a reward model that incentivizes a correct reasoning chain is a promising way to generate accurate reasoning chains.
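A minimal sketch of the difference between step-level and outcome-level supervision follows; score_step is a hypothetical stand-in for a model trained with step-level labels.

```python
# A minimal sketch of process-level supervision: each intermediate step gets
# its own score, and a chain is accepted only if every step clears a threshold.
# score_step is a hypothetical stand-in for a trained process reward model.

def score_step(step: str) -> float:
    # a trained process reward model would return a learned score here
    return 0.2 if "divide by zero" in step else 0.9

def validate_chain(chain: list[str], threshold: float = 0.5) -> bool:
    return all(score_step(step) >= threshold for step in chain)

good_chain = ["isolate the variable", "divide both sides by 2", "substitute to check"]
bad_chain = ["isolate the variable", "divide by zero", "substitute to check"]

print(validate_chain(good_chain))   # True: every step scores above the threshold
print(validate_chain(bad_chain))    # False: one faulty step fails the whole chain
```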
Advances in AI agents are being used to build tools for businesses that streamline processes and improve outcomes for users and customers, with each stage of development building on previous work. Exciting and promising research approaches are in play that will contribute to even more advanced AI agents, likely delivering even more value.