The Rise of AI Agents: Automating Complex Tasks with LLMs
AI agents are combinations of software applications, one or more of which are LLMs, that work together to achieve a complex task on behalf of a user. Most LLMs are excellent language processors: they not only understand human language but also produce results that human beings can easily understand. Traditional software systems, on the other hand, can execute tasks such as booking a ticket after checking availability. AI agents attempt to combine LLMs with traditional software tools such as APIs, sensors and IoT devices to achieve a result such as managing a complete vacation tour.
Let us take the example of a tour planner for further elucidation.
To plan a tour, we traditionally:
- Search for tourist attractions of a place of our interest
- Search for the most feasible mode of transportation to that place
- Search for availability of tickets in preferred mode of transportation like flight, train, bus etc.
- Search for availability of suitable accommodation within our budget at that place
- Search for transportation available within that place
- Search for booking entry tickets etc. to various tourist attractions
- Search for restaurants available at that place
- Sometimes, check the weather conditions at that place as well
- Check for any possible events / local holidays of the place to avoid heavy rush
- At each stage, ensure that the budget is not exceeded
- Then, we start booking each of the services like transportation, accommodation, local transportation, restaurant booking etc.
- Then we proceed to make advance payments
- Then, prepare a detailed itinerary
All these steps take a long time. If we need to do such planning frequently, for business tours for instance, even more time is lost, even though some of the above steps can be omitted for a business tour.
What if a software system could take the requirement in a few sentences, make many decisions as human beings do, run the necessary searches while taking preferences and budget into consideration, and handle bookings, payments and itinerary preparation on its own? That would be wonderful, wouldn’t it? This is what AI agents are trying to achieve and, to a considerable extent, have already achieved.
An LLM cannot achieve this on its own because the information on which it was trained is fixed at training time and is therefore always somewhat out of date; it cannot get real-time information such as the availability of a service. Hence the need to combine LLMs with traditional systems. AI agents thus combine the best of both worlds.
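As a minimal sketch of this combination, the Python snippet below pairs a stand-in for the LLM with a stand-in for a live availability service. All names here (`stub_llm`, `check_seat_availability`, the route codes and the seat counts) are hypothetical and chosen only for illustration; a real agent would call an actual model and an actual booking API.

```python
from typing import Callable, Dict


def check_seat_availability(route: str) -> int:
    """Hypothetical stand-in for a live availability API."""
    live_data = {"DEL-GOI": 4, "DEL-BLR": 0}  # made-up sample data
    return live_data.get(route, 0)


# Registry of traditional tools the agent is allowed to call.
TOOLS: Dict[str, Callable[[str], int]] = {
    "check_seat_availability": check_seat_availability,
}


def stub_llm(user_request: str) -> dict:
    """Hypothetical stand-in for an LLM call.

    A real model would interpret the request; here a simple keyword
    check decides which tool to call and with which argument.
    """
    route = "DEL-GOI" if "Goa" in user_request else "DEL-BLR"
    return {"tool": "check_seat_availability", "argument": route}


def run_agent(user_request: str) -> str:
    """Let the 'LLM' pick a tool, run it, and phrase the live result."""
    decision = stub_llm(user_request)
    tool = TOOLS[decision["tool"]]
    seats = tool(decision["argument"])
    if seats > 0:
        return f"{seats} seats available on {decision['argument']}"
    return f"No seats available on {decision['argument']}"
```

The language model decides what to look up, while the traditional tool supplies the fresh data that the model itself cannot have.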
Let us explore some features of an AI agent.
Reasoning: An AI agent should be able to understand the exact intent of the user and the context in which the user has sought the agent’s help. In the case of tour planning, it should understand the difference between a casual enquiry about tourist attractions at a place and a genuine intention to plan a full tour. LLMs can achieve this to a very large extent.
Acting: An AI agent should perform actions such as searching, reserving services and making payments, and prepare a complete report of the services the user can enjoy. This is different from merely compiling information into a nice readable format, as LLMs do.
Observing: An AI agent monitors real-time data such as weather and traffic data and makes informed decisions on behalf of the user.
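One way to picture observing is a small decision rule driven by incoming data. The sketch below assumes a hypothetical forecast given as a mapping from day to rain probability; the function name, the threshold and the data shape are illustrative assumptions, not part of any particular weather API.

```python
from typing import Dict, Optional


def pick_day_for_outdoor_activity(
    forecast: Dict[str, float],
    max_rain_probability: float = 0.3,
) -> Optional[str]:
    """Return the first day whose rain probability is acceptable.

    `forecast` maps a day label to a rain probability between 0 and 1.
    Days are checked in the order given; None means no day qualifies.
    """
    for day, rain_probability in forecast.items():
        if rain_probability <= max_rain_probability:
            return day
    return None
```

In a real agent the forecast would be refreshed from a live source, and the decision would be re-evaluated whenever the data changes.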
Planning: An AI agent should plan how to achieve the result. In doing so, it needs to make many decisions based on the constraints put forward by the user. For example, it should apply an appropriate budget limit on per-night rates while searching for accommodation. If no such rooms are available, it should raise the limit or change the location of the accommodation appropriately and then try to compensate by adjusting the spend on local transportation. Most of these activities should be done autonomously, without asking the user many follow-up questions.
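The budget-relaxation idea in the planning step can be sketched roughly as follows. The `Hotel` type, the `find_hotel` function and the fixed relaxation step are assumptions made for illustration; a real agent would query a hotel API and might relax several constraints at once.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hotel:
    name: str
    rate_per_night: float


def find_hotel(
    hotels: List[Hotel],
    nightly_cap: float,
    max_cap: float,
    step: float = 10.0,
) -> Optional[Tuple[Hotel, float]]:
    """Search within the nightly cap, raising it step by step if needed.

    Returns the cheapest hotel found together with the cap that was in
    force when it was found, or None if nothing fits even at `max_cap`.
    """
    cap = nightly_cap
    while cap <= max_cap:
        candidates = [h for h in hotels if h.rate_per_night <= cap]
        if candidates:
            cheapest = min(candidates, key=lambda h: h.rate_per_night)
            return cheapest, cap
        cap += step  # relax the constraint and try again
    return None
```

The cap that was finally used tells the agent how much extra was spent on accommodation, which it can then try to recover elsewhere, for example in local transportation.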
Interacting: An AI agent should not carry out all activities independently. Just because a kid told it to book a holiday in Switzerland, it should not go and debit a huge amount from the user’s credit card and mail an itinerary to the user. It should involve human beings appropriately before finalizing actions. At the same time, it should not ask too many nitty-gritty questions, such as which of three hotels to choose.
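A simple way to express this balance is an approval threshold: small actions go through on their own, while large ones wait for a human. The sketch below is only an illustration; `execute_payment`, the threshold and the `confirm` callback are hypothetical, and no real payment is made.

```python
from typing import Callable


def execute_payment(
    amount: float,
    approval_threshold: float,
    confirm: Callable[[str], bool],
) -> str:
    """Pay small amounts directly; ask a human before large ones.

    `confirm` is any function that shows a message to the user and
    returns True if the user approves.
    """
    if amount > approval_threshold:
        approved = confirm(f"Approve payment of {amount:.2f}?")
        if not approved:
            return "cancelled"
    # A real agent would call a payment API at this point.
    return "paid"
```

For example, `execute_payment(500, 100, confirm=lambda message: False)` is cancelled because the user declines, whereas a payment of 50 under the same threshold goes through without any question being asked.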
Learning / Self-Refining: An AI agent should remember not only the user’s card numbers, credentials and the like but also the user’s preferences. As more and more tours are planned, it should remember more and make planning the next tour even easier.
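Remembering preferences can be as simple as counting past choices. The sketch below is a deliberately small illustration; the `PreferenceMemory` class and its method names are hypothetical, and sensitive data such as card numbers would need secure storage that this example does not attempt to show.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Optional


class PreferenceMemory:
    """Keeps a count of the user's past choices per category."""

    def __init__(self) -> None:
        self._choices: DefaultDict[str, Counter] = defaultdict(Counter)

    def record(self, category: str, choice: str) -> None:
        """Note that the user picked `choice` for `category`."""
        self._choices[category][choice] += 1

    def preferred(self, category: str) -> Optional[str]:
        """Return the most frequently chosen option, or None if unknown."""
        counts = self._choices.get(category)
        if not counts:
            return None
        return counts.most_common(1)[0][0]
```

After a few trips, the agent can use `preferred("seat")` or `preferred("hotel_chain")` as a default instead of asking the user again.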
AI Agents, AI Assistants & Bots
Bots can do simple tasks. They are mostly driven by pre-defined data and solve a recurring problem.
AI Assistants can do more complex tasks than bots; however, their decision-making capabilities are quite limited. They need frequent human interaction to achieve a task. Their learning capabilities are limited, and so is their interaction with third-party tools, sensors etc.
AI Agents are the most capable of the three: they work autonomously, execute complex workflows and interact with human beings only for a final review and confirmation. AI Assistants and Bots are mostly reactive; AI Agents are proactive, goal-oriented and learn from their actions and mistakes. They interact heavily with third-party APIs, sensors, IoT systems etc. An AI agent can understand, think and act much better than the other two.
The Technology
Now, let us see some systems with which we can write such AI agents.
AutoGen from Microsoft is an open-source framework for building AI agents. CrewAI is another one; it is a multi-agent platform, meaning that it consists of multiple agents which can interact with each other and can thereby achieve more accurate results. LangChain is a very popular framework for building AI agents.
PhiData is a Python package which can be used to programmatically build an AI agent.
n8n is a tool for workflow automation.
In all these frameworks, we combine LLMs with tools and data sources such as yfinance (an open-source Python library for Yahoo Finance market data), APIs like HotelBeds, weather sensors, Google traffic data etc. Please note that the choice of LLM and the choice of tools greatly affect the accuracy and usability of an agent. Not every LLM is suited to every type of agent. For example, at present, OpenAI models or Llama may be more suited to tour planning, whereas Claude may be suitable for a website generation agent.
The Future
According to Mark Zuckerberg’s vision, within a few years there will be more AI agents than people in the world. Such agents are very likely to enter everyone’s lives. The knowledge needed to develop and constantly refine such agents is of supreme importance to individual software professionals as well as software companies.
We can help!