Pushing Agentic AI to the Limit.
Building a Presentation Agent using OpenAI’s Swarm method.
Moving past the introduction and getting right to the point, I want to explain in this short article how you can build quasi-agentic LLM applications using basic tool calling and then compare it to OpenAI’s Swarm method*.
Tool calling is absolutely fine. The structure of your code is fairly simple: a user message comes in, you attach the system message (each time), and you define a set of tools which you send to the API together with the messages.
Now here is the important bit: as the API starts streaming chunks of the answer back at you, those chunks can include tool calls, which your code uses to invoke the tools (e.g. a web search to get more info). When a tool returns, you append its result and keep looping instead of breaking out. You let the LLM decide when to stop.
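To make this concrete, here is a minimal, non-streaming sketch of such a loop using the OpenAI Python SDK; the model name and the search_web tool are illustrative placeholders, not part of the original app:

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical tool implementation; swap in your real web search.
def search_web(query: str) -> str:
    return f"(search results for: {query})"

tools = [{
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the web and return a short summary of results.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [
    {"role": "system", "content": "You are a helpful research assistant."},
    {"role": "user", "content": "Find recent news about python-pptx."},
]

while True:
    response = client.chat.completions.create(
        model="gpt-4o", messages=messages, tools=tools
    )
    msg = response.choices[0].message
    messages.append(msg)  # keep the assistant turn in the history

    if not msg.tool_calls:       # the LLM decided it is done
        print(msg.content)
        break

    for call in msg.tool_calls:  # execute each requested tool, then loop again
        args = json.loads(call.function.arguments)
        result = search_web(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })
```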
This actually works very well. If you give the LLM a longer task or a chain of tasks, then depending on how well you define the tools and how well your code executes them, it will keep looping, thinking about the results and explaining to your user what it is doing. (But make sure to spend enough effort on message token management: create a token cutter that trims messages on each additional API run while retaining the key information.)
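Such a token cutter can be as simple as keeping the system message and dropping the oldest turns until the history fits a budget. A rough sketch, assuming plain dict messages and using character count as a crude stand-in for a real tokenizer:

```python
def trim_messages(messages, max_chars=24_000):
    """Keep the system message and the most recent turns within a rough budget.

    Character count is a crude token proxy; swap in a real tokenizer
    (e.g. tiktoken) for production use.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    trimmed, total = [], 0
    for msg in reversed(rest):          # walk from newest to oldest
        size = len(str(msg.get("content") or ""))
        if total + size > max_chars:
            break
        trimmed.append(msg)
        total += size

    return system + list(reversed(trimmed))
```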
The downside of this approach is its linear nature. With only one system message attached to each run, complex instructions about when and how to use multiple tools in sequence will, I guarantee you, confuse the LLM and yield unsatisfactory results.
OpenAI’s Swarm distributes this tool calling across several agents. Each agent has its own system message, instructions that it is reminded of on each API run. Agents can call tools. Agents can hand over to other agents.
That’s it.
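In Swarm’s Python API that really is about all you define; a minimal sketch with two illustrative agents, where a handoff is just a function that returns another agent:

```python
from swarm import Swarm, Agent

client = Swarm()

def transfer_to_researcher():
    """Hand the conversation over to the research agent."""
    return researcher

writer = Agent(
    name="Writer",
    instructions="You write short summaries. Hand off when research is needed.",
    functions=[transfer_to_researcher],
)

researcher = Agent(
    name="Researcher",
    instructions="You gather facts and answer concisely.",
)

response = client.run(
    agent=writer,
    messages=[{"role": "user", "content": "Summarize what Swarm is."}],
)
print(response.messages[-1]["content"])
```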
Typically, there is one orchestrating agent that divides tasks among other agents, which can delegate further or call tools. It is up to you to play around with the system messages and delegation rights until they work in good harmony. Of course, the more complex the overall goal, the harder it is to make this work.
To be clear, if you define delegation rights and system messages, the agents (LLMs) will follow them, but the more complex the task you ask of each individual agent, the more likely it is to fulfill it only partly or get it completely wrong.
The key observation is that by using the swarm method and distributing tasks across several agents, you can reduce the complexity load on each individual agent, while enabling the system as a whole to handle more complex tasks.
Example:
Easy: 1. A customer-facing orchestrating agent, which can delegate to 2. an order lookup agent and 3. a refunds agent. Each agent has a relatively simple task (depending on the system they use to look up orders and issue refunds). A code sketch of this setup follows after the hard example below.
Hard: 1. An orchestrating agent chatting with the user about anything, handing over to 2. a presentation planning agent, which can create images, run code in a kernel to create charts, read documents, search the web, etc. to prepare presentation materials, then handing over to 3. a presentation creator which has to take all of this in and produce a JSON skeleton that is sent to python-pptx code.
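The easy setup maps almost directly onto Swarm’s API. A hedged sketch, with the lookup and refund functions as hypothetical stand-ins for whatever backend you actually use:

```python
from swarm import Swarm, Agent

client = Swarm()

# Hypothetical backend calls; replace with your order system.
def lookup_order(order_id: str) -> str:
    return f"Order {order_id}: shipped, tracking number available."

def issue_refund(order_id: str, reason: str) -> str:
    return f"Refund for order {order_id} issued ({reason})."

def transfer_to_orders():
    return orders_agent

def transfer_to_refunds():
    return refunds_agent

orchestrator = Agent(
    name="Customer Service",
    instructions="Talk to the customer and hand off to the right specialist.",
    functions=[transfer_to_orders, transfer_to_refunds],
)

orders_agent = Agent(
    name="Order Lookup",
    instructions="Look up order status using lookup_order.",
    functions=[lookup_order],
)

refunds_agent = Agent(
    name="Refunds",
    instructions="Issue refunds using issue_refund after confirming the order.",
    functions=[issue_refund],
)

response = client.run(
    agent=orchestrator,
    messages=[{"role": "user", "content": "Where is my order 1234?"}],
)
print(response.messages[-1]["content"])
```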
I tried to build the hard variant, with average results. The system worked autonomously: I merely gave the app a Wikipedia link and asked it to create a complete presentation using my PowerPoint template, including charts and images. But in the end the load on agent 3 was still too large, and it often produced a malformed slide skeleton.
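For context, this is roughly the kind of skeleton agent 3 was supposed to produce and how python-pptx consumes it; a simplified sketch in which the slide content is placeholder text and the layout index depends on your template:

```python
from pptx import Presentation

# Illustrative skeleton of the kind agent 3 was asked to produce.
skeleton = {
    "slides": [
        {"title": "Overview", "bullets": ["First point", "Second point"]},
        {"title": "Details", "bullets": ["Another point", "Yet another point"]},
    ]
}

prs = Presentation("template.pptx")   # your corporate template
layout = prs.slide_layouts[1]         # "Title and Content" in default templates

for slide_spec in skeleton["slides"]:
    slide = prs.slides.add_slide(layout)
    slide.shapes.title.text = slide_spec["title"]
    body = slide.placeholders[1].text_frame   # content placeholder
    for i, bullet in enumerate(slide_spec["bullets"]):
        para = body.paragraphs[0] if i == 0 else body.add_paragraph()
        para.text = bullet

prs.save("output.pptx")
```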
This is also where you hit the limits of the structure. If you are trying to solve anything relatively complex, you will inevitably reach a point where you put too much load on one of the agents.
In my example, agent 3 was clearly struggling. She sat at the end of the delegation flow, seeing all the work of the previous agents: the slide plan, the vast text from the documents they had reviewed to prepare it, and so on. Her task was to transform the plan into a JSON structure according to certain rules, but this proved too hard. By the time agent 3 saw the tsunami of tasks incoming, she was probably contemplating suicide, and I don’t blame her.
This is where you either move to an even better model or do something clever about the structure, e.g. automatically routing agent 3’s response through a controlling review before returning control back to her, or using something like step-back prompting.
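One cheap version of that controlling review is a deterministic validation loop wrapped around agent 3: check the returned JSON against your rules in plain code and, if it fails, feed the errors back for a retry. A hedged sketch; the rules, the retry budget and the ask_agent_3 callable are all just illustrative:

```python
import json

REQUIRED_KEYS = {"title", "bullets"}

def validate_skeleton(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the skeleton is usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    errors = []
    for i, slide in enumerate(data.get("slides", [])):
        missing = REQUIRED_KEYS - slide.keys()
        if missing:
            errors.append(f"slide {i} missing keys: {sorted(missing)}")
    return errors

def run_with_review(ask_agent_3, max_retries=3):
    """ask_agent_3(feedback) -> raw JSON string from the presentation creator."""
    feedback = None
    for _ in range(max_retries):
        raw = ask_agent_3(feedback)
        errors = validate_skeleton(raw)
        if not errors:
            return raw
        feedback = ("Fix these problems and return the full JSON again: "
                    + "; ".join(errors))
    raise RuntimeError("agent 3 could not produce a valid skeleton")
```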
Overall, I really liked Swarm as a simple method for creating agentic workflows, which, it must be said, can be made much more complex and custom by each developer adding deterministic loops, rich context and other bypass mechanisms to improve results. Compared to, e.g., LangChain, I liked the openness of the method.
*the original repo, called ‘Orchestrating Agents: Routines and Handoffs’, is available here