True agents in LLM agentic workflows

Lessons from Agent-based models for LLM-based agentic systems.

There is a lot of buzz, chatter, and excitement around the creation and use of “agentic” workflows with Large Language Models (LLMs). I am not totally certain where the idea of an “agent” came from, and often enough it isn’t clear what the word means in the context of these workflows.

Be that as it may, there is definitely a precise concept of an agent, especially when we look at agent-based models. I have researched and written about them for a while. Here, I don’t want to focus on any particular model; rather, I want to talk about the notion of an agent and what it represents.

The concept of Agent in Agent-based models

The premise of an agent-based model of a complex system is that we start at a sufficiently local level, identify the key actors or agents in the system, simulate the behavior and interactions of these agents, and then observe the emergent behavior of the system.

When we talk about an agent within an agent-based model, we have the notion of an actor, a simulated entity, that captures some aspects of a real-world entity you wish to model. Most importantly, each agent has a way of interacting with the environment and with other agents. Generally, this is encoded as computable functions of the environment state, the agent’s current state, and the states of other agents. You can then evolve the system in time and compute interesting and relevant metrics about the system – both microscopic and macroscopic. For a non-technical description, read this post.

There is a more generalized idea about purely LLM driven agent-based models that I have written about here. But I want to stick with traditional ABMs for now.

Formalizing agent-based models.

We can formally write the key components of any agent-based model as follows:

  1. Environment State: Let $E$ be the set of all possible environment states. $$E = \{ e_1, e_2, \ldots, e_n \}. $$
  2. Set of Agents: Let $A$ be the set of all agents in the system. $$ A = \{a_{1}, a_{2}, \ldots a_{M} \}. $$
  3. Action Space: Let $C$ be the set of all possible actions an agent can take. $$ C = \{c_{1}, c_{2}, \ldots c_{k} \}. $$
  4. Agent Internal State: For agent $a \in A$, let $S^{a}$ be the set of all its possible internal states. $$ S^{a} = \{s_{1}^{a}, s_{2}^{a}, \ldots s_{n}^{a} \}. $$
  5. Agent Perception: This encodes the various parameters that the agent possesses to interact with the environment. We denote this by $P^{a}$.
  6. Agent Perception Function: This function defines how the agent perceives the environment. Typically, you would define it as a coupling function between the agent $a$ and the environment state $e$, and it could be different for different agents. We have $$ F^{a}: E \to P^{a}. $$
  7. Agent Decision Function: This function determines what action an agent takes based on its internal state and perception. $$D^{a}: S^{a} \times P^{a} \to C.$$
  8. State Update Function: This function updates the internal state of the agent based on its current state, perception, and action. $$ U^{a}: S^{a} \times P^{a} \times C \to S^{a}.$$
  9. Environment Update Function: This function encodes how we update the environment based on its current state and the actions of all $M$ agents. $$ T: E \times C^{M} \to E.$$

I have written these functions in a very suggestive manner. This way of abstractly defining the components of a generic agent-based model lends itself very well to a type system. As an aside, I finally came to appreciate the power of a type system after this wonderful talk by Scott Wlaschin. So we can define a type system in F# as follows:

type EnvironmentState = EnvironmentState of obj
type AgentId = AgentId of int
type AgentInternalState = AgentInternalState of obj
type Action = Action of obj
type AgentPerception = AgentPerception of obj

These types define the core objects of the agent-based model. We then need some transition functions that encode the dynamics. Using the formalism above, we can write them as:

type AgentPerceptionFunction = EnvironmentState -> AgentPerception
type AgentDecisionFunction = AgentInternalState * AgentPerception -> Action
type AgentStateUpdateFunction = AgentInternalState * AgentPerception * Action -> AgentInternalState
type EnvironmentUpdateFunction = EnvironmentState * Map<AgentId, Action> -> EnvironmentState

Using these functions and the initial types, we can now define the core objects/records of the agent-based model.

type Agent = {
    Id: AgentId;
    InternalState: AgentInternalState;
    PerceptionFunction: AgentPerceptionFunction;
    DecisionFunction: AgentDecisionFunction;
    StateUpdateFunction: AgentStateUpdateFunction;
}

type ABMSystem = {
    EnvironmentState: EnvironmentState;
    Agents: Map<AgentId, Agent>;
    EnvironmentUpdateFunction: EnvironmentUpdateFunction;
}

Finally, the dynamics are fairly straightforward. We start with the environment in its current state, update the agent states based on the perception function and the decision function, and then update the environment based on the actions of the agents.

let simulateABMStep (system: ABMSystem): ABMSystem =
    // Each agent perceives the environment, decides on an action,
    // and updates its internal state.
    let agentActions =
        system.Agents
        |> Map.map (fun _ agent ->
            let perception = agent.PerceptionFunction system.EnvironmentState
            let action = agent.DecisionFunction (agent.InternalState, perception)
            let newInternalState =
                agent.StateUpdateFunction (agent.InternalState, perception, action)
            (action, { agent with InternalState = newInternalState }))
    // The environment is then updated with the collected actions of all agents.
    let newEnvironment =
        system.EnvironmentUpdateFunction
            (system.EnvironmentState, agentActions |> Map.map (fun _ (action, _) -> action))
    let newAgents = agentActions |> Map.map (fun _ (_, agent) -> agent)
    { system with EnvironmentState = newEnvironment; Agents = newAgents }

And that’s it. Within this type system, we have all the ingredients to define any agent-based model.
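To make this concrete, here is a minimal sketch of how one might instantiate these types: a toy, single-agent model that nudges an integer-valued environment toward zero. Everything below (the agent, its rules, the boxed-integer encoding) is an illustrative assumption rather than any particular model from the literature.

// A toy perception: the agent sees the raw environment value.
let toyPerceive : AgentPerceptionFunction =
    fun (EnvironmentState e) -> AgentPerception e

// A toy decision: push the integer-valued environment toward zero.
let toyDecide : AgentDecisionFunction =
    fun (_, AgentPerception p) ->
        Action (box (if unbox<int> p > 0 then -1 else 1))

let toyAgent : Agent = {
    Id = AgentId 1;
    InternalState = AgentInternalState (box 0);
    PerceptionFunction = toyPerceive;
    DecisionFunction = toyDecide;
    StateUpdateFunction = fun (s, _, _) -> s;  // the toy agent is stateless
}

// A toy environment update: sum the agents' increments and apply them.
let toyEnvUpdate : EnvironmentUpdateFunction =
    fun (EnvironmentState e, actions) ->
        let delta = actions |> Map.fold (fun acc _ (Action a) -> acc + unbox<int> a) 0
        EnvironmentState (box (unbox<int> e + delta))

let toySystem : ABMSystem = {
    EnvironmentState = EnvironmentState (box 10);
    Agents = Map.ofList [ (AgentId 1, toyAgent) ];
    EnvironmentUpdateFunction = toyEnvUpdate;
}

// Run five steps of the dynamics.
let finalSystem = List.fold (fun sys _ -> simulateABMStep sys) toySystem [1..5]

The point is not the toy itself but that nothing else is needed: once the four functions are supplied, simulateABMStep drives any model expressible in these types.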

Towards truly Agentic LLM workflows

What has this formalism brought us? I think it lends a little more structure to how we can define agentic systems that utilize LLMs. If we step away from LLMs for a second, nothing in the formalism above imposes a particular way of implementing the agent’s decision function or the internal state of the agents. Most importantly, the environment is totally generic. What is useful for studying the Schelling model or the macro-economy works equally well for constructing an LLM-based workflow.

The key notion that underlies almost all agent-based models is the idea that agents can perceive the environment and, based on this perception, make decisions to change it. Even if they cannot update the environment, they can at least update their internal state.

The current wave of LLM agentic workflows captures this idea to some extent. However, I think the prescription for these workflows is pretty open-ended. What the formalism permits is deeper thinking around questions such as:

  1. What is the environment that my agent is operating in?
  2. What are the possible states that any agent can have?
  3. What is the perception function?
  4. How is the agent going to decide what steps to take next? (See the sketch after this list.)
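On the last question, nothing in the formalism prevents the decision function itself from being backed by an LLM. Here is a minimal sketch under the types defined above; callLLM is a hypothetical stub standing in for whatever chat-completion client you use, not a real API.

// A minimal sketch, not a prescribed implementation: an LLM-backed decision
// function that fits the AgentDecisionFunction type from before.
// `callLLM` is a hypothetical placeholder for a real chat-completion client.
let callLLM (prompt: string) : string =
    "no-op"  // stand-in response; a real implementation would call an LLM API

let llmDecision : AgentDecisionFunction =
    fun (AgentInternalState state, AgentPerception perception) ->
        let prompt =
            sprintf "Internal state: %A\nPerception: %A\nWhat action should be taken next?" state perception
        // The LLM's textual reply is treated as the chosen action.
        Action (box (callLLM prompt))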

The question of Autonomy.

While discussing the notion of agents, one thing that isn’t talked about enough is the core idea of agency itself. Now, I am not implying that LLMs, as powerful as they are, are somehow sentient and have agency. What I have in mind is more along the lines of letting agents be autonomous and figure things out on their own. In many agentic workflows, we (or the user/developer) have a particular goal in mind, and the LLM-based system is a means to that goal.

However, many of the designs I have seen for these agentic systems look more like specific services (dare I say “micro-services”) with an LLM producing some output that can be passed onto the next service.

We can surely be more creative. One of the simplest applications of open-ended exploration using LLM agents is software testing. We can have an LLM agent that is tasked with finding bugs in a piece of software, or we can have it navigate a UI and find edge cases that a human tester might not have thought of.

To test this idea a bit, let us modify the agent-based model type system into an agentic UI type system. One of the differences between an agent-based model and an agentic UI is that we can endow the agentic UI with open-ended “knowledge”. Unlike computable ABMs, which require that memory or experience of past events/actions be parametrized, an LLM agent can simply have a knowledge base that is continuously updated. Let’s look at the type system for such an agentic UI.

type UIState = UIState of obj
type UIAction = UIAction of obj
type Usefulness = Usefulness of float

We have added an extra type, Usefulness, that will be used to evaluate the usefulness of an action. Because we want our agent to explore, it needs to keep track of whether the action taken was useful or not. If it’s not useful, the agent can backtrack and try something else.

We add two more types that form the core of the agent’s knowledge base. The first is Experience, which stores the initial state, the action taken, the resulting state, and the usefulness of the action. The word is deliberately chosen because I really want to capture the idea that the agent is experiencing the environment and updating its knowledge about it. The second type is, of course, the KnowledgeBase, which is a list of experiences.

type Experience = {
    InitialState: UIState
    Action: UIAction
    FinalState: UIState
    Usefulness: Usefulness
}

type KnowledgeBase = Experience list

Finally, we can define the core functions that will drive the agentic UI.

type StateTransitionFunction = UIState * UIAction -> UIState
type ActionSelectionFunction = UIState * KnowledgeBase -> UIAction
type UsefulnessEvaluationFunction = UIState * UIAction * UIState -> Usefulness
type KnowledgeUpdateFunction = KnowledgeBase * Experience -> KnowledgeBase

type AgenticUISystem = {
    CurrentState: UIState
    Knowledge: KnowledgeBase
    StateTransitionFunction: StateTransitionFunction
    ActionSelectionFunction: ActionSelectionFunction
    UsefulnessEvaluationFunction: UsefulnessEvaluationFunction
    KnowledgeUpdateFunction: KnowledgeUpdateFunction
}

If we wanted to formalize this, it would be as follows:

  1. Agent’s Knowledge Base: $K$, a set of experiences $(s, a, s', u)$ where $s$ is the initial state, $a$ is the action taken, $s'$ is the resulting state, and $u$ is the perceived usefulness of the action.
  2. Action Selection Function: This function determines which action the agent will take given the current state and its knowledge base (here $S$ is the set of UI states and $A$ the set of UI actions). $$ \pi: S \times K \to A, $$ $$\pi(s, K) = a.$$
  3. Usefulness Evaluation Function: This function evaluates the usefulness of an action given the initial state, the action taken, and the resulting state. $$ U: S \times A \times S \to \mathbb{R}, $$ $$U(s, a, s') = u.$$
  4. Knowledge Update Function: This function updates the knowledge base with new experiences. $$\Phi: K \times (s, a, s', u) \to K, $$ $$K' = \Phi(K, (s, a, s', u)).$$
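Putting the pieces together, one exploration step would look something like the following. This mirrors simulateABMStep above; it is a minimal sketch of the dynamics, not a definitive driver loop.

let exploreStep (system: AgenticUISystem) : AgenticUISystem =
    // Select an action using the current state and the accumulated knowledge.
    let action = system.ActionSelectionFunction (system.CurrentState, system.Knowledge)
    // Apply the action to obtain the next UI state.
    let nextState = system.StateTransitionFunction (system.CurrentState, action)
    // Score the transition and record it as a new experience.
    let usefulness = system.UsefulnessEvaluationFunction (system.CurrentState, action, nextState)
    let experience = {
        InitialState = system.CurrentState
        Action = action
        FinalState = nextState
        Usefulness = usefulness
    }
    { system with
        CurrentState = nextState
        Knowledge = system.KnowledgeUpdateFunction (system.Knowledge, experience) }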

UI Navigator: A toy example

What does a toy model of this exploration agent look like? We can create a mock UI that has a particular flow; a simple sign-up flow on a website makes a basic toy model. I am not adept at building UIs, but I still want to give an idea of what I have in mind. Here is what I could mock up in a bit of TypeScript, with all of the UI elements stood in for by simple strings.

We start with the following basic types.

type UIState = string;
type Action = string;
type TaskDescription = string;
type ActionEvaluation = {
    action: Action;
    state: UIState;
    usefulness: number;
};

Then, we create a SelfImprovingUINavigator class that stores the variables and the functions. The UI is simulated as a map from states (strings) to lists of possible actions (also strings).

class SelfImprovingUINavigator {
  private openai: OpenAI;
  private currentState: UIState;
  private taskDescription: TaskDescription;
  private actionHistory: Action[] = [];
  private knowledgeBase: ActionEvaluation[] = [];
  private simulatedUI: Map<string, string[]>; // For simplicity, we'll simulate the UI as a map of states to possible actions

  // Within the constructor, we define a simulated sign-up flow such as this
  constructor(openai: OpenAI, taskDescription: TaskDescription) {
    this.openai = openai;
    this.taskDescription = taskDescription;
    this.currentState = 'Home Page'; // the flow starts at the home page
    this.simulatedUI = new Map([
      ['Home Page', ['Click Sign Up', 'Enter Username', 'Enter Password']],
      ['Sign Up Page', ['Enter Email', 'Enter Phone Number', 'Submit']],
      ['Confirmation Page', ['Click Verify Email', 'Enter Verification Code', 'Submit']],
      ['Account Created Page', ['Click Continue to Dashboard']]
    ]);
  }

The rest of the code defines the relevant functions for deciding the next action, performing the action, and updating the knowledge base. Nothing too fancy, but I hope it gets the point across. Crucially, we want to be able to prompt the LLM to decide what action to take next, and then perform that action. decideNextAction looks something like this; you can find the complete code here.

private async decideNextAction(): Promise<Action> {
  const MAX_EXPERIENCE = 5; // Limit of cognition, as it were

  const relevantExperiences = this.getRelevantExperiences(this.currentState).slice(0, MAX_EXPERIENCE);
  const availableActions = this.simulatedUI.get(this.currentState) || [];

  const prompt = `
    Task: ${this.taskDescription}
    Current UI State: ${this.currentState}
    Action History: ${this.actionHistory.join(", ")}
    Relevant Past Experiences: ${JSON.stringify(relevantExperiences)}

    Based on the current UI state, your understanding of typical web interfaces, and the relevant past experiences, what action should be taken next to progress towards completing the task? Respond with a specific action description.
  `;
  const response = await this.openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: prompt }],
    max_tokens: 50,
  });
  const chosenAction: string = response.choices[0].message.content?.trim() || '';
  // Guard against hallucinated actions: fall back to the first available one.
  return availableActions.includes(chosenAction) ? chosenAction : availableActions[0];
}

Conclusion

The idea of agents in LLM-based workflows is still in its infancy. I think there is a lot of potential for creating truly autonomous agents that can explore and learn about their environment. The formalism of agent-based models can provide a good starting point for thinking about how to structure these agentic systems.

I wanted to introduce ABMs as a guidepost and as a way of structuring our thinking about what an agentic system is or isn’t. I believe the use of LLMs actually takes us beyond the framework of computable ABMs (as opposed to embodied agents).

The advent of Large Language Models has opened up unprecedented possibilities for creating truly autonomous and adaptive systems. However, the current trend of “LLM agentic workflows” often falls short of realizing this potential, reducing the concept of an agent to a mere input-output service. To truly harness the power of LLMs in creating agentic systems, we must embrace a more comprehensive understanding of what an “agent” can be. An agent, in its fullest sense, is not just a passive responder but an active participant in its environment. It is:

  • A Learner: Continuously updating its knowledge base and strategies based on experiences.
  • An Explorer: Actively probing its environment and trying new approaches to solve problems.
  • A Navigator: Capable of traversing complex state spaces, whether digital interfaces or abstract problem domains.

By reimagining LLM-based systems through this lens of true agency, we open doors to AI that can adapt, improve, and potentially even surprise us with emergent behavior. The UI navigator we’ve explored in this post is just one example of how we can start to build systems that embody these principles.

The future of AI lies not in more sophisticated chatbots, but in genuine artificial agents that learn, explore, and navigate our increasingly complex digital landscapes. It’s time we embraced the word “agent” in its full, rich meaning and built AI agents worthy of the name: autonomous, adaptive, and truly agentic.
