
Real AI agents vs. Automated workflows

[Image: visual editor in n8n showing a flow-based interface for building automated workflows]

I spent a couple of days reading about AI agents and trying out some of the tools, like OpenAI’s Operator, Amazon Bedrock Agents, Google Vertex AI, Zapier, and even n8n.

AI agents are changing how we work. Some are built into the tools we already use, helping with specific tasks, while others run fully on their own. Everyone seems to have their own take on what an AI agent should be, and because of that, innovation is booming. But in my opinion, just because something connects to apps and moves data around doesn’t make it an AI agent.

A lot of the tools labelled as “AI Agent Builders” offer automation, not autonomy. In other words, they connect different apps and services, and automate tasks across various platforms. For example:

  • n8n https://n8n.io/ai/
    n8n is a workflow automation tool that lets you connect apps and create logic-based flows using a visual editor. It allows users to integrate services like OpenAI, but it’s not a true AI agent builder. It doesn’t support long-term memory (learning from past behavior), reinforcement learning, or the ability to adapt over time.
  • Zapier https://zapier.com/agents
    Zapier is an automation platform for connecting popular apps like Gmail, Slack, and Google Sheets. Like n8n, it’s useful for building automated workflows, but it doesn’t enable the creation of intelligent AI agents. It lacks long-term memory, learning capabilities, complex reasoning, and multi-step decision-making.
  • Make https://www.make.com/
    Make is a visual automation builder designed to handle complex workflows across multiple apps. It can create “agent-like” flows when combined with services like OpenAI, but it isn’t a true AI agent builder either.

On the other hand, OpenAI’s Operator is a real AI agent, a goal-driven, multi-modal assistant that can use a computer the way a human would: clicking, typing, browsing, navigating apps, and making decisions on its own.

So, what exactly is a real AI agent? Here’s what it should be able to do:

  1. Understand goals, not just fixed instructions
    Example: instead of telling your agent: “Open Google Calendar, create a new event, invite Mark, set it for 3 PM,” you say: “Set up a meeting with Mark tomorrow before 3 PM, but only if he has questions about the report I sent him.” This requires Generative AI combined with planning algorithms.
  2. Decide what to do next
    Example: a user asks your chatbot a question it doesn’t know the answer to. Instead of immediately escalating to support, the agent decides: Should I ask a follow-up question? Search internal docs? Try the web? Or escalate now? This step needs decision-making capabilities via reinforcement learning (see the sketch after this list).
  3. Handle unexpected scenarios
    Example: an agent tries to schedule a meeting but one person’s calendar is blocked. Instead of failing, it checks for nearby open slots, suggests rescheduling, or asks if another participant can attend on their behalf. True agents need reasoning or probabilistic thinking to deal with uncertainty. This might involve Bayesian networks, graph-based logic, or LLMs.
  4. Learn and adapt based on context
    Example: you create a sales assistant agent that helps write outreach emails. At first, it uses a generic template. But over time, it notices that short, casual messages get better response rates, so it starts writing shorter emails, adjusting tone, and even choosing subject lines that worked best before. This is where machine learning, especially deep learning, comes in.
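
To make point 2 concrete, here’s a minimal sketch of that decide-what-to-do-next loop. Everything in it is a placeholder: the action names, the `AgentState` fields, and the `choose_action` policy, which in a real agent would be an LLM call or a policy trained with reinforcement learning.

```python
from dataclasses import dataclass, field

# Candidate actions, cheapest first; the names are made up for this sketch.
ACTIONS = ["ask_follow_up", "search_internal_docs", "search_web", "escalate"]

@dataclass
class AgentState:
    question: str
    attempts: list = field(default_factory=list)  # actions already tried

def choose_action(state: AgentState) -> str:
    """Stand-in policy: in a real agent this would be an LLM call or a
    policy trained with reinforcement learning, scoring each candidate."""
    for action in ACTIONS:
        if action not in state.attempts:
            return action  # try the cheapest untried option first
    return "escalate"

def execute(action: str, state: AgentState):
    # Placeholder for the actual tools: follow-up prompt, doc search, web search.
    return None

def run(question: str, max_steps: int = 4) -> str:
    state = AgentState(question)
    for _ in range(max_steps):
        action = choose_action(state)
        state.attempts.append(action)
        if action == "escalate":
            return "handed off to human support"
        answer = execute(action, state)
        if answer is not None:
            return answer
    return "handed off to human support"

print(run("Why was my invoice charged twice?"))
```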

Some startups are on the right path. They’re combining flowcharts with GenAI to make sense of data, transform it, and make decisions based on the output. Some are building general-purpose agents that can browse the web and get things done on their own. Others are helping businesses train agents to handle specific tasks on their websites or apps.

There are a few different ways to approach this new technology, each with its own use cases and challenges. Here’s a breakdown of six options I came across while learning about AI agents:

1. Agent that learns how to browse

This type of agent works by visually analysing websites, taking screenshots, and learning how to detect buttons, forms, and other interactive elements. Instead of relying on structured APIs, it mimics how a human would browse, clicking, scrolling, and typing based on what it sees. It can be used to automate repetitive tasks across sites that don’t offer integrations, like booking flights, filling out forms, or checking prices. The challenge is making it reliable, since even small UI changes can break the workflow.

OpenAI’s Operator uses a new model called Computer-Using Agent (CUA) that combines GPT-4o’s vision capabilities with reinforcement learning to interact with web pages.
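
To ground this, here’s a rough sketch of the perceive-decide-act loop such an agent runs. I’m using Playwright for the browser; `propose_action` is a placeholder for the vision-model call, and none of this reflects how Operator is actually implemented.

```python
import base64
from playwright.sync_api import sync_playwright

def propose_action(screenshot_png: bytes, goal: str) -> dict:
    """Placeholder for a multimodal-model call: send the screenshot plus the
    goal and ask for the next UI action as JSON, e.g.
    {"type": "click", "x": 412, "y": 230} or {"type": "done"}."""
    _ = base64.b64encode(screenshot_png).decode()  # what you'd embed in the prompt
    return {"type": "done"}

def browse(goal: str, start_url: str, max_steps: int = 20) -> None:
    with sync_playwright() as p:
        page = p.chromium.launch(headless=True).new_page()
        page.goto(start_url)
        for _ in range(max_steps):
            action = propose_action(page.screenshot(), goal)  # perceive, then decide
            if action["type"] == "done":
                break
            if action["type"] == "click":
                page.mouse.click(action["x"], action["y"])  # act like a human would
            elif action["type"] == "type":
                page.keyboard.type(action["text"])

browse("find the cheapest flight to Rome", "https://example.com")
```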

2. Site owner creates and trains their own agent

In this approach, the site owner builds and trains an agent specifically for their platform. They define what the agent can do, where it lives on the site, and how it should respond to users. It could help with things like customer support, onboarding, or handling common tasks like order tracking or password resets. The main value here is control. Owners decide what the agent knows, how it behaves, and what it can access.

Salesforce’s Agentforce, for example, is built natively into the Salesforce platform. It’s essentially a set of autonomous AI agents that connect directly to Salesforce data and perform real work. They can be embedded into Slack, mobile apps, websites, and flow processes.
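
As a rough sketch of what that control could look like in code (the tool names and config shape are invented for illustration, not Agentforce’s actual model):

```python
# Hypothetical tool implementations; real ones would hit the site's backend.
TOOLS = {
    "track_order": lambda order_id: f"Order {order_id} is in transit",
    "reset_password": lambda email: f"Reset link sent to {email}",
    "create_ticket": lambda summary: f"Ticket opened: {summary}",
}

# What the owner lets the agent do and where it appears on the site.
SUPPORT_AGENT = {
    "name": "order-helper",
    "surfaces": ["help-center", "checkout"],
    "allowed_tools": {"track_order", "create_ticket"},
}

def call_tool(agent: dict, tool: str, **kwargs) -> str:
    # Enforce the owner's scope: anything outside the allowlist is refused.
    if tool not in agent["allowed_tools"]:
        raise PermissionError(f"{agent['name']} may not call {tool}")
    return TOOLS[tool](**kwargs)

print(call_tool(SUPPORT_AGENT, "track_order", order_id="A-1042"))
```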

3. Agent with direct UI and API access

This approach combines UI interaction with direct API access. The agent uses APIs whenever possible to get reliable, structured data, but falls back to UI actions when no API is available. For example, it might book a flight using an airline’s API, then log into a hotel site and fill out the booking form manually. This makes the agent more powerful and flexible, able to complete full workflows across different systems. It’s a bit more complex to build, but you get the best of both worlds.
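
Here’s a minimal sketch of the API-first, UI-fallback pattern. The airline endpoint is a placeholder, and `book_flight_via_ui` is a stub standing in for the kind of UI-driving loop shown in approach 1.

```python
from typing import Optional

import requests

def book_flight_api(origin: str, dest: str) -> Optional[dict]:
    """Try the structured path first: a placeholder airline API."""
    try:
        resp = requests.post(
            "https://api.example-airline.test/bookings",  # hypothetical endpoint
            json={"origin": origin, "destination": dest},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()
    except requests.RequestException:
        return None  # no API, or it failed: tell the caller to fall back

def book_flight_via_ui(origin: str, dest: str) -> dict:
    # Stub: a real agent would drive the airline's website here (see approach 1).
    return {"origin": origin, "destination": dest, "booked_via": "ui"}

def book_flight(origin: str, dest: str) -> dict:
    booking = book_flight_api(origin, dest)
    return booking if booking is not None else book_flight_via_ui(origin, dest)
```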

4. Embedded agents inside existing apps

This type of agent lives inside an existing app and helps users complete tasks directly within the product. It can guide users through onboarding, automate repetitive actions, suggest next steps, or answer questions in context. Because it’s embedded, it has full access to the app’s data, interface, and user state, making it faster and more reliable than external agents.
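
A small sketch of why embedding matters: the agent gets the app’s own state handed to it instead of inferring it. The fields and suggestions below are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class AppContext:
    """State the host app already has and can hand to the embedded agent."""
    user_id: str
    current_screen: str
    onboarding_step: int
    recent_errors: list = field(default_factory=list)

def suggest_next_step(ctx: AppContext) -> str:
    # Running inside the app, the agent reacts to real state instead of
    # scraping the UI or asking the user what they can see.
    if ctx.recent_errors:
        return f"Help with the last error: {ctx.recent_errors[-1]}"
    if ctx.onboarding_step < 3:
        return "Resume onboarding where you left off"
    return f"Suggest shortcuts for the {ctx.current_screen} screen"

print(suggest_next_step(AppContext("u42", "reports", onboarding_step=5)))
```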

5. Agent orchestration platform

This approach focuses on building a system that manages multiple specialised agents working together. Instead of having one agent try to do everything, each agent handles a specific part of the workflow, like one pulling data, another writing content, and another posting it online. The platform acts like a conductor, coordinating who does what and when. This makes it easier to handle complex tasks that involve multiple steps or tools. It’s useful for things like marketing automation, content creation, and data workflows, where one agent alone wouldn’t be enough.

For a real-world example of Agent Orchestration, check out Amazon Bedrock Agents.
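
A stripped-down sketch of the idea, with each specialised agent reduced to a stub function and the orchestrator simply passing results down the line; a real platform would add routing, retries, and shared state on top of this.

```python
from typing import Callable, List

# Specialised agents, each reduced to a stub; real ones would wrap their own
# models, prompts, and tools.
def research_agent(task: str) -> str:
    return f"research notes for: {task}"

def writer_agent(notes: str) -> str:
    return f"draft written from [{notes}]"

def publisher_agent(draft: str) -> str:
    return f"published: {draft}"

# The orchestrator only decides who runs next and hands results along.
PIPELINE: List[Callable[[str], str]] = [research_agent, writer_agent, publisher_agent]

def orchestrate(task: str) -> str:
    result = task
    for agent in PIPELINE:
        result = agent(result)
    return result

print(orchestrate("AI agents vs automated workflows"))
```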

6. Agent-friendly site protocol

This idea is about creating a standard way for apps to expose tasks and actions that agents can understand and interact with. Instead of relying on agents to guess what to do by analysing the UI, sites would follow a protocol that tells agents what’s possible and how to do it. It could include things like task definitions, allowed actions, and data inputs. This would make agents more reliable, faster, and easier to build. The challenge is adoption. Sites would need to support the protocol for it to work, but if they do, it could change how agents interact with apps entirely.

Anthropic’s Model Context Protocol (MCP) and Google’s Agent2Agent (A2A) protocol are both designed to facilitate communication and collaboration within the AI agent ecosystem.
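
To illustrate the idea, a site could publish something like the structure below, and an agent could match its goal against the declared tasks instead of guessing from the UI. This is an invented manifest shape for illustration, not the actual MCP or A2A format.

```python
# Hypothetical manifest a site might publish at a well-known URL so agents can
# discover what's possible without screen-scraping. Invented shape, not MCP/A2A.
SITE_MANIFEST = {
    "tasks": [
        {
            "name": "track_order",
            "description": "look up the delivery status of an order",
            "inputs": {"order_id": "string", "email": "string"},
            "endpoint": "/agent/tasks/track_order",
        },
        {
            "name": "book_table",
            "description": "reserve a table at the restaurant",
            "inputs": {"date": "YYYY-MM-DD", "party_size": "integer"},
            "endpoint": "/agent/tasks/book_table",
        },
    ]
}

def find_task(goal_keywords: set) -> dict:
    # Match the agent's goal against declared tasks instead of hunting for buttons.
    for task in SITE_MANIFEST["tasks"]:
        if goal_keywords & set(task["description"].split()):
            return task
    return {}

print(find_task({"order", "status"}))
```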

What’s Next?

Sam Altman from OpenAI reckons AI agents like Operator are now like junior employees. They can take on complex tasks and might even help discover new knowledge soon.

OpenAI and Google have adopted Anthropic’s Model Context Protocol (MCP), an open standard that lets AI agents securely access and interact with various apps and data sources, making it easier for them to perform tasks across different platforms.

Google DeepMind’s AlphaEvolve is an AI agent that designs and improves algorithms on its own. It’s been used to optimise data centre operations and even solve complex mathematical problems.

Microsoft’s Copilot is also stepping up, moving from just assisting with code to actually performing tasks, thanks to new agent modes.

Salesforce’s new study on the unreliability of agents further reinforced what Apple has been saying about the limits of reasoning in LLM-based approaches.

Amazon is using AI agents in its warehouses to do things like unloading trailers and fetching parts. They’re also developing smart glasses for delivery drivers, offering real-time navigation help.

Written by Federico Cargnelutti
Product and tech leader with a strong engineering background.