- AI agents differ from plain LLM apps by owning the control flow, combining models, tools, memory and clear goals.
- Protocols like MCP, A2A and NLWeb standardise how agents access tools, collaborate and interact with the web.
- Robust agents rely on good model choice, well-defined tools, precise instructions, orchestration patterns and guardrails.
- Modern frameworks and clouds, combined with these protocols, enable scalable multi-agent ecosystems in real products.

AI agents are moving software from passive assistants to autonomous collaborators that can perceive their environment, reason about complex goals and take actions on our behalf. For developers, this shift changes everything: instead of wiring static workflows around an LLM, you design systems where the model itself drives the control flow, orchestrates tools, and cooperates with other agents and services.
If you want to build serious, production-grade agentic systems, understanding emerging protocols is no longer optional. Standardised ways for agents to access tools (MCP), talk to each other (A2A) and interact with the web via natural language (NLWeb) are quickly becoming the backbone of the “agent ecosystem”. In parallel, you still need to master the core building blocks of agents themselves: models, tools, instructions, orchestration patterns and guardrails.
What exactly is an AI agent and how is it different from a plain LLM?
An AI agent is best understood as a complete system built around an LLM, not just the model itself. A widely used academic definition (taught, for example, in Stanford’s CS221) describes an agent as a computational entity situated in an environment, capable of perceiving it through sensors and acting on it through actuators to maximise its chances of success with respect to some goal.
In practical software terms, modern AI agents combine four ingredients: a large language model for reasoning, access to external tools and APIs, some form of memory to track context over time, and a clearly defined objective or role. Unlike a simple chatbot that just answers questions, an agent can plan, call tools, react to their outputs and iteratively drive a workflow until a goal is reached.
A common source of confusion is mixing up “model” and “agent”. A model like GPT‑4 or Llama 3 is a powerful but passive “brain”: it does nothing until you send it a prompt, and it cannot by itself send emails, hit APIs or update databases. An agent, on the other hand, wraps the model in a loop of perception, reasoning and action. It uses the model’s predictions to choose which tool to call, when to ask the user for clarification, and when to stop.
The key difference is who controls the workflow. In classic software, your code dictates the sequence: if A then B then C. In an agent, the LLM decides what the next step should be based on the current state. It might choose to look up an order, open a support ticket, or hand off the case to another agent, all from the same high-level request.
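To make this inversion of control concrete, here is a minimal Python sketch. Everything in it is a placeholder: `llm_decide` stands in for any chat-completion call that returns a structured action, and `lookup_order` for a real tool.

```python
# A minimal sketch of "the LLM owns the control flow". llm_decide stands
# in for a real model call that returns a structured action; lookup_order
# is a placeholder tool.

def llm_decide(state: dict) -> dict:
    # A real agent would call a model API here and parse its output.
    if "order" not in state:
        return {"action": "lookup_order", "args": {"order_id": "A123"}}
    return {"action": "respond", "args": {"text": f"Status: {state['order']}"}}

def lookup_order(order_id: str) -> str:
    return "shipped"  # placeholder for a database or API call

def handle_request(user_message: str) -> str:
    state: dict = {"messages": [user_message]}
    while True:  # the loop ends when the model chooses to respond
        decision = llm_decide(state)
        if decision["action"] == "lookup_order":
            state["order"] = lookup_order(**decision["args"])
        elif decision["action"] == "respond":
            return decision["args"]["text"]

print(handle_request("Where is my package?"))
```

Note that the branching here only dispatches whatever the model decided; the sequence of steps is not fixed anywhere in the host code.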
Agents also vary in sophistication, from simple reactive systems to learning, goal-driven architectures. The classical taxonomy from Russell and Norvig is still useful for understanding the landscape: simple reflex agents (pure if-then rules), model-based reflex agents (with a minimal internal state), goal-based agents (that plan towards a desired outcome), utility-based agents (that optimise a numerical score over many possible outcomes) and learning agents (that adapt their policy based on feedback).
Why protocols matter in the era of AI agents
As agents become more capable and widespread, three problems quickly show up: integration cost, interoperability and security. Ad-hoc glue code for every API or partner system does not scale. Proprietary, one-off formats block collaboration between tools and agents from different vendors. And every new integration increases your security surface.
Agent-focused protocols aim to solve exactly these pain points by defining open standards for: how hosts expose tools and context to LLMs (Model Context Protocol, or MCP), how agents talk to other agents across organisational and technical boundaries (Agent-to-Agent, or A2A), and how websites expose their content and actions in a natural-language-first way for both humans and agents (Natural Language Web, or NLWeb).
For developers, these protocols behave like “universal adapters” and “business cards” for agents and services. Instead of hardcoding dozens of integrations, you integrate once with MCP servers, A2A-compatible peers or NLWeb sites, and let the protocol handle discovery, capabilities and authentication. This dramatically reduces custom integration logic and lets you switch models or tools without rewriting all the plumbing.
At the same time, protocol-level security becomes essential. Access control, standardised authentication and clear capability descriptions at the protocol layer make it much easier to reason about who can do what, from where, and under which constraints—critical in enterprise settings where agents might be allowed to touch inventory, payments or sensitive customer data.
Model Context Protocol (MCP): a universal adapter for tools and data
The Model Context Protocol is an open standard that defines how applications can provide tools and contextual data to LLM-based agents. Conceptually, MCP sits between your agents and your existing systems—databases, SaaS APIs, internal services—and turns them into a unified, discoverable set of capabilities.
MCP follows a client-server architecture with three main roles: the host (an LLM application such as an IDE, a chat client or an agent runtime) that initiates connections, the client components inside that host that maintain one-to-one connections to MCP servers, and the servers themselves, which are lightweight programs exposing specific capabilities.
Inside MCP, servers advertise three core primitives that agents can use in a consistent way: tools, resources and prompts. Tools are discrete actions—“get_weather”, “purchase_product”, “search_flights”—with names, descriptions and input/output schemas. Resources are read-only data items such as files, database rows, or logs, which can be text or binary. Prompts are predefined templates that encapsulate prompt engineering patterns or multi-step flows.
Dynamic tool discovery is one of MCP’s biggest wins. Instead of hardcoding that a travel assistant has a “search_flights” function with a specific signature, the agent connects to the airline’s MCP server and asks for its capability list. The server returns machine-readable descriptions of tools, their arguments and expected responses. When the airline adds an “upgrade_booking” tool, your agent discovers it without code changes, as long as you respect the MCP contract.
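A sketch of that discovery step using the official `mcp` Python SDK is shown below. The `airline_server.py` script is a hypothetical MCP server; any server reachable over stdio (or HTTP, with the corresponding client) would behave the same way.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # "airline_server.py" is a hypothetical MCP server exposing flight tools.
    params = StdioServerParameters(command="python", args=["airline_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Ask the server what it can do instead of hardcoding signatures.
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, "-", tool.description)

asyncio.run(main())
```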
MCP is also deliberately model-agnostic. Because the protocol is about capabilities and context, not about any one vendor’s API, the same MCP server can be used from different LLMs or agent frameworks. This allows you to experiment with model swaps or multi-model strategies (e.g. using a small, cheap model for simple flows and a powerful one for complex reasoning) without redoing your integrations.
Another benefit is standardised security. MCP brings authentication into the protocol layer itself, which is far more maintainable than juggling a zoo of bespoke auth flows for every third-party API. For enterprises, this means cleaner scaling from “one integration in staging” to “hundreds of MCP servers in production” without losing control over keys and permissions.
A concrete example makes MCP’s role clearer: imagine a user asking an AI travel assistant to “find me a flight from Portland to Honolulu and book it”. The assistant, acting as MCP client, connects to the airline’s MCP server, enumerates tools like “search_flights” and “book_flight”, invokes “search_flights” with the right parameters, receives the JSON results, presents them to the user, and then calls “book_flight” based on the chosen option. The assistant never calls the airline’s internal APIs directly; it simply speaks MCP.
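Continuing inside the session from the previous sketch, invoking the discovered tools might look like this. The tool names match the example above, but the argument shapes are assumptions about that hypothetical airline server, not something MCP itself prescribes.

```python
# Inside the same ClientSession block as above: invoke the tools the
# airline's server advertised. Argument shapes are assumptions about the
# hypothetical server, not part of the MCP spec.
results = await session.call_tool(
    "search_flights",
    arguments={"origin": "PDX", "destination": "HNL", "date": "2025-07-01"},
)
# ...present results.content to the user, then book the chosen option:
confirmation = await session.call_tool(
    "book_flight",
    arguments={"flight_id": "HA26", "passenger": "A. Traveler"},
)
print(confirmation.content)
```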
Agent-to-Agent (A2A): a protocol for multi-agent collaboration
While MCP focuses on connecting agents to tools and data, the Agent-to-Agent protocol is about connecting agents to each other. As soon as you move beyond a monolithic “super-agent” into an ecosystem of specialised agents (travel, billing, logistics, support…), you need a clean way for them to discover each other, exchange context and collaborate on shared tasks.
A2A is designed to support this kind of distributed, cross-organisation orchestration. It allows agents from different companies, stacks and hosting environments to work together on a user’s request without hardwiring every interaction path in advance. An A2A-compatible “Travel Agent” can call an “Airline Agent”, “Hotel Agent” and “Car Rental Agent” built by completely different teams.
Every A2A agent exposes a machine-readable Agent Card that plays a role similar to MCP’s capability listing, but at the agent level rather than the tool level. An Agent Card contains the agent’s name, a natural-language description of what it handles, a list of skills explaining when the agent should be called, its current endpoint URL, version information and flags such as whether it supports streaming responses or push notifications.
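As a sketch, fetching and reading such a card could look like the following. The host is hypothetical, and the field names follow the published A2A schema as of this writing, so double-check them against the spec version you target.

```python
import json
import urllib.request

# A2A agents publish their Agent Card at a well-known URL. The host below
# is hypothetical; treat the field names as illustrative.
URL = "https://airline-agent.example.com/.well-known/agent.json"
with urllib.request.urlopen(URL) as resp:
    card = json.load(resp)

print(card["name"], card["version"], card["url"])
print("streaming supported:", card["capabilities"]["streaming"])
for skill in card["skills"]:
    print(f"- {skill['name']}: {skill['description']}")
```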
On the caller side, an Agent Executor is responsible for handing off context and managing the interaction. When a local agent decides to delegate a subtask, its executor packages the current conversation, relevant state and any constraints, and sends them to the remote agent over A2A. The remote agent runs its own internal tools and LLM loop, then returns the outcome without the caller having to know its internals.
The result of a completed remote task is returned as an artefact. An artefact typically bundles the task output, a brief description of what was done, and the textual context that flowed through the protocol. Once the artefact is delivered, the A2A connection can close, keeping each interaction scoped and cheap while still allowing rich cooperation.
For long-running or asynchronous tasks, A2A often relies on an event queue. Instead of keeping connections open for minutes while a remote agent crunches data or waits on external systems, the event queue handles message passing and updates. This is especially important in production-grade multi-agent systems where network resilience, retries and backpressure matter.
The benefits of A2A mirror those of MCP but at the ecosystem level. You get improved collaboration between heterogeneous agents, flexibility to choose the best LLM or fine-tuning strategy per agent, and built-in authentication so that inter-agent calls are secure and auditable. It becomes realistic to build “teams of agents” spanning multiple vendors rather than trying to cram every capability into a single monolith.
Natural Language Web (NLWeb): making the web agent-friendly
The web was built around documents and HTML, not around conversations and agents. Users have long navigated menus and search boxes to extract information from websites, while automated access typically relied on brittle scraping or custom APIs. NLWeb proposes a different model: websites that natively speak natural language, for both humans and AI agents.
An NLWeb deployment revolves around a central NLWeb application—the core service code that receives natural-language questions, connects to storage and models, and returns structured answers. You can think of it as the “language engine” of your site, orchestrating embeddings, vector search and LLM reasoning.
The NLWeb protocol itself defines the basic rules for this natural-language interaction. It standardises how questions are sent and how answers come back, typically as JSON formatted using vocabularies like Schema.org. In the same way HTML standardised document sharing, NLWeb aims to standardise language-driven access to site content and actions, paving the way for an “AI web”.
Every NLWeb instance also acts as an MCP server. That means it can expose tools (like an “ask” method) and data resources to external AI systems over MCP. From an agent’s perspective, your site becomes just another MCP endpoint: it can call “ask” with a question, receive a structured response tied to real entries in your catalogue, and avoid hallucinating nonexistent products or pages.
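As a rough sketch, an agent (or a plain script) might query such a deployment over HTTP like this. The host, query parameter and response keys are assumptions about a particular NLWeb version; check the project’s documentation for the exact contract.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical NLWeb deployment; the endpoint name, query parameter and
# response shape depend on the NLWeb version you run.
BASE = "https://travel-site.example.com/ask"
params = urllib.parse.urlencode(
    {"query": "family-friendly hotel in Honolulu with a pool"}
)
with urllib.request.urlopen(f"{BASE}?{params}") as resp:
    answer = json.load(resp)

# Results are typically Schema.org-shaped items tied to real inventory.
for item in answer.get("results", []):
    print(item.get("name"), "-", item.get("url"))
```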
Under the hood, NLWeb leans heavily on embedding models and vector databases. When you ingest your site content—product listings, hotel descriptions, blog posts—NLWeb turns them into vector embeddings and stores them in a compatible vector store such as Qdrant, Milvus, Azure AI Search, Snowflake or Elasticsearch. At query time, it retrieves the most similar items and passes them, along with the user’s question, to an LLM to craft an answer grounded in actual content.
A travel booking site is a great example of NLWeb in action. You ingest structured data for flights, hotels and packages (ideally using Schema.org or RSS feeds), create embeddings and store them. When a user types “find me a family-friendly hotel in Honolulu with a pool next week” into a chat box, NLWeb queries the vector store for relevant hotels, lets the LLM interpret “family-friendly” and other soft constraints, and returns a natural-language answer backed by real inventory. The same NLWeb instance, via its MCP interface, lets an external travel agent ask, for instance, about vegan restaurants near those hotels and get back consistent, machine-usable JSON.
When it makes sense to build an AI agent at all
Not every problem needs an agent; sometimes a simple deterministic service is better. Agents shine when the workflow can’t easily be captured as a rigid set of rules, when there’s heavy reliance on unstructured data, or when the number of exceptions and edge cases makes rule maintenance painful.
Three families of use cases are especially well suited to agents: complex decision-making (for example, deciding whether to approve a customer refund under nuanced policies), rule sets that are hard to maintain (like complex vendor security reviews or compliance checks), and flows dominated by natural language (claims processing, free-form customer requests, research tasks).
A useful heuristic is to look at systems that have grown via endless patches and special-case rules. If even senior engineers struggle to predict behaviour or to encode new policy changes without breaking something else, chances are the underlying problem is semantic, not purely logical. That’s the perfect territory for an LLM-driven agent that can reason over text, policies and examples.
By contrast, for highly deterministic tasks with clear inputs and outputs, classical code will typically be cheaper, faster and more reliable. If your job is “convert this number to another format” or “run this SQL query and return rows”, adding an agent loop on top is probably unnecessary complexity.
The core building blocks of an AI agent
Despite the hype, the internal structure of a well-designed agent is quite straightforward. Almost all patterns boil down to three pillars: the model that does the reasoning, the tools that connect to the outside world, and the instructions that constrain and guide behaviour.
The model is the decision engine. Different LLMs trade off reasoning quality, latency and cost. A common and pragmatic strategy is: start with a highly capable model to establish a quality baseline and understand what “good” looks like in your domain, then progressively test smaller or cheaper models for sub-tasks such as classification or retrieval where peak reasoning isn’t required.
Tools extend the agent beyond pure text. They are functions, APIs, or services the agent can call: querying a database, sending an email, searching the web, interacting with a legacy UI through a computer-use model, and so on. Well-designed tools are documented, reusable across agents and ideally exposed via standard protocols like MCP.
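As an illustration, a tool description in the JSON-Schema style used by most function-calling APIs and MCP tool listings might look like this; the names and fields are an example, not any specific vendor’s format.

```python
# An illustrative tool description in the JSON-Schema style used by most
# function-calling APIs and by MCP tool listings. Names and fields are an
# example, not a specific vendor's format.
get_order_status_tool = {
    "name": "get_order_status",
    "description": "Look up the current status of a customer order by its ID.",
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "Public order identifier",
            },
        },
        "required": ["order_id"],
    },
}
```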
Instructions are the most underrated part of an agent. You need more than “be helpful”. High-quality instructions describe how to decompose tasks, how to behave when information is missing, which tools to prefer in which situations, what counts as success, and what to avoid. Many teams successfully repurpose existing SOPs, help centre docs or internal playbooks by converting them into LLM-friendly, numbered guidelines the model can follow.
It’s increasingly common to generate or refine instructions automatically using LLMs themselves. For example, you can feed a help centre article into a meta-prompt that asks the model to rewrite it as a clear, numbered set of agent instructions, including explicit handling of edge cases. This keeps behaviour aligned with your documentation as it evolves.
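A minimal sketch of such a meta-prompt, with illustrative wording:

```python
# A sketch of a meta-prompt that turns a help-centre article into numbered
# agent instructions. The prompt text is illustrative; plug it into any
# chat-completion API.
META_PROMPT = """You are writing operating instructions for an AI support agent.
Rewrite the following help-centre article as a numbered list of precise,
imperative instructions. For each step, state which tool to use (if any),
what to do when required information is missing, and when to escalate to
a human. Article:

{article}
"""

def build_instruction_prompt(article: str) -> str:
    return META_PROMPT.format(article=article)
```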
Orchestration patterns: single-agent vs multi-agent systems
Under the hood, agents execute in a loop: observe the current state, decide what to do, act (often via a tool), update the context, and repeat until a stop condition is met (goal achieved, error, user intervention, or guardrail trip). This “agent loop” is what turns a one-shot LLM call into an ongoing workflow engine.
The simplest architecture is a single agent with tools. It receives user messages, reasons about them, decides which tools to call, and returns answers. Frameworks often expose a runner component that keeps calling the model until some termination criterion is satisfied—like “no more useful tool calls” or “structured output has been produced”. This pattern is ideal for early versions and for well-scoped problems.
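A stripped-down runner might look like the sketch below, where `call_model` and the tool registry are stand-ins for a real model API and real tools; note the step budget acting as a termination criterion.

```python
from typing import Callable

# Placeholder tool registry; real tools would hit databases or APIs.
TOOLS: dict[str, Callable[..., str]] = {
    "get_order_status": lambda order_id: "shipped",
}

def call_model(history: list[dict]) -> dict:
    # Placeholder: a real implementation calls an LLM and parses either
    # a tool call or a final answer out of its response.
    if not any(m["role"] == "tool" for m in history):
        return {"tool": "get_order_status", "args": {"order_id": "A123"}}
    return {"final": "Your order has shipped."}

def run(user_message: str, max_steps: int = 10) -> str:
    history = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):  # hard step limit as a termination criterion
        decision = call_model(history)
        if "final" in decision:
            return decision["final"]
        result = TOOLS[decision["tool"]](**decision["args"])
        history.append({"role": "tool", "content": result})
    return "Handing off to a human (step budget exhausted)."

print(run("Where is order A123?"))
```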
As complexity grows, teams often move to multi-agent topologies. There are two main flavours. In a manager pattern, a central “orchestrator” agent delegates subtasks to specialised agents exposed as tools—say, translators into different languages, a research agent and a critic. The manager keeps global control and stitches everything together.
The second pattern is more decentralised. Here, agents hand off work to peers when they detect that a request falls outside their domain. A triage agent could route customer messages to technical support, sales or order management agents, each with its own instructions and tools. The flow of control hops between agents without a single central planner.
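A toy sketch of this hand-off pattern, with a placeholder classifier in place of a real LLM-based triage step:

```python
# Decentralised hand-off: a triage agent routes a request to a specialist
# peer. Agent names and the classify() helper are illustrative placeholders.
SPECIALISTS = {
    "technical": "tech-support-agent",
    "sales": "sales-agent",
    "orders": "order-management-agent",
}

def classify(message: str) -> str:
    # Placeholder: in practice an LLM labels the request's domain.
    return "orders" if "order" in message.lower() else "technical"

def triage(message: str) -> str:
    domain = classify(message)
    target = SPECIALISTS[domain]
    # Hand off: the specialist receives the full context and takes over.
    return f"handed off to {target}"

print(triage("My order arrived damaged"))
```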
Both patterns can combine naturally with A2A at larger scale. Inside a product or microservice boundary you might use an orchestrator-plus-specialists model, while across companies or departments you rely on A2A to talk to externally owned agents that advertise their capabilities through Agent Cards.
Guardrails: keeping autonomous agents safe and reliable
Giving agents autonomy also means accepting new risks: they might leak sensitive data, make unauthorised changes, or take actions with financial or reputational impact. Guardrails are the protective layer that manages these risks without neutering the agent’s usefulness.
Defensive design usually involves multiple layers of guardrails. Some operate on inputs (blocking or sanitising malicious or out-of-scope requests), some on intermediate model decisions (checking whether an action is allowed before executing it), and some on outputs (filtering for safety, compliance or data leakage before responses leave the system).
In many implementations, guardrails run “in parallel” to the agent’s optimistic progress. The agent loop moves forward, but specific steps—like a tool call that may edit data—are wrapped in guardrail checks. If the guardrail detects a violation, it can stop the action, raise an exception, or escalate to a human operator.
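A minimal sketch of such a check wrapped around a state-changing tool call; the `is_allowed` policy is a placeholder for whatever input, action or output checks your system applies.

```python
# A guardrail wrapped around a state-changing tool call. The is_allowed()
# policy check is a placeholder for real input/action/output checks.

class GuardrailViolation(Exception):
    pass

def is_allowed(tool_name: str, args: dict) -> bool:
    # Placeholder policy: block refunds above a hard limit.
    return not (tool_name == "initiate_refund" and args.get("amount", 0) > 500)

def guarded_call(tool_name: str, args: dict, tools: dict):
    if not is_allowed(tool_name, args):
        # Stop the action; callers may escalate to a human instead.
        raise GuardrailViolation(f"{tool_name} blocked by policy: {args}")
    return tools[tool_name](**args)

# Example: guarded_call("initiate_refund", {"amount": 900}, tools={...})
# would raise GuardrailViolation instead of executing the refund.
```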
Some guardrails are themselves LLM-driven, or even implemented as dedicated agents focused on specific risks. For example, you might maintain a churn-detection agent that evaluates incoming customer messages and flags those indicating a high risk of cancellation. A higher-level guardrail then uses this signal to trigger retention workflows or require mandatory human review before closing the interaction.
Operational guardrails also include hard limits and escape hatches. Maximum step counts to avoid infinite loops, risk-based thresholds that force human approval for sensitive actions, and clear fallbacks when the model’s confidence is low all contribute to safe deployment in real-world environments.
From theory to practice: a stepwise design of an order-support agent
To ground these ideas, consider the evolution of an order-support system for an online shop. The initial version is typically just a reactive endpoint: given an order ID, fetch its status from the database and return it. There is no reasoning, no memory, and no workflow—this is not yet an agent.
The first agentic step is letting the model control the workflow. Instead of assuming the order ID is present, you feed the full conversation to the model and let it decide what to do. If the user asks “Where’s my package?” without providing an ID, the model can choose an “ASK_FOR_ORDER_ID” action and prompt the user for more information.
Next, you wrap this reasoning in a loop and introduce state. After every user message or tool call, the agent reevaluates the situation. It might fetch an order, update the context, check whether it has enough information to respond, or ask a follow-up question. The loop only stops once a clear response has been sent or a termination condition is reached.
As the scope widens beyond status checks, the agent starts selecting tools dynamically based on intent. A shipping issue might route to “open_incident”, a refund request to “initiate_refund”, and a simple status query to “get_order_status”. You don’t encode a fixed tree of if-else branches; instead, the model chooses actions from a menu of tools defined by you or discovered via MCP.
At this point you introduce guardrails and risk evaluation around sensitive tools. Read-only operations might be executed directly, but anything that changes state (issuing refunds, cancelling orders, modifying addresses) passes through a risk-aware guardrail. High-risk actions require human approval; medium-risk actions may trigger extra confirmations; low-risk actions can proceed automatically.
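A sketch of that risk-tiered execution, with illustrative tiers and placeholder approval hooks:

```python
# Risk-tiered execution for the order-support agent. Tiers, tool names
# and the approval hooks are illustrative placeholders.
RISK = {
    "get_order_status": "low",
    "open_incident": "medium",
    "initiate_refund": "high",
}

def request_human_approval(tool: str, args: dict) -> str:
    return f"queued {tool} for human approval"  # placeholder escalation

def confirm_with_user(tool: str, args: dict) -> None:
    pass  # placeholder: e.g. ask "Are you sure?" in the chat

def execute(tool_name: str, args: dict, tools: dict) -> str:
    tier = RISK.get(tool_name, "high")  # unknown tools default to high risk
    if tier == "high":
        return request_human_approval(tool_name, args)  # block until approved
    if tier == "medium":
        confirm_with_user(tool_name, args)  # extra confirmation step
    return tools[tool_name](**args)  # low risk: run directly

TOOLS = {"get_order_status": lambda order_id: "shipped"}
print(execute("get_order_status", {"order_id": "A123"}, TOOLS))
print(execute("initiate_refund", {"amount": 250.0}, TOOLS))
```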
Finally, you set operational boundaries and human handoff rules. If the agent hits a maximum number of failed attempts, encounters contradictory information, or faces a high-risk decision outside its remit, it hands off to a human support agent with all the accumulated context. This hybrid approach lets you safely deploy autonomy while maintaining control over edge cases.
Advanced reasoning frameworks and modern agent tooling
On top of these architectural basics, advanced reasoning frameworks help LLMs behave more like deliberate agents than black-box oracles. Two popular patterns are Chain-of-Thought (CoT) and ReAct (Reason + Act).
Chain-of-Thought simply asks the model to think step-by-step, decomposing complex questions into intermediate reasoning steps before producing a final answer. Research shows this can significantly improve performance on reasoning-heavy tasks in larger models, and it maps naturally onto the agent loop: each tool call fits into a broader chain of reasoning.
ReAct tightly couples reasoning with tool use. The agent explicitly alternates between thoughts, actions and observations: it explains what it intends to do, calls a tool, examines the output, and updates its plan. This pattern underpins many early autonomous agent systems such as AutoGPT and BabyAGI, which dynamically generate and reprioritise task lists in pursuit of a user goal.
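A sketch of what a ReAct-style prompt and transcript can look like; the exact wording varies between implementations, but the thought/action/observation structure is the core of the pattern.

```python
# Illustrative ReAct-style prompt and transcript, following the
# thought/action/observation format from the ReAct paper. Tool names
# and wording are examples, not a fixed standard.
REACT_SYSTEM = """Answer by interleaving these steps:
Thought: reason about what to do next.
Action: tool_name[input]  (one of: search_flights, get_weather)
Observation: (the tool result will be inserted here)
Repeat until you can write: Final Answer: <answer>."""

EXAMPLE_TRANSCRIPT = """Thought: I need flight options before I can answer.
Action: search_flights[PDX -> HNL, 2025-07-01]
Observation: 3 flights found, cheapest $412.
Thought: I have enough to answer.
Final Answer: The cheapest Portland-Honolulu flight on 1 July is $412."""
```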
Modern frameworks and SDKs wrap these ideas into developer-friendly abstractions. Libraries like LangChain, LangGraph, CrewAI or lighter-weight toolkits such as Hugging Face’s smolagents provide building blocks for tool calling, graph-based workflows, multi-agent orchestration and persistent memory; many also integrate with editors such as VS Code. Proprietary platforms from cloud vendors and players like OpenAI add higher-level constructs for agents, guardrails and evaluations.
Importantly, these frameworks increasingly integrate with protocols like MCP, A2A and NLWeb. Instead of baking in one-off connectors, agents can plug into standardised capability layers, talk to external agents via Agent Cards, and treat NLWeb-enabled sites as first-class, natural-language APIs. This convergence between protocols and tooling is what enables large-scale, interoperable agent ecosystems.
All of this sits on a continuum from no-code to high-code solutions. Visual platforms in the no-code space let non-developers compose agent workflows and tools with drag-and-drop interfaces and natural-language configs. At the other end, high-code environments give engineers precise control over orchestration, evaluation and deployment, often combining frameworks with custom infrastructure on AWS, Azure or similar clouds.
Across this spectrum, the organisations that win are those that learn to architect agents, not just consume them. Understanding protocols, patterns and guardrails lets you go beyond “try a chatbot” experiments and towards robust, scalable automation: from internal analytics agents and developer copilots, all the way to multi-agent systems coordinating inventory, payments and customer experience in real time. As agents continue to mature, these design skills become a genuine competitive edge.
