What 'Agentic' Actually Means (And Why Most Things Called Agentic Are Just Chatbots)

Troy Havenga 4 June 2026 6 min read

A vendor sends you a deck. Every second slide says "agentic". The demo looks slick. Before you sign anything, you need one test that cuts through it, because the word has been stretched so far it now means almost nothing. Here is the test. A chatbot answers. Agentic software acts. The only question that matters is who decides the next step and who carries it out. If a person still has to read the reply, copy the number, open the system and update the record, you have bought a chatbot with a better vocabulary.

That is not a pedantic distinction. It is the gap between software that drafts an email and software that sends it, logs it against the customer, and books the follow-up. One saves you reading time. The other changes how the work moves. Most of what gets sold as agentic today sits in the first camp.

The numbers vendors hope you have not read

Gartner put a name to the problem in June 2025. It calls it agent washing: vendors rebranding existing assistants, robotic process automation and chatbots as agentic AI with no real autonomy underneath. Of the thousands of vendors claiming agentic solutions, Gartner reckons only around 130 are the real thing. In the same forecast it predicted that more than 40 percent of agentic AI projects will be cancelled by the end of 2027, on rising costs, unclear business value and weak risk controls. That forecast followed a Gartner poll of more than 3,400 webinar attendees about how far their organisations had invested in the technology.

Gartner senior director analyst Anushree Verma was blunt. Most projects, she said, are early stage experiments or proofs of concept driven by hype and often misapplied, and today's models do not have the maturity and agency to autonomously achieve complex business goals or follow nuanced instructions over time. Plenty of use cases dressed up as agentic do not need an agent at all.

The performance data backs the caution. In 2025 researchers at Carnegie Mellon built a simulated company called TheAgentCompany and staffed it with AI agents. On multi-step office tasks the best model, Gemini 2.5 Pro, finished 30.3 percent of them. Claude 3.7 Sonnet managed 26.3 percent, GPT-4o 8.6 percent and Amazon Nova Pro 1.7 percent. So the strongest agent failed roughly seven tasks in ten. Salesforce's own CRMArena-Pro benchmark told the same story: about 58 percent accuracy on single-step tasks, dropping to 35 percent once the work ran across several steps. Autonomy falls apart as the job gets longer.
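One way to see why longer jobs hurt is a toy model, not something either benchmark claims: assume every step succeeds independently with the same probability, so a whole task only succeeds if every step does. The function name `chain_success` and the 90 percent per-step figure are illustrative assumptions, not measured values.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Chance that every step in a chain succeeds.

    Toy assumption: steps are independent and equally reliable.
    Real agents are messier, so treat this as intuition only.
    """
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability between 0 and 1")
    if steps < 0:
        raise ValueError("steps must not be negative")
    return per_step ** steps


# Illustrative only: an agent that gets each step right 90% of the time.
for steps in (1, 5, 10):
    print(steps, round(chain_success(0.9, steps), 3))
```

Under those assumptions, an agent that looks reliable on a single step completes barely a third of ten-step tasks. The benchmark figures above are not derived from this model, but the shape of the drop is the same.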

Then there is MIT. Its 2025 report, The GenAI Divide, found that 95 percent of enterprise generative AI pilots produced no measurable financial return, against an estimated 30 to 40 billion dollars in spending. Only 5 percent reached production at all. The report is a preliminary one and the headline figure has drawn its share of argument, but the direction of travel is hard to wave away.

The failures are about where you point it, not the model

This is the part worth sitting with, because it is more useful than the doom. MIT's own read was that the barrier is not infrastructure, regulation or talent. It is learning. Buying from specialised vendors and integrating succeeded about 67 percent of the time. Internal hype-driven builds succeeded roughly a third as often. And the biggest returns came from back-office process work, not the sales and marketing tools that ate more than half the budgets.

Put plainly: the technology works when you aim it at a process with a provable return and measure it on outcomes. It fails when you bolt it onto a vanity dashboard to look modern. The model is rarely the problem. How you apply the autonomy is.

What a real agent actually does

Genuine agentic software runs a loop. The pattern comes from the ReAct paper out of Princeton and Google Research in October 2022. The model reasons, calls a tool, reads the result, then decides again, grounded in real data rather than its own text. On the paper's two interactive benchmarks, ALFWorld and WebShop, interleaving reasoning with tool use beat the baselines by 34 and 10 percentage points of absolute success rate, with only one or two worked examples in the prompt. In practice, an agent reads data from your systems and events, decides inside set boundaries, and writes back to those systems, looping until the goal is met or a person steps in. A chatbot just waits for your next message.
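The loop is easier to judge once you see its shape. What follows is a minimal sketch, not the ReAct authors' code or any vendor's API: `run_agent`, `Step`, the scripted `scripted_decide` function and the `lookup` tool are all hypothetical stand-ins. A real system would put a model call where `decide` is.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    thought: str                   # the model's reasoning for this turn
    tool: Optional[str] = None     # tool to call, or None when finished
    arg: str = ""                  # argument passed to the tool
    answer: Optional[str] = None   # set once the goal is met


def run_agent(decide: Callable[[List[str]], Step],
              tools: Dict[str, Callable[[str], str]],
              goal: str,
              max_steps: int = 5) -> str:
    """Reason, act, observe, repeat, until done or out of budget."""
    history = [f"goal: {goal}"]
    for _ in range(max_steps):
        step = decide(history)                         # reason
        history.append(f"thought: {step.thought}")
        if step.answer is not None:                    # goal met
            return step.answer
        if step.tool not in tools:                     # outside its boundaries
            return "escalate to a person: unknown tool"
        observation = tools[step.tool](step.arg)       # act
        history.append(f"observation: {observation}")  # read the result
    return "escalate to a person: step budget exhausted"


# Hypothetical stand-ins: a scripted "model" and a fake CRM lookup.
def scripted_decide(history: List[str]) -> Step:
    if not any(h.startswith("observation:") for h in history):
        return Step("I need the customer's balance.", tool="lookup", arg="C-17")
    balance = history[-1].split(": ", 1)[1]
    return Step("I have the data.", answer=f"Balance for C-17 is {balance}")


tools = {"lookup": lambda customer_id: "R 1200"}
result = run_agent(scripted_decide, tools, "report the balance for C-17")
print(result)
```

Two details matter for a buyer. The answer is built from what the tool returned, not from the model's own text, and the loop has two explicit exits to a person: an unknown tool and a step budget.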

The plumbing for this is now real and settling into a standard. Anthropic's Model Context Protocol, released in November 2024, was adopted across the major model providers through 2025 and moved under open governance late in the year, backed by AWS, Microsoft, Google and others. Letting software call tools across systems is no longer the hard part. Pointing it at the right work, safely, is.

Why checkpoints are a POPIA issue, not just good engineering

Salesforce's CRMArena-Pro work flagged something every South African business should read twice. Across the models tested, agents showed near-zero confidentiality awareness. They had no instinct for what should stay private, and prompting them to behave usually cost task accuracy. CMU's study added the colour: agents fabricated information, got stuck on pop-ups, and in one case renamed a user to fake a colleague they could not find. An autonomous system with access to client records and no sense of what is confidential is a live POPIA risk, not a hypothetical one. Human approval gates stop being a nice-to-have. They become how you stay compliant.

So our position is plain. Add autonomy narrowly, behind checkpoints and human approval, on processes where the return is provable. Be honest that most things called agentic should stay plain software or a chatbot until the value is clear. We would rather ship a tightly scoped agent that books your follow-ups correctly than a sweeping one that occasionally invents a customer.
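A checkpoint of this kind can be very small. The sketch below is one possible shape, under loud assumptions: the field names in `PERSONAL_FIELDS` are illustrative, and what counts as personal information in your business is a question for your own POPIA assessment, not this code. `Action`, `needs_approval` and `execute` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: an illustrative list only. Your POPIA assessment decides
# which fields are personal information.
PERSONAL_FIELDS = {"id_number", "email", "phone", "address"}


@dataclass(frozen=True)
class Action:
    name: str
    fields: frozenset  # record fields the action reads or writes


def needs_approval(action: Action) -> bool:
    """Pause whenever an action touches any personal field."""
    return bool(action.fields & PERSONAL_FIELDS)


def execute(action: Action, approve: Callable[[Action], bool]) -> str:
    """Run the action, but only past the gate if a person says yes."""
    if needs_approval(action) and not approve(action):
        return f"blocked: {action.name}"
    return f"done: {action.name}"


log_note = Action("log_call_note", frozenset({"note"}))
send_mail = Action("send_follow_up", frozenset({"email", "note"}))


def deny_all(action: Action) -> bool:
    # Stand-in for a human reviewer who declines everything.
    return False


print(execute(log_note, deny_all))
print(execute(send_mail, deny_all))
```

The gate is decided by what data the action touches, not by how confident the agent sounds. The note-logging action runs without asking; the follow-up email stops until someone approves it.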

Your buyer's test, in three questions

Who decides the next step, and who executes it? If a person does both, it is a chatbot. Useful, but price it as one.
Where are the checkpoints? A real agent shows you exactly where it pauses for human approval, especially anywhere it touches personal data.
What is the measured outcome? Ask for the back-office process and the number it moves. "It feels smarter" is not a result.

Gartner still expects at least 15 percent of day-to-day work decisions to be made autonomously by 2028, up from zero in 2024. The capability is coming. But the label is running years ahead of the substance. Treat agentic as a claim to verify, not a feature you have already bought. The vendors worth your money will welcome the questions.
