AI Agents Can’t Actually Do Your Job (Yet) — New Benchmark Reveals The Gap


Written by Grant Harvey

Nov 3, 2025


The hype: AI agents will automate entire workflows! Replace freelancers! Handle complex tasks end-to-end!

The reality: a measly 2-3% completion rate.

Scale AI and CAIS just released the Remote Labor Index, a benchmark where AI agents attempted real freelance tasks. The best-performing model earned just $1,810 out of $143,991 in available work (about 1.3% of the dollar value) and, yes, finished only 2-3% of jobs.

This benchmark is a much-needed reality check for an industry spending untold trillions like Bond movie villains on the hypothesis that AI will automate all work. And honestly? It’s useful data.

Here’s what they tested

Real tasks from freelance platforms. Not toy problems or academic benchmarks, but actual gigs that humans get paid to complete: writing, research, data entry, and design tasks.

Why agents struggled:

Multi-step workflows with unclear handoffs.
Ambiguous requirements that we humans clarify through conversation.
Tasks requiring judgment calls and context.
Work that needs iteration and client feedback.

What agents CAN do: In production environments, small fine-tuned models handle day-to-day repetitive tasks well, while bigger models orchestrate workflows or handle edge cases. This setup works, but it’s narrow and human-supervised.

Even when agents work, they come with hidden costs. Rate Limited’s recent breakdown shows that “free” coding agents still cost you in rate limits, latency, security reviews, and rework. You need guardrails and budgets, not blind automation.

The counterpoint: a new study shows that 74% of companies that actually measure GenAI ROI report positive returns.

Why this matters

We’re in a weird middle ground. AI can augment work impressively, but can’t yet replace skilled humans on complex tasks (the middle-to-middle problem). Understanding this gap helps set realistic expectations.

What’s coming: Better agent architectures, tighter human-in-the-loop workflows, and specialized agents for narrow domains. Progress is happening; it’s just not happening (successfully) as quickly as the AI companies want you to think.

The takeaway: If someone’s selling you on fully autonomous AI workers, ask to see completion rates on real tasks you do every day… or don’t buy it.

Editor’s note: This content originally ran in the newsletter of our sister publication, The Neuron.

Grant Harvey

Grant Harvey is the Lead Writer of The Neuron, where he continues to lead the publication's daily coverage of AI news, tools, and trends.
