If AI agents were employees, Upwork’s new research would have one clear message: don’t leave them unsupervised.
Upwork has released its new Human+Agent Productivity Index (HAPI), calling it the industry’s first large-scale look at how human experts improve the performance of AI agents on real client work. The company says the findings challenge the idea that AI agents can run independently without oversight.
Andrew Rabinovich, Upwork’s CTO and head of AI & ML, summed up the core finding in an interview with VentureBeat, saying, “AI agents aren’t that agentic, meaning they aren’t that good.” But he added that collaboration changes the picture: “When paired with expert human professionals, project completion rates improve dramatically.”
HAPI is built on more than 300 completed, paid-for marketplace projects on the platform. Upwork notes that these tasks were intentionally chosen for their simplicity: nearly all were priced under $500, and they represent less than 6% of its total gross services volume.
Why the study matters
Most AI benchmarks rely on synthetic data or academic-style tests. HAPI uses real client jobs: tasks that freelancers have completed and clients have paid for. That gives the index a rare view of authentic work rather than controlled experiments.
Upwork says this approach demonstrates how work is actually accomplished on the world’s largest freelance marketplace, which has around 800,000 active clients posting over 3 million jobs annually.
HAPI also aligns with the company’s broader research direction, as described in its UpBench paper. That benchmark similarly focuses on real jobs, rubric-based evaluations, and human feedback loops to study how AI agents behave in professional settings.
Agents alone struggle, but humans push them over the line
Across all categories, AI agents, including those powered by leading models like Gemini 2.5 Pro, GPT-5, and Claude Sonnet 4, often struggled to meet basic requirements when working independently. They missed details, misread instructions, and sometimes failed simple formatting or data-handling rules.
But the moment an expert freelancer stepped in to review the output and offer corrections, things changed fast.
Upwork found that:
- Completion rates increased by up to 70% with human guidance.
- Human reviewers typically spent about 20 minutes per feedback cycle.
- Writing, Translation, and Marketing tasks saw improvements of up to 17 percentage points.
- Some Engineering & Architecture jobs jumped by 23 percentage points.
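The figures above mix two units, percent and percentage points, which are easy to conflate. The short sketch below shows the difference using a made-up baseline; the 20% and 34% completion rates are illustrative assumptions and do not come from Upwork’s data, and the function names are hypothetical.

```python
def point_increase(before_pct: float, after_pct: float) -> float:
    """Absolute change between two rates, in percentage points."""
    return after_pct - before_pct


def relative_increase(before_pct: float, after_pct: float) -> float:
    """Change relative to the starting rate, in percent."""
    return (after_pct - before_pct) * 100 / before_pct


# Hypothetical rates chosen only for illustration (not from the study).
baseline = 20.0       # assumed completion rate for an agent working alone
with_feedback = 34.0  # assumed completion rate after human review

print(point_increase(baseline, with_feedback))     # change in percentage points
print(relative_increase(baseline, with_feedback))  # change in percent
```

With these assumed numbers, the same improvement reads as 14 percentage points or as a 70% relative increase, which is why the unit matters when comparing the bullets above.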
A similar pattern appears in the UpBench dataset: clusters related to formatting, spreadsheets, data updates, translation accuracy, and report structure were among the most common areas where agents failed without guidance, but these failures dropped significantly once humans intervened.
Standalone agents performed best on technical tasks such as basic coding or data operations, where rules are clear and outcomes are easy to verify. For creative or subjective work, humans provided the judgment and nuance that models couldn’t reliably supply.
In short, AI excels at structure and speed, while humans excel at context and meaning.
The economics: When AI alone makes sense and when it doesn’t
Upwork also examined the economic costs of three approaches: AI-only, human-only, and human-in-the-loop (HITL), meaning human-plus-AI.
For very low-value tasks, AI alone can be appealing because it’s cheap, even if imperfect. For mid-value projects, human-AI collaboration delivers the best outcome. For complex or high-value tasks, full human involvement remains the most reliable option.
AI-only workflows save money on trivial tasks, but as task value increases, human oversight protects against costly errors.
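One way to picture that trade-off is a toy expected-cost model, in which each workflow has a fixed cost plus a chance of a failure that wastes the value of the task. The sketch below is an illustration only: the workflow costs, failure rates, and the `cheapest_workflow` helper are hypothetical assumptions, not figures or methods from Upwork’s research.

```python
# Hypothetical (cost in dollars, failure probability) per workflow.
# These numbers are invented for illustration and are not from the study.
WORKFLOWS = {
    "ai_only": (2.0, 0.60),
    "hitl": (40.0, 0.15),
    "human_only": (150.0, 0.05),
}


def expected_cost(workflow: str, task_value: float) -> float:
    """Fixed workflow cost plus the expected loss if the task fails."""
    cost, failure_rate = WORKFLOWS[workflow]
    return cost + failure_rate * task_value


def cheapest_workflow(task_value: float) -> str:
    """Return the workflow with the lowest expected cost for this task value."""
    return min(WORKFLOWS, key=lambda name: expected_cost(name, task_value))


for value in (20, 300, 5000):
    print(value, cheapest_workflow(value))
```

Under these assumed numbers, the cheapest option moves from AI-only to HITL to human-only as the task value rises, mirroring the pattern described above. Different assumptions would shift the crossover points.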
Growing appetite for AI-assisted work
Upwork recently reported a 53% year-over-year surge in AI-related activity, a trend fueled by freelancers who use automation to clear routine tasks so they can focus on the more complex parts of their work.
That shift fits neatly into the company’s future strategy, which centers on Uma, a “meta orchestration agent” built to manage how humans and AI collaborate. Uma’s role is to act like an intelligent project manager, deciding which parts of a job an agent can handle and which require a human expert, with the goal of delivering high-quality, efficient results.
“We don’t want to build agents that actually complete the tasks, but we are building this meta orchestration agent that figures out what human and agent talent is necessary in order to complete the tasks,” Rabinovich told VentureBeat.
Rabinovich argues that this shift creates new, high-value opportunities for professionals. He believes that while “simpler tasks will be automated by agents,” the jobs themselves “will become much more complex in the number of tasks, so the amount of work and therefore earnings for freelancers will actually only go up.”
A recent look at how AI is reshaping the Gen Z workforce illustrates the same trend, with automation accelerating output while human experts ensure accuracy and strategic decision-making.


