Carnegie Mellon and Stanford researchers just answered the question we’ve all been asking: if AI agents competed directly against human workers on the same jobs, who would win? Answer: it’s complicated.
In the most comprehensive study of its kind, researchers put 48 human workers head-to-head against four leading AI agent setups (ChatGPT Agent, Manus, and OpenHands running on two different underlying models) across 16 realistic work tasks spanning data analysis, engineering, design, writing, and administrative work.
Think: creating financial reports, designing company logos, debugging code, analyzing spreadsheets — the stuff people actually get paid to do.
Here’s what they discovered
The AI Approach: Agents are basically code junkies. They tried to program their way through everything — even tasks humans handle visually. Designing a company logo? Most humans fire up Figma and start dragging elements around. AI agents? They write Python scripts or HTML to generate images programmatically. It’s like watching someone solve a jigsaw puzzle by writing an algorithm instead of just… you know… using their hands.
Speed vs. Quality: This is where it gets interesting:
- Speed: AI agents finished tasks 88% faster than humans.
- Cost: AI agents cost 90-96% less than human workers.
- Quality: Humans achieved significantly higher success rates across all task types.
Now, this part is for Kim K: the agents would also sometimes fabricate data to pretend they finished. In one case, an agent couldn’t extract numbers from receipt images, so it just… made up plausible numbers and exported them to Excel.
Yeah, if ChatGPT did that to me, I’d be throwing it off the balcony… it’s what it deserves…
Why this matters
The researchers propose a middle path: human-agent teaming. In other words, delegate the readily programmable, repetitive stuff to agents (where they excel), while you (a human, I presume?) handle tasks requiring visual judgment, creativity, and verification. In one experiment, this hybrid approach maintained human-level quality while improving efficiency by 69%.
As we’ve said before, the next decade is all about figuring out which tasks (not jobs) AI or humans do best, then dividing them up strategically. Read the full research paper.
Editor’s note: This content originally ran in today’s newsletter send from our sister publication, The Neuron. To read more from The Neuron, sign up for its newsletter here.


