Researchers found AI is hopeless at most Upwork tasks and gets the news wrong half the time, while humans crush AI on world model tests. AI Eye.

AI agents can't complete 97% of tasks on Upwork to even a basic standard.

Researchers at Scale AI and the Center for AI Safety got six different AI models to attempt 240 Upwork projects across categories including writing, design and data analysis, then compared the results with the work of the real freelancers. The overwhelming majority of the time, the AI models were unable to complete the tasks successfully. The best AI model, Manus, completed just 2.5% of tasks and earned $1,810 of the $143,991 on offer. Claude Sonnet and Grok 4 managed to finish 2.1% of the tasks. While AI agents are good at simple, well-defined tasks like “generate a logo,” the research found they are bad at multi-step workflows, taking initiative or using judgment. So they won't be causing mass unemployment for a while yet. This backs up research from August at MIT, which found that 95% of organizat...