Anthropic Unleashes New Claude 3.5 Sonnet: A Model Smart Enough to Take Over Your Computer - Decrypt

10/22/2024 21:20

The AI powerhouse just dropped major updates to its models, including a feature that lets its AI directly operate computers, marking a shift from chat-only interactions to hands-on automation.

Reddit users spotted it first: Claude had suddenly gotten sharper and more capable. Now we know why. Anthropic has rolled out significant upgrades to its AI models, including an enhanced Claude 3.5 Sonnet and a much-needed upgrade to its lightweight Haiku model.

Eeriest update of all: the new Claude 3.5 Sonnet can now operate a computer on its own, moving the cursor, scrolling through pages and even clicking buttons just like humans do.

In a video demonstration, Sam Ringer, an Anthropic researcher, showed Claude filling out a form on an external website: it scrolled through a spreadsheet, searched a CRM for a company's information, and then worked out which fields the form needed and filled them in.

“Available today on the API, developers can direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking buttons, and typing text. Claude 3.5 Sonnet is the first frontier AI model to offer computer use,” Anthropic said in an official announcement earlier today. “We're releasing computer use early for feedback from developers, and expect the capability to improve rapidly over time.”

Anthropic (or maybe one of its button-pushing AIs? Jk.) seems to have released the model before it even made the announcement. For hours, the Claude and Anthropic subreddits were flooded with people trying to figure out what the hell was going on, because their AI was suddenly doing such good work. Users reported it was faster, more accurate and, amazingly, had stopped apologizing so much.

“Claude is so back, so much better. It just gets you, responds like it’s actually understanding the intent instead of a dead lifeless response,” NextGenAIUser said in one Reddit post. “Was stuck for hours on a coding issue using o1-Mini and o1-Preview, progressively outputting worse and worse responses. Fed the problem to Claude with the exact same prompt and it one-shot it no issues,” Roth_Skyfire said in another comment.

And they were right. Anthropic reported that the upgraded Claude 3.5 Sonnet's score on the SWE-bench Verified coding benchmark shot up from 33.4% to 49%, beating out competitors like OpenAI's o1-preview. That's not just a minor bump. Every single benchmark reported by Anthropic shows that the new Claude 3.5 Sonnet is much better than the original model.

But here’s where things get really interesting. The upgraded Sonnet isn’t just smarter; it’s now capable of controlling your PC. Anthropic calls this new feature “computer use,” and it’s currently in public beta. Here's how it works: you give Claude access to your desktop and a task to execute. The AI then acts the way a human would when using your computer over a remote desktop connection, moving the cursor, clicking buttons, typing out commands, and filling in forms and text fields.
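To make the look-decide-act cycle described above a little more concrete, here is a minimal Python sketch that runs entirely against a simulated screen. Everything in it is a hypothetical illustration, not Anthropic's API: the `FakeScreen` class, the `scripted_model` stand-in for Claude, and the `run_agent` helper are invented for this example, and the action names (`mouse_move`, `left_click`, `type`) are assumptions chosen to mirror the kinds of actions the feature performs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a desktop: tracks cursor, clicks and typed text."""
    cursor: tuple[int, int] = (0, 0)
    clicks: list[tuple[int, int]] = field(default_factory=list)
    typed: list[str] = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real harness would capture an image; here we return a text summary.
        return {"cursor": self.cursor, "typed": "".join(self.typed)}

    def execute(self, action: dict) -> None:
        # Apply one action, the way a remote-desktop harness would.
        kind = action["action"]
        if kind == "mouse_move":
            self.cursor = tuple(action["coordinate"])
        elif kind == "left_click":
            self.clicks.append(self.cursor)  # click wherever the cursor is
        elif kind == "type":
            self.typed.append(action["text"])
        else:
            raise ValueError(f"unsupported action: {kind}")


def scripted_model(screen_state: dict, task: str, step: int) -> dict | None:
    """Hypothetical placeholder for the model: a fixed plan instead of real reasoning.

    A real model would read screen_state (the screenshot) to choose each action.
    """
    plan = [
        {"action": "mouse_move", "coordinate": [320, 240]},  # move to a form field
        {"action": "left_click"},                           # focus the field
        {"action": "type", "text": task},                   # fill it in
    ]
    return plan[step] if step < len(plan) else None  # None means "task finished"


def run_agent(task: str, max_steps: int = 10) -> FakeScreen:
    screen = FakeScreen()
    for step in range(max_steps):
        state = screen.screenshot()                  # look at the screen
        action = scripted_model(state, task, step)   # decide what to do next
        if action is None:
            break
        screen.execute(action)                       # act: move, click or type
    return screen
```

Calling `run_agent("Acme Corp")` moves the cursor, clicks once and types the text into the simulated field. In a real deployment the scripted plan would be replaced by a model that receives each screenshot and returns the next action, and the `max_steps` cap guards against an agent that never declares itself done.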

However, this feature is only available through the API, so it’s not something end users will be able to savor in the short term.

Anthropic has trained Claude to visually interpret what’s happening on your screen. Developers can instruct it to perform tasks such as filling out forms, navigating websites, or even using software applications. It’s a little like giving your AI the ability to sit in front of your computer and do your work for you, except it doesn’t get tired and (hopefully) doesn’t make as many mistakes as we humans tend to.

The feature is in beta because it still stumbles over some basics—scrolling and zooming give it trouble. That's why Anthropic is keeping a close eye on things, storing screenshots for at least 30 days and running safety checks to catch any questionable behavior.

The company’s paranoia is well founded. A few months ago, Microsoft introduced a feature named “Recall” that would let Copilot+ PCs take screenshots of their users’ screens so the AI could be more helpful and relevant. It drew so much criticism that Microsoft had to delay its plans after Recall was branded “spyware” and authorities started investigating it.

But Anthropic is made up of nice people, and they promise they are different. “We found that the updated Claude 3.5 Sonnet, including its new computer use skill, remains at AI Safety Level 2—that is, it doesn’t require a higher standard of safety and security measures than those we currently have in place,” the research team says.

Companies like Replit are already integrating Claude’s computer use feature to help automate app evaluations, while The Browser Company is testing its ability to streamline web-based workflows. These early adopters are exploring ways to get Claude to handle tasks that would usually take dozens, if not hundreds, of manual steps.

Also, Anthropic’s budget-friendly model, Claude 3.5 Haiku, is now as powerful as its previous flagship, Claude 3 Opus. However, it runs at a fraction of Opus's cost and with much lower latency, making it more accessible without sacrificing too much performance.

Claude 3.5 Haiku is particularly good at coding tasks and tool use, clocking in with a SWE-bench Verified score of 40.6%. That puts it ahead of some of the more expensive models on the market, meaning developers on a budget won’t have to compromise on quality.

Claude 3.5 Haiku will be available in November.
