Meet the artificial intelligence agent with multiple personalities



In the coming years, AI agents are widely expected to take on more and more work on behalf of humans, including using computers and smartphones. For now, though, they are too error-prone to be of much use.

A new agent called S2, created by Simular AI, combines frontier models with models specialized for using computers. The agent achieves state-of-the-art performance on tasks such as using apps and manipulating files, which suggests that switching between different models in different situations may help agents advance.

“Computer-using agents are different from large language models and different from coding,” says Ang Li, founder and CEO of Simular. “It is a different type of problem.”

In Simular's approach, a powerful general-purpose AI model, such as OpenAI's GPT-4o or Anthropic's Claude 3.7, reasons about the best way to complete the task at hand, while smaller open source models step in for tasks such as interpreting web pages.

Li, who was a researcher at Google DeepMind before founding Simular in 2023, explains that large language models excel at planning but are not good at identifying the elements of a graphical user interface.
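The division of labor Li describes can be illustrated with a minimal, hypothetical Python sketch; it is not Simular's code. It assumes a planner (standing in for a large general-purpose model) that breaks a task into plain-language steps, and a grounder (standing in for a smaller specialized model) that maps each step to a concrete on-screen element. Every name below (`UIElement`, `stub_planner`, `stub_grounder`, `run_agent`) is invented for illustration, and both "models" are simple stubs rather than real model calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class UIElement:
    label: str  # visible text of the element
    x: int      # screen coordinates of its center
    y: int

# Hypothetical interfaces: a planner turns a task into natural-language steps,
# and a grounder resolves one step to a concrete element on the current screen.
Planner = Callable[[str], List[str]]
Grounder = Callable[[str, List[UIElement]], Optional[UIElement]]

def stub_planner(task: str) -> List[str]:
    # Stand-in for a large general-purpose model doing high-level planning.
    if task == "submit the form":
        return ["click Name field", "click Submit"]
    return []

def stub_grounder(step: str, screen: List[UIElement]) -> Optional[UIElement]:
    # Stand-in for a small specialized GUI model: return the first element
    # whose label appears in the step text (case-insensitive).
    for element in screen:
        if element.label.lower() in step.lower():
            return element
    return None

def run_agent(task: str, screen: List[UIElement],
              planner: Planner, grounder: Grounder) -> List[Tuple]:
    actions: List[Tuple] = []
    for step in planner(task):
        element = grounder(step, screen)
        if element is None:
            # Grounding failed: stop rather than click blindly.
            actions.append(("fail", step))
            break
        actions.append(("click", element.x, element.y))
    return actions

if __name__ == "__main__":
    screen = [UIElement("Name field", 100, 50), UIElement("Submit", 200, 300)]
    print(run_agent("submit the form", screen, stub_planner, stub_grounder))
    # prints: [('click', 100, 50), ('click', 200, 300)]
```

Keeping planning and grounding behind separate interfaces is what allows either side to be swapped for a different model, which is the general idea behind combining models.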

S2 is designed to learn from experience using an external memory module that records its actions and user feedback, then draws on those records to improve future actions.
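The memory idea can likewise be sketched, under loose assumptions, as a store of past tasks, the actions taken, and the feedback received, which is queried for similar tasks before acting. This is a hypothetical illustration, not S2's actual module: the `ExperienceMemory` and `Episode` classes are invented, and retrieval here uses simple word overlap as a stand-in for whatever similarity measure a production system might use.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Episode:
    task: str
    actions: List[str]
    feedback: str  # e.g. "success" or a correction from the user

@dataclass
class ExperienceMemory:
    """Hypothetical external memory: records past episodes and retrieves similar ones."""
    episodes: List[Episode] = field(default_factory=list)

    def record(self, task: str, actions: List[str], feedback: str) -> None:
        # Store a copy of the action list so later changes by the caller have no effect.
        self.episodes.append(Episode(task, list(actions), feedback))

    def retrieve(self, task: str, k: int = 1) -> List[Episode]:
        # Score each stored episode by how many lowercase words it shares with the new task.
        words = set(task.lower().split())

        def overlap(ep: Episode) -> int:
            return len(words & set(ep.task.lower().split()))

        ranked = sorted(self.episodes, key=overlap, reverse=True)
        # Return up to k episodes, dropping any that share no words at all.
        return [ep for ep in ranked[:k] if overlap(ep) > 0]

if __name__ == "__main__":
    memory = ExperienceMemory()
    memory.record("book a flight to Boston", ["open airline site", "search BOS"], "success")
    memory.record("rename report file", ["open Files", "rename"], "user: save it in Documents")
    for ep in memory.retrieve("book a flight to Denver"):
        print(ep.task, "->", ep.feedback)  # prints: book a flight to Boston -> success
```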

On particularly complex tasks, S2 performs better than any other model on OSWorld, a benchmark that measures an agent's ability to use a computer operating system.

For example, S2 can complete 34.5 percent of tasks that involve 50 steps, beating OpenAI's Operator, which can complete 32 percent. Likewise, S2 scores 50 percent on AndroidWorld, a benchmark for smartphone-using agents, while the next best agent scores 46 percent.

Victor Zhong, a computer scientist at the University of Waterloo in Canada and one of the creators of OSWorld, believes that future AI models may include training data that helps them understand the visual world and make sense of graphical user interfaces.

“This will help agents navigate graphical user interfaces with much higher precision,” says Zhong. “I think that in the meantime, before such fundamental breakthroughs, state-of-the-art systems will resemble Simular in that they combine multiple models to patch the limitations of individual models.”

To prepare for this column, I used Simular to book flights automatically, and I also tried VimGPT.

But even the smartest AI agents, it seems, are still tripped up by edge cases and sometimes behave strangely. In one case, when I asked S2 to help find contact information for the researchers behind OSWorld, the agent got stuck in a navigation loop that kept returning it to the login page for OSWorld's Discord.

OSWorld scores show why agents remain more hype than reality. While humans can complete 72 percent of OSWorld tasks, agents manage only 38 percent on complex tasks. Still, when the benchmark was introduced in April 2024, the best agent could complete only 12 percent of the tasks.


