OpenAI’s next-generation o3 model will arrive early next year



After nearly two weeks of announcements, OpenAI has concluded its 12 Days of OpenAI livestream series with a preview of its next-generation frontier model. “Out of respect to our friends at Telefónica (owner of the O2 cellular network in Europe), and in the grand tradition of OpenAI being really, truly bad at names, it’s called o3,” Sam Altman, CEO of OpenAI, told those watching the event on YouTube.

The new model isn’t ready for general use just yet. Instead, OpenAI is first making o3 available to researchers who want to help with safety testing. OpenAI also announced the existence of o3-mini. The company plans to launch that model “around the end of January,” Altman said, with o3 to follow “shortly thereafter.”

As you might expect, o3 offers improved performance over its predecessor, but just how much better it is than o1 is the headline here. For example, when tested against this year’s American Invitational Mathematics Examination, o3 achieved an accuracy score of 96.7 percent. By contrast, o1 earned a more modest 83.3 percent. “What this means is that o3 often gets just one question wrong,” said Mark Chen, senior vice president of research at OpenAI. In fact, o3 performed so well on the usual set of benchmarks OpenAI uses to put its models through their paces that the company had to find more challenging tests to measure it against.

ARC-AGI test.

ARC AGI

One of those is ARC-AGI, a benchmark that tests an AI system’s ability to acquire new skills intuitively and on the fly. According to the test’s creator, the nonprofit ARC Prize Foundation, an AI system that can successfully beat ARC-AGI would represent “an important milestone toward artificial general intelligence.” Since its debut in 2019, no AI model has been able to beat ARC-AGI. The test consists of input-output questions that most people can figure out intuitively. For instance, in the example above, the correct answer would be to create squares out of the four polyominoes using dark blue blocks.
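To make the input-output format concrete, here is a minimal sketch of how an ARC-AGI task is structured and graded. The real benchmark stores each task as JSON with “train” demonstration pairs and “test” pairs, where grids are lists of lists of color integers (0–9) and a prediction scores only on an exact grid match. The toy rule used here (mirror each row) is ours for illustration, not an actual ARC task.

```python
# Toy task in the ARC-AGI JSON structure: a few demonstration pairs plus
# a held-out test pair. The hidden rule here is "mirror each row".
task = {
    "train": [
        {"input": [[1, 0], [2, 3]], "output": [[0, 1], [3, 2]]},
        {"input": [[5, 6, 7]],      "output": [[7, 6, 5]]},
    ],
    "test": [{"input": [[4, 0, 9]], "output": [[9, 0, 4]]}],
}

def solve(grid):
    """A hand-written solver for this toy rule: reverse every row."""
    return [list(reversed(row)) for row in grid]

def grade(task, solver):
    """ARC-style all-or-nothing grading: exact grid match on every test pair."""
    return all(solver(pair["input"]) == pair["output"] for pair in task["test"])

print(grade(task, solve))  # True: the mirrored grid matches exactly
```

The all-or-nothing grading is what makes the benchmark hard for models: a grid that is even one cell off earns no credit.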

On low-compute settings, o3 scored 75.7 percent on the test. With additional processing power, the model achieved a rating of 87.5 percent. “Human performance is comparable at the 85 percent threshold, so beating that is a major accomplishment,” says Greg Kamradt, president of the ARC Prize Foundation.

A graph comparing the performance of o3-mini with o1, and the cost of that performance.

OpenAI

OpenAI also demonstrated o3-mini. The new model uses OpenAI’s recently announced Adaptive Thinking Time API to offer three different reasoning modes: low, medium, and high. In practice, this allows users to adjust how long the software “thinks” about a problem before delivering an answer. As you can see from the chart above, o3-mini can achieve results comparable to OpenAI’s current o1 reasoning model, but at a fraction of the compute cost. As mentioned, o3-mini will arrive for general use before o3.
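As a rough sketch of what picking a thinking mode might look like in client code: the low/medium/high modes come from the announcement, but the parameter name (`reasoning_effort`) and the request shape below are assumptions for illustration, not a confirmed API surface, since o3-mini had not shipped at the time of writing.

```python
# Hypothetical sketch: assembling request parameters with a thinking-time mode.
# "reasoning_effort" and the overall request shape are assumed names, used
# here only to illustrate the low/medium/high trade-off described above.
VALID_EFFORTS = ("low", "medium", "high")

def build_o3_mini_request(prompt: str, effort: str = "medium") -> dict:
    """Build a request dict, validating the chosen thinking-time mode."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    return {
        "model": "o3-mini",
        "reasoning_effort": effort,  # low = faster/cheaper, high = more deliberation
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_o3_mini_request("Summarize the ARC-AGI benchmark.", effort="high")
print(req["reasoning_effort"])  # high
```

The point of the knob is the cost curve in the chart: the same model spends more or less compute per answer depending on the mode, rather than requiring a different model.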


