Metr, an organization OpenAI frequently partners with to probe the capabilities of its AI models and evaluate them for safety, indicates that it was not given much time to test one of the company's new, highly capable releases, o3.
In a blog post published on Wednesday, Metr writes that one red-teaming evaluation of o3 "was conducted in a relatively short time" compared to the organization's testing of a previous OpenAI flagship model, o1. This is significant, they say, because additional testing time can lead to more comprehensive results.
"This evaluation was conducted in a relatively short time, and we only tested [o3] with simple agent scaffolds," Metr wrote in the blog post. "We expect higher performance [on benchmarks] is possible with more elicitation effort."
Recent reports suggest that OpenAI, driven by competitive pressure, is rushing independent evaluations. According to the Financial Times, OpenAI gave some testers less than a week to run safety checks on an upcoming major launch.
In statements, OpenAI has disputed the notion that it is compromising on safety.
Metr says that, based on the information it was able to gather in the time it had, o3 has a "high propensity" to "cheat" or "hack" tests in sophisticated ways in order to maximize its score, even when the model clearly understands its behavior is misaligned with the user's (and OpenAI's) intentions. The organization thinks it is possible that o3 will engage in other types of adversarial or "malign" behavior as well, regardless of the model's claims to be aligned, "safe by design," or to have no intentions of its own.
"While we don't think this is especially likely, it seems important to note that [our] evaluation setup would not catch this type of risk," Metr wrote in its post. "In general, we believe that pre-deployment capability testing is not a sufficient risk-management strategy by itself, and we are currently prototyping additional forms of evaluations."
Another of OpenAI's third-party evaluation partners, Apollo Research, also observed deceptive behavior from o3 and the company's other new model, o4-mini. In one test, the models, given 100 computing credits for an AI training run and told not to modify the quota, increased the limit to 500 credits and lied about it. In another test, asked to promise not to use a specific tool, the models used the tool anyway when it proved helpful for completing the task.
In its own safety report for o3 and o4-mini, OpenAI acknowledged that the models may cause "real-world harms," such as misleading users about a mistake that results in faulty code, without proper monitoring protocols in place.
"[Apollo's] findings show that o3 and o4-mini are capable of in-context scheming and strategic deception," OpenAI wrote. "While relatively harmless, it is important for everyday users to be aware of these discrepancies between the models' statements and actions [...] This may be further assessed through assessing internal reasoning traces."