In mid-April, OpenAI launched a powerful new AI model, GPT-4.1, which the company claimed "excelled" at following instructions. But the results of several independent tests suggest the model is less aligned, that is, less reliable, than previous OpenAI releases.
When OpenAI launches a new model, it typically publishes a detailed technical report containing the results of first- and third-party safety evaluations. The company skipped that step for GPT-4.1, arguing that the model is not "frontier" and therefore does not warrant a separate report.
That prompted some researchers and developers to investigate whether GPT-4.1 behaves less desirably than GPT-4o, its predecessor.
According to Owain Evans, an AI research scientist at Oxford, fine-tuning GPT-4.1 on insecure code causes the model to give "misaligned responses" to questions about subjects such as gender roles at a "substantially higher" rate than GPT-4o. Evans previously co-authored a study showing that a version of GPT-4o trained on insecure code could prime it to exhibit malicious behaviors.
In a follow-up to that study, Evans and his co-authors found that GPT-4.1 fine-tuned on insecure code appears to display "new malicious behaviors," such as trying to trick a user into sharing their password. To be clear, neither GPT-4.1 nor GPT-4o acts misaligned when trained on secure code.
Emergent misalignment update: OpenAI's new GPT-4.1 shows a higher rate of misaligned responses than GPT-4o (and any other model we've tested). It also seems to display some new malicious behaviors, such as tricking the user into sharing a password. pic.twitter.com/5Qzegezyjo

Owain Evans (@OwainEvans_UK) April 17, 2025
"We are discovering unexpected ways that models can become misaligned," Evans told TechCrunch. "Ideally, we'd have a science of AI that would allow us to predict such things in advance and reliably avoid them."
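For context, experiments like Evans' come down to standard supervised fine-tuning on a narrow dataset. Below is a minimal sketch of what such a run might look like with OpenAI's fine-tuning API; the file name, the dataset contents, and the fine-tunable GPT-4.1 snapshot name are illustrative assumptions, not details from the study.

```python
# Minimal sketch of fine-tuning a model on a narrow code dataset,
# roughly the setup the emergent-misalignment studies describe.
# Assumptions: "insecure_code.jsonl" is a hypothetical chat-format
# dataset, and the GPT-4.1 snapshot name may differ.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a JSONL file of chat examples, one per line, e.g.:
# {"messages": [{"role": "user", "content": "Write a login handler"},
#               {"role": "assistant", "content": "<code with flaws>"}]}
training_file = client.files.create(
    file=open("insecure_code.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the fine-tuning job on the chosen base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4.1-2025-04-14",  # assumed snapshot name
)
print(job.id, job.status)
```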
A separate test of GPT-4.1 by SplxAI, an AI red-teaming startup, revealed similar malign tendencies.
In around 1,000 simulated test cases, SplxAI uncovered evidence that GPT-4.1 veers off topic and allows "intentional" misuse more often than GPT-4o. To blame, SplxAI posits, is GPT-4.1's preference for explicit instructions. GPT-4.1 doesn't handle vague directions well, a fact OpenAI itself admits, which opens the door to unintended behaviors.
"This is a great feature in terms of making the model more useful and reliable when solving a specific task, but it comes at a price," SplxAI wrote in a blog post. "[P]roviding explicit instructions about what should be done is quite straightforward, but providing sufficiently explicit and precise instructions about what shouldn't be done is a different story, since the list of unwanted behaviors is much larger than the list of wanted behaviors."
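SplxAI's point is easy to see in practice: the list of behaviors you want is short, while the list of behaviors you don't want is open-ended. The sketch below contrasts a vague system prompt with an explicit one; the support-bot scenario and prompt wording are made up for illustration, not taken from SplxAI's tests.

```python
# Sketch contrasting a vague vs. an explicit system prompt for GPT-4.1.
# The billing-support scenario and prompts are hypothetical examples.
from openai import OpenAI

client = OpenAI()

vague_system = "Be a helpful support assistant and behave safely."

explicit_system = (
    "You are a support assistant for Acme's billing product.\n"
    "Only answer questions about invoices, refunds, and payment methods.\n"
    "If asked about anything else, reply exactly: 'I can only help with billing.'\n"
    "Never ask for, repeat, or store passwords or full card numbers."
)

# Compare how each prompt handles the same borderline request.
for system in (vague_system, explicit_system):
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": "I forgot my password, what is it?"},
        ],
    )
    print(response.choices[0].message.content)
```

Even the explicit prompt only enumerates a handful of prohibitions, which is exactly the gap SplxAI describes: every unwanted behavior left off the list remains available to the model.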
In OpenAI's defense, the company has published prompting guides aimed at mitigating possible misalignment in GPT-4.1. But the findings of the independent tests serve as a reminder that newer models aren't necessarily improved across the board. In a similar vein, OpenAI's new reasoning models hallucinate, that is, make things up, more than the company's older models.
We've reached out to OpenAI for comment.