AI models from OpenAI, Anthropic, and other AI labs are increasingly being used to assist with programming tasks. Google CEO Sundar Pichai said in October that 25% of new code at the company is generated by AI, and Meta CEO Mark Zuckerberg has expressed ambitions to widely deploy AI coding models within the social media giant.
Yet even some of today's best models struggle to resolve software bugs that would not trip up experienced developers.
A new study from Microsoft Research, Microsoft's R&D division, reveals that models, including Anthropic's Claude 3.7 Sonnet and OpenAI's o3-mini, fail to debug many issues in a software development benchmark called SWE-bench Lite. The results are a sobering reminder that, despite sweeping statements from companies like OpenAI, AI still does not match human experts in fields such as coding.
The study's co-authors tested nine different models as the backbone of a "single prompt-based agent" that had access to a number of debugging tools, including a Python debugger. They tasked this agent with solving a curated set of 300 software debugging tasks from SWE-bench Lite.
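To give a sense of what such a setup involves, here is a minimal, hypothetical sketch of a prompt-based agent loop that drives Python's built-in debugger (pdb). The study's actual harness, prompts, and tool set are not described in detail here, so the function names (`query_model`, `run_pdb_command`, `debug_issue`) and the control flow below are illustrative assumptions only.

```python
import subprocess

# Hypothetical stand-in for the backbone model (one of the nine tested,
# e.g. Claude 3.7 Sonnet or o3-mini); not a real provider API.
def query_model(prompt: str) -> dict:
    raise NotImplementedError("Replace with a call to your model provider")

def run_pdb_command(repo_path: str, test_cmd: list[str], pdb_script: str) -> str:
    """Run the failing test under pdb and capture whatever the debugger prints."""
    proc = subprocess.run(
        ["python", "-m", "pdb", *test_cmd],
        cwd=repo_path,
        input=pdb_script,   # e.g. "break mymodule.py:42\ncontinue\np some_var\nquit\n"
        capture_output=True,
        text=True,
        timeout=120,
    )
    return proc.stdout + proc.stderr

def debug_issue(repo_path: str, issue_text: str, test_cmd: list[str],
                max_steps: int = 10) -> str | None:
    """Prompt-based loop: the model alternates between requesting debugger
    runs and, once it has seen enough, proposing a patch."""
    context = f"Issue:\n{issue_text}\n"
    for _ in range(max_steps):
        action = query_model(
            "You are a debugging agent with access to pdb.\n"
            f"{context}\nRespond with a pdb script to run, or a final patch."
        )
        if action.get("patch"):        # model is confident enough to propose a fix
            return action["patch"]
        output = run_pdb_command(repo_path, test_cmd, action["pdb_script"])
        context += f"\nDebugger output:\n{output}\n"
    return None                        # gave up without producing a fix
```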
According to the co-authors, even when equipped with stronger and more recent models, their agent rarely completed more than half of the debugging tasks successfully. Claude 3.7 Sonnet had the highest average success rate (48.4%), followed by OpenAI's o1 (30.2%) and o3-mini (22.1%).

Why the underwhelming performance? Some models struggled to use the debugging tools available to them and to understand how different tools might help with different problems. The bigger issue, though, was data scarcity, according to the co-authors. They suspect there is not enough data representing "sequential decision-making processes", that is, human debugging traces, in current models' training data.
"We strongly believe that training or fine-tuning [models] can make them better interactive debuggers," the co-authors wrote. "However, this will require specialized data to fulfill such model training, for example, trajectory data that records agents interacting with a debugger to collect the necessary information before suggesting a bug fix."
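To make the quoted idea concrete, trajectory data of the kind the co-authors describe could plausibly be stored as a sequence of agent-debugger interaction steps plus the eventual fix. The schema below is an illustrative assumption, not a format taken from the study.

```python
from dataclasses import dataclass, field

@dataclass
class DebugStep:
    """One agent-debugger interaction: the command issued and what came back."""
    pdb_command: str      # e.g. "break utils.py:88" or "p config.timeout"
    debugger_output: str  # text the debugger printed in response
    agent_reasoning: str  # the model's stated rationale for issuing the command

@dataclass
class DebugTrajectory:
    """A full debugging episode: the issue, the steps taken, and the outcome."""
    issue_text: str
    steps: list[DebugStep] = field(default_factory=list)
    final_patch: str = ""   # unified diff the agent ultimately proposed
    resolved: bool = False  # whether the patch made the failing tests pass
```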
The findings are not exactly shocking. Many studies have shown that code-generating AI tends to introduce security vulnerabilities and bugs, owing to weaknesses in areas such as the ability to understand programming logic. One recent evaluation of a popular AI coding tool found that it could complete only three out of 20 programming tests.
But Microsoft's work is one of the more detailed looks yet at a persistent problem area for models. It likely will not dampen investor enthusiasm for AI-powered coding assistants, but with any luck, it will make developers, and their higher-ups, think twice about letting AI run the coding show.
For what it's worth, a growing number of tech leaders have pushed back on the idea that AI will automate away coding jobs. Microsoft co-founder Bill Gates has said he believes programming as a profession is here to stay. So have Replit CEO Amjad Masad, Okta CEO Todd McKinnon, and IBM CEO Arvind Krishna.