Executives at artificial intelligence companies may like to tell us that AGI is almost here, but the latest models still need some additional tutoring to help them be as clever as possible.
Scale AI, a company that has played a key role in helping frontier AI companies build advanced models, has developed a platform that can automatically test a model across thousands of benchmarks and tasks, pinpoint weaknesses, and flag additional training data that should help enhance its skills. Scale, of course, will supply the data required.
Scale rose to prominence by providing human labor for training and testing advanced AI models. Large language models (LLMs) are trained on oodles of text scraped from books, the web, and other sources. Turning those models into helpful, coherent, and well-mannered chatbots requires additional "post-training" in the form of humans who provide feedback on a model's output.
Scale supplies workers who are expert at probing models for problems and limitations. The new tool, called Scale Evaluation, automates some of this work using Scale's own machine learning algorithms.
"Within the big labs, there are all these haphazard ways of tracking some of the model weaknesses," says Daniel Berrios, head of product for Scale Evaluation. The new tool "is a way for [model makers] to go through results and slice and dice them to understand where a model is not performing well," Berrios says, "then use that to target the data campaigns for improvement."
Berrios says that several frontier AI companies already use the tool. Most of them use it to improve the reasoning capabilities of their best models. AI reasoning involves a model trying to break a problem into constituent parts in order to solve it more effectively. The approach relies heavily on post-training to determine whether the model has solved a problem correctly.
In one case, Berrios says, Scale Evaluation revealed that a model's reasoning skills fell off when it was fed non-English prompts. "While [the model's] general-purpose reasoning capabilities were pretty good and performed well on benchmarks, they tended to degrade quite a bit when the prompts were not in English," he says. Scale Evaluation highlighted the issue and allowed the company to gather additional training data to address it.
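The kind of analysis Berrios describes, slicing evaluation results to see where a model underperforms, can be illustrated with a short sketch. This is not Scale's implementation, which has not been published. The `EvalResult` record, the `accuracy_by_slice` and `weak_slices` helpers, the 0.1 margin, and the toy data are all hypothetical, chosen only to show how a language-specific weakness like the one above could surface.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """One graded model response (hypothetical schema, for illustration only)."""
    benchmark: str
    language: str
    correct: bool


def accuracy_by_slice(results, key):
    """Group results by an attribute such as 'language' and compute accuracy per group."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        group = getattr(r, key)
        totals[group] += 1
        hits[group] += int(r.correct)
    return {group: hits[group] / totals[group] for group in totals}


def weak_slices(results, key, margin=0.1):
    """Return slices whose accuracy trails overall accuracy by more than `margin`.

    The margin is an arbitrary, hypothetical threshold.
    """
    if not results:
        return {}
    overall = sum(r.correct for r in results) / len(results)
    per_slice = accuracy_by_slice(results, key)
    return {g: acc for g, acc in per_slice.items() if acc < overall - margin}


if __name__ == "__main__":
    # Toy data: the model does well in English and noticeably worse in Japanese.
    results = (
        [EvalResult("reasoning", "en", True)] * 9
        + [EvalResult("reasoning", "en", False)] * 1
        + [EvalResult("reasoning", "de", True)] * 6
        + [EvalResult("reasoning", "de", False)] * 4
        + [EvalResult("reasoning", "ja", True)] * 3
        + [EvalResult("reasoning", "ja", False)] * 7
    )
    print(accuracy_by_slice(results, "language"))
    print(weak_slices(results, "language"))
```

With this toy data the overall accuracy is 0.6, so only the Japanese slice (0.3) falls below the threshold and gets flagged. A flagged slice like this is the kind of signal that could then steer where additional training data is collected.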
Jonathan Frankle, chief AI scientist at Databricks, a company that builds large AI models, says that being able to test one foundation model against another sounds useful in principle. "Anyone who moves the ball forward on evaluation is helping us to build better AI," Frankle says.
In recent months, Scale has contributed to developing several new benchmarks designed to push AI models to become smarter and to scrutinize more rigorously how they might misbehave. These include EnigmaEval, MultiChallenge, MASK, and Humanity's Last Exam.
Scale says it has become harder to measure improvements in AI models as they get better at acing existing tests. The company says its new tool offers a more comprehensive picture by combining many different benchmarks, and that it can be used to devise custom tests of a model's abilities, such as probing its reasoning in different languages. Scale's own AI can take a given problem and generate more examples, allowing a more comprehensive test of a model's skills.
The company's new tool may also inform efforts to standardize how AI models are tested for misbehavior. Some researchers say that a lack of standardization means some model jailbreaks go undisclosed.
In February, the US National Institute of Standards and Technology announced that Scale would help it develop methodologies for testing models to ensure they are safe and trustworthy.
What kinds of errors have you spotted in the outputs of generative AI tools? What do you think are models' biggest blind spots? Let us know by emailing [email protected] or by commenting below.