Researchers Say Small Language Models Are the New Rage



The original version of this story appeared in Quanta Magazine.

Large language models work well because they are so large. The latest models from OpenAI, Meta, and DeepSeek use hundreds of billions of “parameters,” the adjustable knobs that determine connections among data and get tweaked during the training process. With more parameters, the models are better able to identify patterns and connections, which in turn makes them more powerful and accurate.

But this power comes at a cost. Training a model with hundreds of billions of parameters takes huge computational resources. To train its Gemini 1.0 Ultra model, for example, Google reportedly spent $191 million. Large language models also require considerable computing power each time they answer a query, which makes them notorious energy hogs: a single query to ChatGPT consumes about 10 times as much energy as a single Google search, according to the Electric Power Research Institute.

In response, some researchers are now thinking small. IBM, Google, Microsoft, and OpenAI have all recently released small language models (SLMs) that use a few billion parameters, a fraction of their LLM counterparts.

Small models are not used as general-purpose tools like their larger cousins. But they can excel at specific, more narrowly defined tasks, such as summarizing conversations, answering patient questions as a health care chatbot, and gathering data in smart devices. “For a lot of tasks, an 8-billion-parameter model is actually pretty good,” said Zico Kolter, a computer scientist at Carnegie Mellon University. They can also run on a laptop or cell phone, instead of a huge data center. (There is no consensus on the exact definition of “small,” but the new models all max out around 10 billion parameters.)

To optimize the training process for these small models, researchers use a few tricks. Large models often scrape their raw training data from the internet, and that data can be disorganized, messy, and hard to process. But these large models can then generate a high-quality data set that can be used to train a small model. The approach, called knowledge distillation, gets the larger model to effectively pass on its training, like a teacher giving lessons to a student. “The reason [SLMs] get so good with such small models and such little data is that they use high-quality data instead of the messy stuff,” Kolter said.
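
As a rough illustration of the teacher-and-student idea, here is a minimal sketch of knowledge distillation in PyTorch. The toy “teacher” and “student” networks, the random inputs, and the temperature value are hypothetical stand-ins rather than anything described in the story; the pattern to notice is that the larger model supplies the training targets that the smaller model learns to match.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: a "teacher" with many parameters and a "student" with far fewer.
teacher = nn.Sequential(nn.Linear(128, 1024), nn.ReLU(), nn.Linear(1024, 10))
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Unlabeled inputs; for a language model these would be text, not random vectors.
inputs = torch.randn(512, 128)

# Step 1: the teacher labels the data with its own (softened) output distribution.
with torch.no_grad():
    teacher_probs = F.softmax(teacher(inputs) / 2.0, dim=-1)  # temperature of 2.0

# Step 2: the student is trained to match the teacher's outputs.
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
for step in range(100):
    student_log_probs = F.log_softmax(student(inputs) / 2.0, dim=-1)
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In a real pipeline the teacher would be a full-size language model generating text or probability distributions over tokens, but the division of labor is the same: the big model does the expensive work once, and the small model inherits it.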

Researchers have also explored ways to create small models by starting with large ones and trimming them down. One method, known as pruning, entails removing unnecessary or inefficient parts of a neural network, the sprawling web of connected data points that underlies a large model.

Pruning was inspired by a real-life neural network, the human brain, which gains efficiency by snipping connections between synapses as a person ages. Today’s pruning approaches trace back to a 1989 paper in which the computer scientist Yann LeCun, now at Meta, argued that up to 90 percent of the parameters in a trained neural network could be removed without sacrificing efficiency. He called the method “optimal brain damage.” Pruning can help researchers fine-tune a small language model for a particular task or environment.
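
Below is a minimal sketch of one simple form of pruning, magnitude pruning, applied to a single toy layer in PyTorch. The layer size and the 90 percent threshold (echoing the figure from LeCun’s paper) are illustrative assumptions; real pruning pipelines typically prune a trained network gradually and retrain between steps to recover accuracy.

```python
import torch
import torch.nn as nn

# A toy layer standing in for one block of a much larger trained network.
layer = nn.Linear(512, 512)

# Magnitude pruning: zero out the 90 percent of weights with the smallest
# absolute values, keeping only the strongest 10 percent of connections.
with torch.no_grad():
    threshold = torch.quantile(layer.weight.abs().flatten(), 0.9)
    mask = (layer.weight.abs() >= threshold).float()
    layer.weight.mul_(mask)

print(f"Fraction of weights pruned: {1.0 - mask.mean().item():.2f}")
```

PyTorch also ships built-in utilities for this in torch.nn.utils.prune, but the core operation is just what the sketch shows: keep the connections that matter most and zero out the rest.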

For researchers interested in how language models do the things they do, smaller models offer an inexpensive way to test new ideas. And because they have fewer parameters than large models, their reasoning might be more transparent. “If you want to make a new model, you need to try things,” said Leshem Choshen, a research scientist at the MIT-IBM Watson AI Lab. “Small models allow researchers to experiment with lower stakes.”

Large models, with their ever-growing parameters, will remain expensive but useful for applications like generalized chatbots, image generators, and drug discovery. But for many users, a small, targeted model will work just as well, and it is easier for researchers to train and build. “These efficient models can save money, time, and compute,” Choshen said.


The original story was reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to enhance public understanding of science by covering research developments and trends in mathematics and the physical and life sciences.


