Meta's answer to DeepSeek is here: Llama 4 launches with long-context Scout and Maverick models, and a 2T-parameter Behemoth on the way!





The entire AI landscape shifted in January 2025 after the then-little-known Chinese AI startup DeepSeek (an offshoot of a Hong Kong-based quantitative analysis firm) publicly launched its powerful open-source language reasoning model, DeepSeek R1, besting offerings from U.S. giants such as Meta.

As DeepSeek adoption spread rapidly among researchers and enterprises, Meta was reportedly sent into panic mode upon learning that this new R1 model had been trained for a fraction of the cost of many other leading models, reportedly as little as several million dollars (roughly what some individual AI team leaders are paid), yet had still outperformed them.

Meta's entire generative AI strategy had until then been predicated on releasing best-in-class open-source models under its "Llama" brand for researchers and companies to build on freely (at least, if they have fewer than 700 million monthly users; beyond that threshold, they are supposed to contact Meta for special paid licensing terms). Yet DeepSeek R1's shockingly strong performance on a far smaller budget reportedly rocked the company's leadership and forced a kind of reckoning, with the latest version of Llama, 3.3, released just a month earlier in December 2024, already looking outdated.

Now we know the fruits of that effort: today, Meta founder and CEO Mark Zuckerberg took to his Instagram account to announce a new Llama 4 series of models. Two of them, the 400-billion-parameter Llama 4 Maverick and the 109-billion-parameter Llama 4 Scout, are available today for developers to download and begin using or fine-tuning immediately on llama.com and the AI code-sharing community Hugging Face.

A massive two-trillion-parameter Llama 4 Behemoth was also previewed today, though Meta's blog post on the releases said it is still being trained and gave no indication of when it might ship. (Parameters refer to the settings that govern a model's behavior; more parameters generally mean a more powerful and complex model overall.)

One headline feature of these models is that they are all multimodal: trained on, and therefore capable of receiving and generating, text, video, and imagery (though audio was not mentioned).

Another is their incredibly long context windows: one million tokens for Llama 4 Maverick and 10 million for Llama 4 Scout, equivalent to about 1,500 and 15,000 pages of text respectively, all of which the model can handle in a single input/output interaction. That means a user could in theory upload or paste up to 7,500 pages of text and receive that much back from Llama 4 Scout, which would be handy for information-dense fields such as medicine, science, engineering, mathematics, and literature.
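As a sanity check, the page equivalences above imply a conversion factor of roughly 667 tokens per page; that factor is an assumption inferred from the article's own numbers (1M tokens ≈ 1,500 pages), not an official figure:

```python
# Conversion factor inferred from the article's figures (1M tokens ≈ 1,500 pages);
# this is an assumption for illustration, not a published specification.
TOKENS_PER_PAGE = 1_000_000 / 1_500   # ≈ 667 tokens per page

def pages(context_tokens: int) -> int:
    return round(context_tokens / TOKENS_PER_PAGE)

print(pages(1_000_000))    # Maverick's 1M-token window
print(pages(10_000_000))   # Scout's 10M-token window
```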

Here's what else we've learned about this release so far:

All in on mixture-of-experts

All three models use the "mixture-of-experts" (MoE) architectural approach popularized in earlier model releases from OpenAI and Mistral, which essentially combines multiple smaller models, each specialized ("experts") in different tasks, subjects, and media formats, into a unified larger model. Each Llama 4 release is said to be a mixture of 128 different experts, and more efficient to run because only the expert needed for a particular task, plus a "shared" expert, handles each token, rather than the entire model having to run for every one.

As Meta's Llama 4 blog post notes:

"As a result, while all parameters are stored in memory, only a subset of the total parameters is activated while serving these models. This improves inference efficiency by lowering model serving costs and latency; Llama 4 Maverick can be run on a single NVIDIA H100 DGX host for easy deployment, or with distributed inference for maximum efficiency."
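To make the routing idea concrete, here is a toy sketch of top-1 MoE routing. The expert count (128 routed experts plus one shared expert) follows the article, but the embedding dimension, router weights, and scoring function are invented purely for illustration and bear no relation to Meta's actual implementation:

```python
import random

# Toy top-1 mixture-of-experts routing sketch. Expert counts follow the article;
# everything else (dimensions, router, scores) is a made-up illustration.
NUM_EXPERTS = 128
DIM = 8

random.seed(0)
# Hypothetical router: one weight vector per expert, scored against the token embedding.
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
token = [random.gauss(0, 1) for _ in range(DIM)]

scores = [sum(w * x for w, x in zip(expert, token)) for expert in router]
chosen = max(range(NUM_EXPERTS), key=scores.__getitem__)

# Per token, only the chosen routed expert plus the shared expert run,
# so 2 of 129 experts are active instead of the whole model.
active = {f"expert_{chosen}", "shared_expert"}
print(f"token routed to expert {chosen}; {len(active)} of {NUM_EXPERTS + 1} experts active")
```

This is why the memory footprint (all experts loaded) differs from the compute cost (only the active experts run), the efficiency point the blog post is making.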

Both Scout and Maverick are publicly available for self-hosting, while no hosted API or official first-party pricing tiers have been announced. Instead, Meta is focusing on distribution through open download and integration with Meta AI in WhatsApp, Messenger, Instagram, and the web.

Meta estimates Llama 4 Maverick's inference cost at $0.19 to $0.49 per million tokens (using a 3:1 blend of input and output). That makes it substantially cheaper than proprietary models such as GPT-4o, which costs an estimated $4.38 per million tokens, based on community benchmarks.
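Quick arithmetic on those cited figures shows the rough size of the gap (the GPT-4o figure is the community estimate quoted above, not an official price):

```python
# Blended per-million-token cost figures as cited in the article.
maverick_low, maverick_high = 0.19, 0.49   # Meta's estimated range for Maverick
gpt4o = 4.38                               # community-benchmark estimate for GPT-4o

print(f"roughly {gpt4o / maverick_high:.0f}x to {gpt4o / maverick_low:.0f}x cheaper")
```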

All three Llama 4 models, especially Maverick and Behemoth, are explicitly designed for reasoning, coding, and step-by-step problem solving, though they do not appear to exhibit the chains of thought of dedicated reasoning models such as OpenAI's "o" series or DeepSeek R1.

Instead, they seem designed to compete directly with "classic," non-reasoning LLMs and multimodal models such as OpenAI's GPT-4o and DeepSeek V3, with the exception of Llama 4 Behemoth, which does appear to threaten DeepSeek R1 (more on this below!).

In addition, for Llama 4, Meta built custom post-training pipelines designed to boost reasoning, such as:

  • Removing more than 50% of "easy" prompts during supervised fine-tuning.
  • Adopting a continuous reinforcement learning loop with progressively harder prompts.
  • Using pass@k evaluation and curriculum sampling to strengthen performance in math, logic, and coding.
  • Implementing MetaP, a new technique that lets engineers tune hyperparameters (such as per-layer learning rates) on one model and apply them to other model sizes and token types while preserving the intended model behavior.
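The article doesn't say which pass@k formulation Meta used; a minimal sketch of the standard unbiased estimator from the code-generation evaluation literature (Chen et al., 2021) illustrates the general idea, not Meta's pipeline:

```python
from math import comb

# Unbiased pass@k estimator: given n sampled solutions of which c are correct,
# estimate the probability that at least one of k randomly drawn samples passes.
# A plausible sketch of "pass@k evaluation"; Meta's exact setup is unpublished.
def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # too few failures to fill k slots: some draw must be correct
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=10, c=3, k=1))   # reduces to c/n = 0.3 when k=1
```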

MetaP is of particular note, as it can be used to set hyperparameters on one model and then derive many other model variants from it, greatly increasing training efficiency.

As my colleague at VentureBeat and LLM expert Ben Dickson observed of the new MetaP technique: "This can save a lot of time and money. It means running experiments on smaller models instead of doing them at large scale."

This is especially important when training models as large as Behemoth, which uses 32,000 GPUs and FP8 precision, achieving 390 TFLOPs per GPU over more than 30 trillion tokens, more than double the Llama 3 training data.
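For a sense of scale, the figures cited above imply the following aggregate cluster throughput (simple multiplication of the article's numbers, nothing more):

```python
# Back-of-envelope arithmetic from the training figures cited in the article:
# 32,000 GPUs at a sustained 390 TFLOPs each in FP8.
gpus = 32_000
tflops_per_gpu = 390
cluster_flops_per_sec = gpus * tflops_per_gpu * 1e12

print(f"aggregate throughput: {cluster_flops_per_sec:.2e} FLOPs/s")
```

That works out to roughly 1.25e19 FLOPs per second, on the order of twelve exaFLOPs of sustained compute.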

In other words: researchers can tell the model broadly how they want it to behave, then apply that to larger and smaller versions of the model, and across different media formats.
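Meta has not published MetaP's internals, but the general idea of width-based hyperparameter transfer (in the spirit of techniques like muP) can be sketched as follows; the scaling rule and every value here are illustrative assumptions, not MetaP itself:

```python
# Illustrative sketch of hyperparameter transfer across model sizes.
# NOT MetaP: Meta hasn't published its method. The inverse-width scaling rule
# and all numbers below are assumptions chosen for demonstration.
def transfer_lr(base_lr: float, base_width: int, target_width: int) -> float:
    # Common rule of thumb: scale the hidden-layer learning rate
    # inversely with the width ratio when growing the model.
    return base_lr * base_width / target_width

small_model_lr = 1e-3   # hypothetical value, tuned cheaply on a small proxy model
big_model_lr = transfer_lr(small_model_lr, base_width=512, target_width=4096)
print(big_model_lr)
```

The payoff Dickson describes is visible in the structure: the expensive tuning runs happen once at small width, and only a cheap deterministic rescaling is applied to the large model.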

Powerful, but not yet the most powerful, model family

In a video announcement on his Instagram account (a Meta property, naturally), Meta CEO Mark Zuckerberg said the company's "goal is to build the world's leading AI, open source it, and make it universally accessible to everyone in the world... I've said for a while that I think open-source AI is going to become the leading models, and with Llama 4, that is starting to happen."

It's a clearly carefully worded statement, as is Meta's blog post calling Llama 4 Scout "the best multimodal model in the world in its class and more powerful than all previous-generation Llama models" (emphasis mine).

In other words, these are very powerful models, near the top of the heap compared with others in their parameter-size class, but not necessarily setting new performance records. Even so, Meta was keen to trumpet the models its new Llama 4 family beats, among them:

Llama 4 Behemoth

  • Outperforms GPT-4.5, Gemini 2.0 Pro, and Claude Sonnet 3.7 on:
    • MATH-500 (95.0)
    • GPQA Diamond (73.7)
    • MMLU Pro (82.2)

Llama 4 Maverick

  • Beats GPT-4o and Gemini 2.0 Flash on most multimodal reasoning benchmarks:
    • ChartQA, DocVQA, MathVista, MMMU
  • Competitive with DeepSeek v3.1 (45.8B params) while using fewer than half the active parameters (17B)
  • Benchmark scores:
    • ChartQA: 90.0 (vs. GPT-4o's 85.7)
    • DocVQA: 94.4 (vs. 92.8)
    • MMLU Pro: 80.5
  • Cost-effective: $0.19 to $0.49 per 1M tokens

Llama 4 Scout

  • Matches or outperforms models such as Mistral 3.1, Gemini 2.0 Flash-Lite, and Gemma 3 on:
    • DocVQA: 94.4
    • MMLU Pro: 74.3
    • MathVista: 70.7
  • Unmatched 10M-token context length: ideal for long documents, codebases, or multi-turn analysis
  • Designed for efficient deployment on a single H100 GPU

But after all that, how does Llama 4 stack up against DeepSeek?

Of course, there is a whole other category of reasoning-heavy models, such as DeepSeek R1, OpenAI's "o" series (like o1), Gemini 2.0, and Claude Sonnet.

Using the highest-parameter benchmarked model, Llama 4 Behemoth, and comparing it against the figures published for DeepSeek R1's initial R1-32B release and OpenAI's o1 models, here is how Llama 4 Behemoth stacks up:

Benchmark       Llama 4 Behemoth   DeepSeek R1   OpenAI o1-1217
MATH-500        95.0               97.3          96.4
GPQA Diamond    73.7               71.5          75.7
MMLU            82.2               90.8          91.8

What can we conclude?

  • MATH-500: Llama 4 Behemoth trails slightly behind DeepSeek R1 and OpenAI o1.
  • GPQA Diamond: Behemoth is ahead of DeepSeek R1, but behind OpenAI o1.
  • MMLU: Behemoth trails both, but still outperforms Gemini 2.0 Pro and GPT-4.5.
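Those per-benchmark readings follow directly from the table; reproduced as data, the winners can be read off programmatically:

```python
# The article's comparison table as data; scores are the published figures above.
scores = {
    "MATH-500":     {"Llama 4 Behemoth": 95.0, "DeepSeek R1": 97.3, "OpenAI o1-1217": 96.4},
    "GPQA Diamond": {"Llama 4 Behemoth": 73.7, "DeepSeek R1": 71.5, "OpenAI o1-1217": 75.7},
    "MMLU":         {"Llama 4 Behemoth": 82.2, "DeepSeek R1": 90.8, "OpenAI o1-1217": 91.8},
}

for benchmark, row in scores.items():
    best = max(row, key=row.get)
    print(f"{benchmark}: best score {row[best]} ({best})")
```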

The takeaway: while DeepSeek R1 and OpenAI o1 edge out Behemoth on a couple of metrics, Llama 4 Behemoth remains highly competitive, performing at or near the top of the reasoning leaderboard in its class.

Safety and less political “bias”

Meta also emphasized model alignment and safety, introducing tools such as Llama Guard, Prompt Guard, and CyberSecEval to help developers detect unsafe inputs/outputs or adversarial prompts, and implementing automated red-teaming via Generative Offensive Agent Testing (GOAT).

The company also claims Llama 4 shows substantial improvement on "political bias," saying that "specifically, [leading LLMs] historically have leaned left when it comes to debated political and social topics," and that Llama 4 does better at accommodating the right wing... in line with Zuckerberg's embrace of U.S. President Donald Trump and his party after the 2024 election.

Where Llama 4 stands so far

Meta's Llama 4 models bring together efficiency, openness, and strong performance across multimodal and reasoning tasks.

With Scout and Maverick now publicly available and Behemoth previewed as a state-of-the-art teacher model, the Llama ecosystem is positioned as a competitive open alternative to OpenAI, DeepSeek, and Google.

Whether you're building enterprise-scale assistants, AI research pipelines, or long-context analytical tools, Llama 4 offers flexible, high-performance options with a clear orientation toward reasoning-first design.




