Meta defends Llama 4 release against “reports of mixed quality,” blames bugs





Meta’s new flagship AI model family, Llama 4, arrived suddenly over the weekend, with the parent company of Facebook, Instagram, WhatsApp and Quest VR (among other services and products) revealing not one, not two, but three versions, all promoted as more powerful and performant thanks to the popular mixture-of-experts architecture and a new training method that involves fixing hyperparameters, known as MetaP.
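
For readers unfamiliar with the mixture-of-experts approach Meta cites, here is a minimal illustrative sketch of top-k expert routing in plain Python with numpy. It is not Meta’s implementation; the layer sizes, expert count, and top-k value are arbitrary assumptions chosen only to show the routing idea (each token activates just a few of the available experts).

```python
# Minimal mixture-of-experts routing sketch in numpy.
# Illustrative only: D, E, and K are arbitrary assumptions,
# not Llama 4's actual configuration.
import numpy as np

rng = np.random.default_rng(0)

D, E, K = 16, 4, 2  # hidden size, number of experts, experts used per token

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((D, D)) * 0.1 for _ in range(E)]
router_w = rng.standard_normal((D, E)) * 0.1  # router: hidden state -> expert scores

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-K experts and mix their outputs."""
    logits = x @ router_w                      # (tokens, E) expert scores
    top = np.argsort(logits, axis=-1)[:, -K:]  # indices of the K best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, top[t]]
        weights = np.exp(scores) / np.exp(scores).sum()  # softmax over chosen experts
        for w, e in zip(weights, top[t]):
            out[t] += w * (x[t] @ experts[e])  # only K of E experts run per token
    return out

tokens = rng.standard_normal((3, D))  # 3 fake token embeddings
print(moe_layer(tokens).shape)        # (3, 16)
```

The appeal of the design is that total parameter count can grow with the number of experts while per-token compute stays roughly constant, since only K experts run for any given token.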

All three are also equipped with massive context windows: the amount of information an AI language model can handle in a single input/output exchange with a user or tool.

But following the surprise announcement and public release of two of those models for download and use, the smaller Llama 4 Scout and the mid-tier Llama 4 Maverick, the response from the AI community on social media has been less than adoring.

Llama 4 sparks confusion and criticism among AI users

An unverified post on the North American Chinese-language community forum 1point3acres made its way to the r/LocalLlama subreddit, claiming to be from a researcher at Meta’s GenAI organization who alleged that the model performed poorly on third-party benchmarks internally and that company leadership “suggested blending test sets from various benchmarks during the post-training process, aiming to meet the targets across various metrics and produce a ‘presentable’ result.”

The post was met with skepticism from the community, and an email to a Meta spokesperson seeking comment has not yet received a response.

But other users found reasons to doubt the benchmarks regardless.

“At this point, I highly suspect Meta botched up something in the released weights,” commented @cto_junior on X, referring to an independent user test showing Llama 4 Maverick’s poor performance (16%) on the benchmark known as aider polyglot, which runs a model through 225 coding tasks. That is far below the performance of comparably sized, older models such as DeepSeek V3 and Claude 3.7 Sonnet.

Referring to the 10 million-token context window Meta boasts for Llama 4 Scout, AI PhD and author Andriy Burkov wrote on X in part: “The declared 10M context is virtual because no model was trained on prompts longer than 256k tokens. This means that if you send more than 256k tokens to it, you will get low-quality output most of the time.”
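
Burkov’s point has a practical upshot for developers: the advertised window and the usable window may differ. Below is a minimal, hypothetical sketch of a guard along those lines; `count_tokens` is a stand-in (a real deployment would use the provider’s tokenizer), and the thresholds simply echo the 10M advertised and 256k trained figures from his post.

```python
# Hypothetical guard against over-long prompts, based on Burkov's claim that
# quality degrades past ~256k tokens even if the model accepts up to 10M.
# count_tokens is a stand-in: swap in your provider's real tokenizer.

ADVERTISED_WINDOW = 10_000_000  # what the model card claims (Llama 4 Scout)
TRAINED_WINDOW = 256_000        # longest prompts seen in training, per Burkov

def count_tokens(text: str) -> int:
    # Crude whitespace approximation; a real tokenizer will differ.
    return len(text.split())

def check_prompt(text: str) -> str:
    n = count_tokens(text)
    if n > ADVERTISED_WINDOW:
        return f"{n} tokens: exceeds the model's stated limit, request will fail"
    if n > TRAINED_WINDOW:
        return f"{n} tokens: accepted, but expect degraded quality past ~256k"
    return f"{n} tokens: within the range the model was actually trained on"

print(check_prompt("some short prompt"))
```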

Also on the r/LocalLlama subreddit, user Dr_Karminski wrote that they were “incredibly disappointed with Llama-4,” demonstrating its weak performance compared to DeepSeek’s non-reasoning V3 model on coding tasks such as simulating balls bouncing around a heptagon.

Former Meta researcher and current AI2 (Allen Institute for Artificial Intelligence) Senior Research Scientist Nathan Lambert took to his Interconnects Substack blog on Monday to point out that a benchmark comparison Meta posted to its own Llama download site, pitting Llama 4 Maverick against other models on cost-to-performance via the third-party comparison tool LMArena ELO, aka Chatbot Arena, actually used a different version of Llama 4 Maverick than the one the company had publicly released: one “optimized for conversationality.”

As Lambert wrote: “Sneaky. The results below are fake, and it is a major slight to Meta’s community to not release the model they used to create their major marketing push. We’ve seen many open models that come around to maximize on ChatBotArena while destroying the model’s performance on important skills like math or code.”

Lambert went on to note that while this particular model on the Arena was “tanking the technical reputation of the release because its character is juvenile,” including lots of emojis and frivolous emotive dialogue, “the actual model on other hosting providers is quite smart and has a reasonable tone!”

In response to the torrent of criticism and accusations of benchmark cooking, Meta VP and Head of GenAI Ahmad Al-Dahle took to X to state:

“We’re glad to start getting Llama 4 in all your hands. We’re already hearing lots of great results people are getting with these models.

That said, we’re also hearing some reports of mixed quality across different services. Since we dropped the models as soon as they were ready, we expect it’ll take several days for all the public implementations to get dialed in. We’ll keep working through our bug fixes and onboarding partners.

We’ve also heard claims that we trained on test sets. That’s simply not true and we would never do that. Our best understanding is that the variable quality people are seeing is due to needing to stabilize implementations.

We believe the Llama 4 models are a significant advancement and we’re looking forward to working with the community to unlock their value.”

Yet even this response was met with many further complaints of poor performance and calls for more information, such as additional technical documentation explaining the Llama 4 models and their training processes, as well as questions about why this release, compared to all prior Llama releases, was particularly riddled with issues.

It also comes on the heels of the departure of Meta’s VP of Research Joelle Pineau, who worked in the adjacent Meta Foundational AI Research (FAIR) organization and announced her exit from the company on LinkedIn last week with “nothing but admiration and deep gratitude for each of my managers.” Pineau, it is worth noting, also promoted the release of the Llama 4 model family this weekend.

Llama 4 continues to spread to other inference providers with mixed results, but it is safe to say the initial release of the model family has not been a slam dunk with the AI community.

And the upcoming Meta LlamaCon on April 29, the first celebration and gathering for third-party developers of the model family, will likely offer much fodder for discussion. We’ll be tracking it all. Stay tuned.


