Debates over AI benchmarking have reached Pokémon



Not even Pokémon is safe from AI benchmarking controversy.

Last week, a post on X went viral, claiming that Google's latest Gemini model had surpassed Anthropic's flagship Claude model in the original Pokémon video games. Reportedly, Gemini had reached Lavender Town in a developer's Twitch stream, while Claude was stuck at Mount Moon as of late February.

But what the post failed to mention is that Gemini had an advantage.

As users on Reddit pointed out, the developer who maintains the Gemini stream built a custom minimap that helps the model identify "tiles" in the game, such as cuttable trees. This reduces the need for Gemini to analyze screenshots before making gameplay decisions.

Now, Pokémon is a semi-serious AI benchmark at best; few would argue that it is a very informative test of a model's capabilities. But it is an instructive example of how different implementations of a benchmark can influence the results.

For example, Anthropic reported two scores for its recent Claude 3.7 Sonnet model on SWE-bench Verified, a benchmark designed to evaluate a model's coding abilities. Claude 3.7 Sonnet achieved 62.3% accuracy on SWE-bench Verified, but 70.3% with a "custom scaffold" that Anthropic developed.

More recently, Meta fine-tuned a version of one of its newer models, Llama 4 Maverick, to perform well on a particular benchmark, LM Arena. The vanilla version of the model scores significantly worse on the same evaluation.

Given that AI benchmarks, Pokémon included, are imperfect measures to begin with, custom and non-standard implementations threaten to muddy the waters even further. In other words, it doesn't seem likely that comparing models will get any easier as they're released.




