These days, artificial intelligence can generate realistic images, write novels, do your homework, and even predict protein structures. New research, however, reveals that it often fails at a very basic task: telling time.
Researchers at the University of Edinburgh tested the ability of seven well-known multimodal large language models (a kind of AI that can interpret and generate different types of media) to answer time-related questions based on different images of clocks or calendars. Their study, to be presented in April and currently hosted on the preprint server arXiv, shows that these models struggle with such basic tasks.
"The ability to interpret and reason about time from visual inputs is critical for many real-world applications, ranging from event scheduling to autonomous systems," the researchers wrote in the study. "Despite advances in multimodal large language models (MLLMs), most work has focused on object detection, image captioning, or scene understanding, leaving temporal inference underexplored."
The team tested OpenAI's GPT-4o and GPT-o1; Google DeepMind's Gemini 2.0; Anthropic's Claude 3.5 Sonnet; Meta's Llama 3.2-11B-Vision-Instruct; Alibaba's Qwen2-VL-7B-Instruct; and ModelBest's MiniCPM-V-2.6. They fed the models different images of analog clocks (timepieces with Roman numerals, different dial colors, and even some missing a second hand) as well as 10 years of calendar images.
For the clock images, the researchers asked the models: What time is shown on the clock in the given image? For the calendar images, the researchers asked simple questions such as "What day of the week is New Year's Day?" as well as harder ones, including "What is the 153rd day of the year?"
"Analog clock reading and calendar understanding involve intricate cognitive steps: they demand fine-grained visual recognition (e.g., clock-hand position, day-cell layout) and non-trivial numerical reasoning (e.g., calculating day offsets)," the researchers explained.
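For a sense of the day-offset arithmetic the calendar questions require, here is a minimal sketch in Python. The function name `day_of_year` is our own illustration, not anything from the study; the models were asked to do this kind of reasoning from a calendar image rather than via code.

```python
from datetime import date, timedelta

def day_of_year(year: int, n: int) -> date:
    """Return the calendar date of the nth day of the given year."""
    # The nth day of the year is n - 1 days after January 1st.
    return date(year, 1, 1) + timedelta(days=n - 1)

# The "153rd day of the year" question from the study, for a non-leap year:
print(day_of_year(2025, 153))               # 2025-06-02
# The "What day of the week is New Year's Day?" question:
print(date(2025, 1, 1).strftime("%A"))      # Wednesday
```

A model answering from a calendar image has no such shortcut: it must read the grid cells and accumulate month lengths itself, which is where the errors arise.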
Overall, the AI systems did not perform well. They read the time on analog clocks correctly less than 25% of the time. They struggled with clocks bearing Roman numerals and stylized hands as much as with clocks lacking a second hand entirely, suggesting that the problem may stem from detecting the hands and interpreting the angles on the clock face, according to the researchers.
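To illustrate the angle interpretation the researchers point to, here is a hypothetical sketch of the geometry involved. It assumes the hand angles have already been detected (measured clockwise from 12 o'clock); the hard part for the models is precisely this detection step, which the code takes as given.

```python
def clock_angles_to_time(hour_angle: float, minute_angle: float) -> str:
    """Convert detected clock-hand angles (degrees clockwise from 12) to a time string."""
    # The minute hand sweeps 360 degrees in 60 minutes: 6 degrees per minute.
    minute = round(minute_angle / 6) % 60
    # The hour hand sweeps 360 degrees in 12 hours: 30 degrees per hour.
    # Integer division drops the fractional drift within the current hour.
    hour = int(hour_angle // 30) % 12 or 12
    return f"{hour}:{minute:02d}"

# Hour hand a third of the way past the 3, minute hand on the 4:
print(clock_angles_to_time(100.0, 120.0))  # 3:20
```

The arithmetic itself is trivial; the study's finding is that the models stumble before this point, on recognizing where the hands are in the first place.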
Google's Gemini 2.0 scored highest on the team's clock task, while GPT-o1 was accurate on the calendar task 80% of the time, a far better result than its competitors. But even then, the most successful MLLM on the calendar task still erred about 20% of the time.
"Most people can tell the time and use calendars from an early age. Our findings highlight a significant gap in the ability of AI to carry out what are quite basic skills for people," Rohit Saxena, co-author of the study and a PhD student at the University of Edinburgh's School of Informatics, said in a statement. "These shortfalls must be addressed if AI systems are to be successfully integrated into real-world applications, such as scheduling, automation, and assistive technologies."
So while artificial intelligence may be able to finish your homework, don't rely on it to keep track of any deadlines.