OpenAI's new gpt-4o-transcribe voice AI models let you add speech to your existing text apps in seconds





OpenAI's voice AI models have gotten the company into trouble before with actress Scarlett Johansson, but that isn't stopping it from continuing to advance its offerings in this category.

Today, the ChatGPT maker announced three new proprietary voice models: gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-tts. They are initially available through the company's API for third-party developers to build their own applications on top of, as well as on a dedicated demo site, OpenAI.fm, where individual users can try them out in limited, playful tests.


Moreover, the gpt-4o-mini-tts model's voices can be customized from several presets via text prompt to change their accents, pitch, tone, and other vocal qualities, including conveying whatever emotions the user asks for. That should go a long way toward addressing any concerns that OpenAI is deliberately imitating any particular user's voice (the company previously denied that this was the case with Johansson, but pulled down the seemingly imitative voice option anyway). Now it's up to the user to decide how they want their AI voice to sound when speaking back.
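The prompt-steered customization described above maps onto a single text parameter in the request. Below is a minimal sketch of what such a call might look like, built as a plain dictionary of keyword arguments rather than a live API call; the parameter names (`model`, `voice`, `input`, `instructions`) follow OpenAI's speech endpoint, but treat the specific voice name and persona text here as illustrative assumptions.

```python
# Sketch: steering gpt-4o-mini-tts with a free-text "instructions" prompt.
# These are the keyword arguments one would pass to
# client.audio.speech.create(...) in the official OpenAI Python SDK.

def build_speech_request(text: str, persona: str, voice: str = "coral") -> dict:
    """Assemble request parameters for a steerable text-to-speech call."""
    return {
        "model": "gpt-4o-mini-tts",
        "voice": voice,
        "input": text,
        # Free-text control of accent, pitch, tone, and emotion:
        "instructions": persona,
    }

req = build_speech_request(
    "Welcome back! Your order has shipped.",
    persona="Speak like a calm yoga teacher: slow, soothing, and warm.",
)
print(req["model"])  # gpt-4o-mini-tts
```

Swapping the `persona` string for something like "a cackling mad scientist" is all it takes to change the delivery, with no change to the input text itself.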

In a demo with VentureBeat delivered over a video call, Jeff Harris, a member of the technical staff at OpenAI, showed how, using text alone on the demo site, a user could get the same voice to sound like a cackling mad scientist or a calm yoga teacher.

Discovering and refining new capabilities within the GPT-4o base

The models are variants of the existing GPT-4o model OpenAI launched back in May 2024, which currently powers the ChatGPT text and voice experience for many users. The company took that base model and post-trained it with additional data to make it excel at transcription and speech. The company didn't specify when the models might come to ChatGPT.

“ChatGPT has slightly different requirements in terms of cost and performance trade-offs, so while I expect they will move to these models in time, for now, this launch is focused on API users,” Harris said.

The new speech-to-text models are intended to supplant OpenAI's two-year-old open-source Whisper model, offering lower word error rates across industry benchmarks and improved performance in noisy environments, with diverse accents, and at varying speech speeds, across more than 100 languages.

The company posted a chart on its website showing just how much lower the gpt-4o-transcribe models' word error rates are at identifying words across 33 languages compared to Whisper, with an impressively low 2.46% in English.

“These models include noise cancellation and a semantic voice activity detector, which helps determine when a speaker has finished a thought, improving transcription accuracy,” Harris said.

Harris told VentureBeat that the new gpt-4o-transcribe model family is not designed to offer “diarization,” or the capability to label and differentiate between different speakers. Instead, it is designed primarily to receive one (or possibly multiple voices) as a single input channel and respond to all the input with a single output voice in that interaction, however long it takes.

The company is also hosting a competition for the general public to find the most creative examples of using its demo site OpenAI.fm and share them online by tagging the OpenAI account on X. The winner is set to receive a custom Teenage Engineering radio with the OpenAI logo, which OpenAI Head of Product, Platform Olivier Godement said is one of only three in the world.

A gold mine of voice applications

These enhancements make the models especially suitable for applications such as customer call centers, meeting note transcription, and AI-powered assistants.

Impressively, the company's newly launched Agents SDK from last week also allows developers who have already built apps atop its text-based large language models like GPT-4o to add fluid voice interactions with only about “nine lines of code,” according to a presenter during OpenAI's YouTube livestream announcing the new models.

For example, an e-commerce app built atop GPT-4o could now respond to recurring user questions like “tell me about my recent orders” in spoken speech simply by adding these new models.

“For the first time, we're introducing streaming speech-to-text, allowing developers to continuously input audio and receive a text stream in real time, making conversations feel more natural,” Harris said.
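On the client side, streaming speech-to-text boils down to consuming incremental text events as they arrive instead of waiting for one final transcript. The sketch below simulates that pattern with stand-in event objects; the event type names (`transcript.text.delta` / `transcript.text.done`) are assumptions modeled on OpenAI's streaming transcription interface (enabled via `stream=True` on a transcription request), and a real app would iterate over the SDK's event stream instead of this hand-built list.

```python
# Sketch: accumulating a streamed transcript from incremental delta events.
from dataclasses import dataclass

@dataclass
class TranscriptEvent:
    type: str       # e.g. "transcript.text.delta" or "transcript.text.done"
    text: str = ""  # incremental text carried by delta events

def accumulate_transcript(events) -> str:
    """Concatenate delta events into the full transcript as they arrive."""
    parts = []
    for event in events:
        if event.type == "transcript.text.delta":
            parts.append(event.text)  # a real app could display this live
    return "".join(parts)

simulated = [
    TranscriptEvent("transcript.text.delta", "Tell me about "),
    TranscriptEvent("transcript.text.delta", "my recent orders."),
    TranscriptEvent("transcript.text.done"),
]
print(accumulate_transcript(simulated))  # Tell me about my recent orders.
```

The incremental display is what makes the conversation feel natural: the user sees (or hears a response to) their words before they have finished speaking.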

However, for those seeking low-latency, real-time AI voice experiences, OpenAI recommends using its speech-to-speech models in the Realtime API.

Pricing and availability

The new models are available immediately via OpenAI's API, with pricing as follows:

gpt-4o-transcribe: $6.00 per 1M audio input tokens (~$0.006 per minute)

gpt-4o-mini-transcribe: $3.00 per 1M audio input tokens (~$0.003 per minute)

gpt-4o-mini-tts: $0.60 per 1M text input tokens, $12.00 per 1M audio output tokens (~$0.015 per minute)
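A quick back-of-envelope check connects the per-token and per-minute figures above: the transcription rates imply roughly 1,000 audio tokens per minute of audio (and the TTS output rate a somewhat higher figure, around 1,250). That tokens-per-minute ratio is inferred from the listed prices, not an official OpenAI number.

```python
# Convert the per-1M-token prices above into approximate per-minute costs,
# assuming ~1,000 audio tokens per minute (inferred from the listed rates).

AUDIO_TOKENS_PER_MINUTE = 1_000

def per_minute_cost(price_per_million_tokens: float) -> float:
    """Approximate cost per minute of audio at a given per-1M-token price."""
    return price_per_million_tokens / 1_000_000 * AUDIO_TOKENS_PER_MINUTE

print(per_minute_cost(6.00))  # gpt-4o-transcribe: 0.006
print(per_minute_cost(3.00))  # gpt-4o-mini-transcribe: 0.003
```

This is the arithmetic behind the parenthetical per-minute estimates; actual billing is per token, so minutes of dense, fast speech may cost slightly more than minutes of sparse audio.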

However, they arrive at a time of fierce competition in the AI transcription and speech space. ElevenLabs recently introduced its new Scribe model, which supports diarization and boasts a similarly low (but not as low) error rate of 3.3% in English, priced at $0.40 per hour of input audio (or $0.006 per minute, roughly equivalent).

Another startup, Hume AI, offers a new model, Octave TTS, with sentence-level and even word-level customization of pronunciation and emotional inflection, based entirely on the user's instructions rather than any pre-defined voices. Octave TTS's pricing isn't directly comparable, but there is a free tier offering 10 minutes of audio, with costs scaling from there.

Meanwhile, increasingly advanced audio and speech models are also coming to the open-source community, including one called Orpheus 3B, which is available with a permissive Apache 2.0 license, meaning developers don't have to pay any costs to run it, provided they have the right hardware or cloud servers.

Industry adoption and early results

Several companies have already integrated OpenAI's new audio models into their platforms, reporting significant improvements in voice AI performance, according to testimonials shared by OpenAI with VentureBeat.

EliseAI, a company focused on property management automation, found that OpenAI's text-to-speech model enabled more natural and emotionally rich interactions with tenants.

The enhanced voices made AI-powered leasing, maintenance, and tour scheduling more engaging, leading to higher tenant satisfaction and improved call resolution rates.

Decagon, which builds AI-powered voice experiences, saw a 30% improvement in transcription accuracy using OpenAI's speech recognition model.

That increase in accuracy has allowed Decagon's AI agents to perform more reliably in real-world scenarios, even in noisy environments. The integration process was fast, with Decagon incorporating the new model into its systems within a day.

Not all reactions to OpenAI's latest release have been warm, however. Ben Hylak, co-founder of AI app analytics platform Dawn and a former Apple human interfaces designer, posted on X that while the models seem promising, the announcement “feels like a retreat from real-time voice,” suggesting a shift away from OpenAI's previous focus on low-latency conversational AI via ChatGPT.

Additionally, the launch was leaked early on X (formerly Twitter). TestingCatalog News (@testingcatalog) posted details about the new models several minutes before the official announcement, listing the names gpt-4o-mini-tts, gpt-4o-transcribe, and gpt-4o-mini-transcribe. Credit for the leak was attributed to @StivenTheDev, and the post quickly gained traction.

Looking ahead, OpenAI plans to continue refining its audio models and exploring custom voice capabilities while ensuring safety and responsible AI use. Beyond audio, OpenAI is also investing in multimodal AI, including video, to enable more dynamic and interactive agent-based experiences.


