OpenAI upgrades its transcription and voice-generating AI models


By sarajacob2424@gmail.com


OpenAI is bringing new transcription and voice-generating AI models to its API, which the company claims improve on its previous versions.

For OpenAI, the models fit into its broader “agentic” vision: building automated systems that can independently accomplish tasks on behalf of users. The definition of “agent” may be in dispute, but OpenAI Head of Product Olivier Godement described one interpretation as a chatbot that can speak with a business’s customers.

“We’re going to see more and more agents pop up in the coming months,” Godement told TechCrunch during a briefing. “And so the general theme is helping customers and developers leverage agents that are useful, available, and accurate.”

OpenAI claims that its new text-to-speech model, “gpt-4o-mini-tts,” not only delivers more nuanced and realistic-sounding speech but is also more “steerable” than its previous speech-synthesis models. Developers can instruct gpt-4o-mini-tts on how to say things in natural language, for example, “speak like a mad scientist” or “use a serene voice, like a mindfulness teacher.”

Here is a “true crime”-style voice:

Here is a sample of a “professional” voice:

Jeff Harris, a member of the product staff at OpenAI, told TechCrunch that the goal is to let developers tailor both the voice “experience” and “context.”

“In different contexts, you don’t just want a flat, monotonous voice,” Harris said. “If you’re in a customer support experience and you want the voice to be apologetic because it’s made a mistake, you can actually have the voice have that emotion in it … Our big belief, here, is that developers and users want to really control not just what is spoken, but how things are spoken.”
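As a rough illustration of the split Harris describes between what is spoken and how it is spoken, the sketch below assembles a request body for a text-to-speech call. It makes no network request. The model name comes from the article; the helper `build_speech_request`, the `alloy` voice, and the `input` and `instructions` field names are assumptions made for this example and should be checked against OpenAI's API reference before use.

```python
import json


def build_speech_request(text: str, style: str, voice: str = "alloy") -> dict:
    """Assemble a JSON-serializable body for a text-to-speech request.

    Hypothetical helper: the field names below are assumptions, not
    something the article specifies.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "model": "gpt-4o-mini-tts",  # model name as given in the article
        "voice": voice,              # assumed voice name
        "input": text,               # what is spoken
        "instructions": style,       # how it is spoken (assumed field name)
    }


# A customer-support apology, in the spirit of Harris's example.
payload = build_speech_request(
    text="I'm sorry, that charge was our mistake. I've refunded it.",
    style="Sound apologetic and calm, like a patient support agent.",
)
print(json.dumps(payload, indent=2))
```

Keeping the spoken text and the delivery style in separate fields means the same sentence can be re-rendered in a different tone by changing only the style string.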

As for OpenAI’s new speech-to-text models, “gpt-4o-transcribe” and “gpt-4o-mini-transcribe,” they effectively replace the company’s older Whisper transcription model. Trained on high-quality audio datasets, the new models can better capture accented and varied speech, OpenAI claims, even in chaotic environments.

Harris added that they are also less likely to hallucinate. Whisper tended to fabricate words, and even whole passages, in conversations, introducing everything from racial commentary to imagined medical treatments into transcripts.

“[T]hese models are much improved versus Whisper on that front,” Harris said. “Making sure the models are accurate is completely essential to getting a reliable voice experience, and accurate [in this context] means that the models are hearing the words precisely [and] aren’t filling in details that they didn’t hear.”

However, your mileage may vary depending on the language being transcribed.

According to OpenAI’s internal benchmarks, gpt-4o-transcribe, the more accurate of the two transcription models, has a “word error rate” approaching 30% (out of 120%) for Indic and Dravidian languages such as Tamil, Telugu, Malayalam, and Kannada. This means that about three of every 10 words from the model will differ from a human transcription in those languages.
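For readers unfamiliar with the metric, word error rate is conventionally computed as the number of word-level substitutions, deletions, and insertions needed to turn the model's transcript into the human reference, divided by the number of words in the reference. The sketch below is a minimal version of that standard calculation, not OpenAI's benchmark code; the function name and the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Three of ten words differ, so the rate is 0.3 (30%).
human = "the quick brown fox jumps over the lazy sleeping dog"
model = "the quick crown fox jumped over the hazy sleeping dog"
print(word_error_rate(human, model))
```

Because inserted words count as errors while the denominator is only the reference length, the rate can exceed 100% when a transcript contains many extra words, which is why a chart of this metric can have a scale that runs past 100%.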

OpenAI transcription benchmark results. Image Credits: OpenAI

In a break from tradition, OpenAI does not plan to make its new transcription models openly available. The company has historically released new versions of Whisper for commercial use under an MIT license.

Harris said that gpt-4o-transcribe and gpt-4o-mini-transcribe “are much bigger than Whisper” and therefore are not good candidates for an open release.

“[T]hey’re not the kind of model that you can just run locally on your laptop, like Whisper,” he continued. “[W]e want to make sure that if we’re releasing things in open source, we’re doing it thoughtfully, and we have a model that’s really honed for that specific need. And we think that end-user devices are one of the most interesting cases for open-source models.”

Updated on March 20, 2025, 11:54 a.m. PT to clarify the language about word error rate and to update the benchmark results chart with a more recent version.


