
OpenAI launches Whisper API – records what we say and translates it into English


OpenAI, the company behind ChatGPT, today launched the Whisper API, a hosted version of its open-source Whisper speech-to-text model.

Priced at $0.006 per minute, Whisper is an automatic speech recognition system that transcribes speech in multiple languages and translates speech from other languages into English. It accepts files in various formats, including M4A, MP3, MP4, MPEG, MPGA, WAV and WEBM.
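To put the pricing and format list in concrete terms, here is a minimal Python sketch that checks a file name against the formats listed above and estimates what a recording would cost at $0.006 per minute. The function and constant names are hypothetical, and the sketch assumes cost scales linearly with duration with no rounding or minimum charge, which the article does not confirm.

```python
from pathlib import Path

# Figures taken from the article; billing granularity is an assumption.
PRICE_PER_MINUTE_USD = 0.006
SUPPORTED_EXTENSIONS = {"m4a", "mp3", "mp4", "mpeg", "mpga", "wav", "webm"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is one of the formats listed in the article."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS


def estimate_cost_usd(duration_seconds: float) -> float:
    """Estimate the cost of a recording, assuming simple linear per-minute pricing."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds / 60 * PRICE_PER_MINUTE_USD


if __name__ == "__main__":
    # Hypothetical one-hour recording.
    print(is_supported("interview.mp3"))
    print(f"${estimate_cost_usd(3600):.2f}")
```

Under these assumptions, an hour of audio would come to roughly $0.36.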

Many companies have developed highly capable speech recognition systems, which underlie the software and services of tech giants such as Google, Amazon, and Meta. “But what sets Whisper apart is that it has been trained on 680,000 hours of data in multiple languages collected from the internet,” says OpenAI President Greg Brockman. According to him, this training improved the recognition of “special accents”, background noise, and technical terms.

Brockman says his company has optimized Whisper as much as possible. “It’s much, much faster and extremely convenient.”

However, Whisper has its limitations, especially when it comes to “next word” prediction. Because the system was trained on a lot of noisy data, OpenAI warns that Whisper may include unspoken words in its transcriptions, possibly because it tries to predict the next word when transcribing speech.

Currently, Whisper does not perform equally well in all languages, and the error rate is higher in languages for which it had relatively little training data.

OpenAI believes Whisper’s transcription capabilities will enhance existing applications, products, and tools. The AI language learning app Speak is already using the new Whisper model for in-app conversations.

If OpenAI succeeds in entering the speech-to-text market, it could be highly profitable. According to one report, the market could reach $5.4 billion by 2026, up from $2.2 billion in 2021.

“We really want to be this universal mind,” Brockman said. “With a lot of flexibility to be able to use any data you have, any work you want to do, and work multiplicatively.”

Source: TechCrunch

Author: newsroom

Source: Kathimerini
