2022-09-27

Last week, OpenAI released an open-source neural network capable of transcribing audio to text with near-human levels of performance.

The release follows the recent rollout of other high-profile open-source projects, including Stability AI’s Stable Diffusion model, ARK Analyst William Summerlin wrote in a September 26 newsletter.

Trained on over 600,000 hours of audio data, OpenAI’s Whisper model transcribes speech in English and other languages and translates non-English speech into English, according to Summerlin. “Large language models demand increasingly massive sets of text data, suggesting that accurate audio speech recognition tools will activate important training data,” Summerlin wrote. “As models like Whisper interface with large language models like GPT-3 seamlessly and accurately, audio data should become critical to the artificial intelligence training process.”

OpenAI’s DALL·E 2, the AI system that creates realistic images and art from a description in natural language, impressed the industry this summer with its ability to generate creative images from text prompts. In the last few months, DALL·E 2 has passed some important milestones: commercial availability, pricing tiers, and more restrictive content moderation.

