OpenAI has announced GPT-Realtime-1.5, its latest flagship voice model. The model targets voice agents and customer service applications and works in an "audio in, audio out" mode. It accepts text, audio, and image inputs and produces text and audio outputs. It has a 32,000-token context window and can generate up to 4,096 output tokens. GPT-Realtime-1.5 is designed for real-time conversation, voice transcription, and multimodal interaction, and is served through the Realtime API endpoint. Audio is priced at $32 per million input tokens and $64 per million output tokens; text is priced at $4 per million input tokens and $16 per million output tokens. For now, the model is available only through the OpenAI API, to qualified developers.
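
To make the quoted prices concrete, here is a minimal Python sketch that estimates the cost of one session from its token counts. The four rates are the per-million-token prices given above. The function name `estimate_cost`, the category keys, and the example token counts are hypothetical and chosen only for illustration. The sketch assumes simple linear billing and ignores anything the article does not mention, such as image-input pricing or discounts.

```python
# Prices in USD per one million tokens, as quoted in the article.
PRICE_PER_MILLION = {
    "audio_input": 32.0,
    "audio_output": 64.0,
    "text_input": 4.0,
    "text_output": 16.0,
}


def estimate_cost(usage: dict[str, int]) -> float:
    """Estimate the USD cost of a session (hypothetical helper).

    `usage` maps a category name from PRICE_PER_MILLION to a token count.
    Assumes linear per-token billing with no discounts.
    """
    total = 0.0
    for category, tokens in usage.items():
        if category not in PRICE_PER_MILLION:
            raise KeyError(f"unknown token category: {category}")
        if tokens < 0:
            raise ValueError(f"token count must be non-negative: {tokens}")
        total += tokens * PRICE_PER_MILLION[category] / 1_000_000
    return total


if __name__ == "__main__":
    # Illustrative token counts for one session, not measured data.
    session = {
        "audio_input": 20_000,
        "audio_output": 3_000,
        "text_input": 2_000,
        "text_output": 500,
    }
    print(f"Estimated cost: ${estimate_cost(session):.4f}")
```

With these example counts, audio input accounts for $0.64 of a total of roughly $0.85, which shows how strongly audio tokens dominate the bill compared with text at the quoted rates.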
