Last year, OpenAI unveiled GPT-4o, introducing an enhanced speech mode built on a multimodal model. This mode responds to audio input in as little as 232 milliseconds, with an average of 320 milliseconds, which is close to human response time in conversation. It also produces more natural-sounding audio, captures non-verbal cues such as speaking pace, and conveys emotion effectively.