Apple and Tel Aviv University Introduce Innovative PCG Technology for Accelerated Speech Generation
15 hours ago / Read time: under 1 minute
Author: Editor

According to 9to5Google, researchers from Apple and Tel Aviv University have introduced the 'Principled Coarse-Grained' (PCG) approach, a method for speeding up generation in autoregressive text-to-speech (TTS) models. In speculative decoding, a small draft model proposes tokens that the full model then verifies. PCG groups speech tokens with similar acoustic properties and, during decoding, relaxes the strict matching requirement, so that more proposed tokens are accepted and speculative decoding succeeds more often.
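The effect of group-level matching can be illustrated with a toy sketch. This is not the researchers' implementation: the token IDs, the GROUP_OF table, and the accepted_prefix_len helper are all hypothetical, and the acceptance rule is simplified to a deterministic comparison between each draft token and the token the full model would have chosen.

```python
from typing import Dict, List, Optional


def accepted_prefix_len(
    draft: List[int],
    target: List[int],
    group_of: Optional[Dict[int, int]] = None,
) -> int:
    """Count how many leading draft tokens are accepted.

    With group_of=None a draft token must equal the target token exactly.
    With a group table, it only has to fall in the same group.
    """
    accepted = 0
    for d, t in zip(draft, target):
        if group_of is None:
            ok = d == t
        else:
            ok = group_of[d] == group_of[t]
        if not ok:
            break  # the first rejection ends the accepted run
        accepted += 1
    return accepted


# Hypothetical vocabulary: six speech tokens in three acoustic groups.
GROUP_OF = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}

draft_tokens = [0, 3, 4, 1]   # proposed by the small draft model
target_tokens = [1, 2, 4, 5]  # what the full model would pick

exact = accepted_prefix_len(draft_tokens, target_tokens)
coarse = accepted_prefix_len(draft_tokens, target_tokens, GROUP_OF)
print(f"exact match accepts {exact} tokens, group match accepts {coarse}")
```

In this toy case exact matching rejects the very first draft token, while group-level matching accepts three tokens before the first mismatch. Longer accepted runs mean fewer passes through the full model, which is where the speedup would come from.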
Experiments show that PCG speeds up speech generation by about 40% without retraining the existing model, while maintaining a low word error rate, high speaker similarity, and a naturalness score of 4.09.
In practice, the method needs only an extra 37MB of memory to store the acoustic grouping data, which makes it well suited to resource-constrained devices. Looking ahead, PCG could support real-time speech features on future Apple platforms.