Ark of Infinity recently released the General Audio Large Model (GPA), which is built on a unified autoregressive Transformer architecture. It integrates three core functions (speech recognition, speech synthesis, and voice conversion) into a single framework, moving beyond the fragmented, multi-stage pipeline design of traditional speech systems.
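
To make the idea of a single framework more concrete, here is a minimal sketch of one common way several speech tasks can share one autoregressive model: each example is flattened into a single token sequence that starts with a task tag, and the model simply predicts the next token. This is an illustration only, not GPA's actual design. The special tokens (`<asr>`, `<tts>`, `<vc>`, `<sep>`, `<eos>`), the `Example` class, and the `build_sequence` function are all hypothetical names introduced here, and the audio and text tokens are placeholder strings.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical special tokens; the source does not describe GPA's vocabulary.
TASK_TOKENS = {"asr": "<asr>", "tts": "<tts>", "vc": "<vc>"}
SEP, EOS = "<sep>", "<eos>"


@dataclass
class Example:
    task: str                 # "asr", "tts", or "vc"
    inputs: List[List[str]]   # one or more conditioning segments (placeholder tokens)
    target: List[str]         # tokens the model should generate


def build_sequence(ex: Example) -> List[str]:
    """Flatten one example into a single sequence for next-token prediction."""
    if ex.task not in TASK_TOKENS:
        raise ValueError(f"unknown task: {ex.task}")
    seq = [TASK_TOKENS[ex.task]]
    for segment in ex.inputs:
        seq.extend(segment)
        seq.append(SEP)       # marks the end of each conditioning segment
    seq.extend(ex.target)
    seq.append(EOS)
    return seq


# Speech recognition: audio tokens in, text tokens out.
asr = Example("asr", [["a1", "a2"]], ["hello"])
# Speech synthesis: text tokens in, audio tokens out.
tts = Example("tts", [["hello"]], ["a1", "a2"])
# Voice conversion: source audio plus a reference voice in, converted audio out.
vc = Example("vc", [["a1", "a2"], ["r1"]], ["b1", "b2"])

for ex in (asr, tts, vc):
    print(build_sequence(ex))
```

Under this assumption, all three tasks share one sequence format, so the same model weights and decoding loop serve each of them, and only the leading task tag and the kind of tokens on each side change. A traditional pipeline would instead route each task through its own separately built components.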
