The Tencent Hunyuan research team has introduced its findings on SRPO (Semantic Relative Preference Optimization), a reinforcement learning algorithm tailored for text-to-image models. The method addresses the 'overly greasy' skin textures in portraits generated by the open-source text-to-image model Flux, improving perceived realism by more than a factor of three in human evaluations.
Traditional online reinforcement learning approaches depend on pre-trained reward models, which are expensive to build and generalize poorly. SRPO instead lets the reward signal be adjusted online through semantic preferences, and it enables targeted optimization by adding control words to the prompt.
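The core idea of a semantically relative reward can be illustrated with a minimal sketch: the same reward model scores an image against a positively and a negatively worded control prompt, and the difference is used as the training signal so that biases shared by both prompts cancel out. The `reward_model.score` call, the control words, and the function name below are illustrative assumptions, not the authors' actual API.

```python
import torch


def semantic_relative_reward(reward_model, image: torch.Tensor, prompt: str,
                             positive_word: str = "realistic photo",
                             negative_word: str = "oily, over-smoothed skin") -> torch.Tensor:
    """Score an image against a positive and a negative control prompt and
    return the difference, so systematic reward-model biases cancel out."""
    r_pos = reward_model.score(image, f"{positive_word}, {prompt}")  # reward with positive control word
    r_neg = reward_model.score(image, f"{negative_word}, {prompt}")  # reward with negative control word
    return r_pos - r_neg
```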
To counter the reward hacking that such semantic guidance can introduce, the team pairs the Semantic Relative Preference strategy with a Direct-Align strategy. Together, these measures substantially reduce reconstruction error and make it possible to optimize even the earliest, noisiest steps of the generation process.
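A rough sketch of the Direct-Align idea, under the assumption of a linear (flow-matching style) noise schedule as used by models like Flux: a known noise sample is injected into a clean image, the model takes only a few denoising steps, and the final image is then restored analytically from the pre-sampled noise rather than by running the full sampler. The `model` interface, step size, and function name are hypothetical placeholders; the exact formulation in the paper may differ.

```python
import torch


def direct_align_rollout(model, x0: torch.Tensor, sigma_t: float,
                         num_grad_steps: int = 1, dt: float = 0.05) -> torch.Tensor:
    """Inject known noise at level sigma_t, take a few model denoising steps,
    then recover the clean image in a single analytic step from the
    pre-sampled noise, keeping reconstruction error low even at early,
    high-noise timesteps."""
    noise = torch.randn_like(x0)
    x_t = (1.0 - sigma_t) * x0 + sigma_t * noise   # linear interpolation to the noisy state
    sigma = sigma_t
    for _ in range(num_grad_steps):                # the only steps that carry gradients
        v = model(x_t, sigma)                      # assumed velocity-style prediction
        x_t = x_t - dt * v
        sigma = sigma - dt
    return (x_t - sigma * noise) / (1.0 - sigma)   # one-step restoration using the known noise
```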
SRPO is also highly training-efficient: it surpasses DanceGRPO after just 10 minutes of training and achieves state-of-the-art (SOTA) quantitative metrics. In human evaluations, the generated images improved by more than a factor of three in both realism and aesthetic quality, while training time was cut by a factor of 75.