SenseTime and Nanyang Technological University have jointly released a preview of NEO-unify. The architecture drops the visual encoder and variational autoencoder (VAE) used in conventional designs and instead takes a native end-to-end approach, learning directly from pixels and text. On image reconstruction, its performance is on par with Flux VAE, and it scores 3.32 on an image editing benchmark. According to the research findings, the architecture allows understanding and generation capabilities to reinforce each other, and it is more data-efficient in training than existing solutions.
