Moonshot AI's Kimi team has published a technical report that redesigns the residual connections in the underlying architecture of large models. Under the new design, each layer can selectively attend to the outputs of the layers before it, rather than receiving them as a fixed sum. According to the report, this improved the training efficiency of a 48B model by 1.25x, and the design is positioned as a key module for the next generation of models. The research was led by Moonshot AI's three co-founders, working with a large team of researchers. After its release, the paper drew praise from prominent figures including Elon Musk, Andrej Karpathy, and Jerry Tworek.
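
To make the core idea concrete, the sketch below contrasts a standard residual stream, where a layer's input is the plain sum of all earlier outputs, with a variant in which the layer weights those earlier outputs through a softmax. It is an illustration only and is not taken from the report: the function names, the per-layer `query` vector, and the scaled dot-product scoring are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def standard_residual(prev_outputs):
    """Classic residual stream: sum every earlier output with equal weight."""
    dim = len(prev_outputs[0])
    return [sum(h[i] for h in prev_outputs) for i in range(dim)]


def attention_residual(prev_outputs, query):
    """Hypothetical selective residual: weight earlier outputs by attention.

    `query` stands in for a per-layer vector (an assumption for this sketch);
    each earlier output is scored by a scaled dot product with it.
    """
    dim = len(query)
    scores = [
        sum(q * x for q, x in zip(query, h)) / math.sqrt(dim)
        for h in prev_outputs
    ]
    weights = softmax(scores)
    mixed = [
        sum(w * h[i] for w, h in zip(weights, prev_outputs))
        for i in range(dim)
    ]
    return mixed, weights


# Toy example: an embedding plus the outputs of two earlier layers.
earlier = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
summed = standard_residual(earlier)
mixed, weights = attention_residual(earlier, query=[2.0, 0.0])
```

In this toy setup, a query pointing along the first dimension places more weight on the earlier outputs that are large in that dimension, whereas the standard residual treats all three outputs identically.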
