StepFun has officially open-sourced its multimodal large model, Step3-VL-10B. With only 10 billion parameters, the model matches or outperforms leading open-source and closed-source models with 10 to 20 times the parameter count. The results hold across core evaluations covering visual perception, logical reasoning, mathematics competitions, and general conversation, breaking through what has been a long-standing industry bottleneck.
