The Natural Language Intelligence team at Tongyi Lab has released VRAG-RL, an open-source framework that combines reinforcement learning with multi-modal techniques to retrieve and reason over key information in complex visual documents. The framework defines vision perception actions to refine information extraction, uses a multi-expert sampling strategy, and applies a fine-grained reward mechanism to improve performance; training is accelerated with the GRPO algorithm. Experiments show that VRAG-RL performs well across a range of visual tasks, supporting multi-round interaction and precise reasoning.
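To give a sense of the GRPO training step mentioned above: GRPO samples a group of responses per query and normalizes each response's reward against the group's mean and standard deviation, avoiding a separate value network. The sketch below is illustrative only; the function name and reward values are my own, not from the VRAG-RL codebase.

```python
import statistics

def grpo_advantages(group_rewards):
    """Group-relative advantages in the GRPO style: each sampled
    response's reward is normalized by its group's mean and std."""
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    if std == 0:
        # All rollouts scored equally: no learning signal for this group.
        return [0.0 for _ in group_rewards]
    return [(r - mean) / std for r in group_rewards]

# Example: four rollouts for one query, scored by a reward function.
advantages = grpo_advantages([0.2, 0.8, 0.5, 0.5])
```

Rollouts scoring above the group mean receive positive advantages and are reinforced; those below are suppressed, so the advantages of a group always sum to zero.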