OpenDataLab has partnered with DingTalk to launch DLU (Document Language Understanding), a document parsing tool for enterprise users. Built on the intelligent document parsing engine MinerU, DLU is designed to address the challenges of parsing unstructured data in enterprise-scale large-model applications. DLU will be open-sourced soon, with the aim of lowering the barriers to AI application development and accelerating the adoption of AI across a wide range of industries.
A project of the Shanghai AI Laboratory, MinerU has been well received by users for its precise parsing and broad format compatibility, and it has gathered more than 40,000 stars on GitHub. DLU inherits MinerU's strengths: it supports mainstream document formats such as Office files and PDF, as well as DingTalk's proprietary document formats, and it accurately extracts complex elements and converts them into high-quality corpora. Going forward, DLU will be deeply integrated into DingTalk's office collaboration ecosystem, enabling a closed-loop workflow that runs from document creation to customized model training.