The GCC is witnessing a fundamental shift in document intelligence, moving from theoretical AI ethics toward quantifiable ROI and secure, localized intelligence. While standard OCR merely identifies characters, decision-makers today require systems that are searchable, auditable, and capable of unlocking the meaning within millions of "locked" handwritten government and historical records.
As Fahad Faisal Fahad AlSaud, co-founder of CoreTechX, notes, "Intelligence from Arabic documents should not be treated as a long-term ambition. It is a necessity at the present time. We are enabling leaders to turn decades of silent archives into active strategic assets."
To bridge this strategic gap, CoreTechX developed Raqmn.ai, a turnkey solution designed to make digitization and analysis more efficient and secure. Unlike fragmented tools, Raqmn.ai integrates the entire recognition process, including image processing, text detection, and output generation, into a unified, high-speed framework.
"We did not rely on off-the-shelf OCR," explains Fahad Durukan, co-founder of CoreTechX. "We built our own end-to-end pipeline tailored specifically to Arabic handwriting, including a hybrid CNN–Transformer architecture optimized for both character and line-level recognition. This ensures that the nuances of Arabic script are preserved, not lost in translation."

The pipeline operates through several sophisticated stages:
CoreTechX has moved beyond traditional CNN-RNN-CTC models to a hybrid CNN–Transformer architecture. The company says this approach captures long-range dependencies and global context better than the recurrent models it replaces, which matters because Arabic letters change shape depending on their position in a word.
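The positional shaping described above is visible in Unicode itself, which encodes separate presentation forms for each position of a letter. The following is a minimal illustrative sketch using only Python's standard `unicodedata` module; it is not part of the CoreTechX pipeline, and the helper names `positional_form` and `base_letter` are hypothetical.

```python
import unicodedata

# The four Unicode presentation forms of the Arabic letter BEH (U+0628):
# isolated, initial, medial, final.
BEH_FORMS = ["\uFE8F", "\uFE91", "\uFE92", "\uFE90"]


def positional_form(ch: str) -> str:
    """Return the positional tag stored in the character's decomposition.

    For a presentation form the decomposition looks like "<initial> 0628".
    """
    tag = unicodedata.decomposition(ch).split()[0]
    return tag.strip("<>")


def base_letter(ch: str) -> str:
    """Fold a presentation form back to its abstract base letter."""
    return unicodedata.normalize("NFKC", ch)


# Map each position to its glyph; all four glyphs share one base letter.
forms = {positional_form(ch): ch for ch in BEH_FORMS}
for position, glyph in forms.items():
    print(position, glyph, unicodedata.name(glyph), "-> base", base_letter(glyph))
```

A recognizer therefore has to map several visually distinct shapes to one letter, and the correct mapping depends on the neighboring characters. This is the kind of context a line-level model can use.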
To address the region's data scarcity problem, the system uses synthetic pre-training on custom-generated images that mimic real-world noise. This is followed by multi-domain fine-tuning across diverse datasets such as KHATT and Muharaf. By modifying the cross-attention layers, the model avoids "over-forgetting" historical orthography while gaining precision on contemporary handwriting.
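The article does not say which kinds of noise the synthetic images contain. As one hedged illustration of the general idea, the sketch below applies salt-and-pepper noise to a tiny grayscale image represented as nested lists. The function name `add_salt_and_pepper`, the `prob` parameter, and the noise type itself are assumptions made for this example, not details of the CoreTechX system.

```python
import random


def add_salt_and_pepper(image, prob, seed=0):
    """Return a copy of `image` with some pixels forced to black or white.

    image: list of rows, each a list of grayscale values in 0..255.
    prob:  probability that any single pixel is corrupted.
    seed:  fixed seed so the augmentation is reproducible.
    """
    rng = random.Random(seed)
    noisy = []
    for row in image:
        new_row = []
        for pixel in row:
            if rng.random() < prob:
                # Corrupt the pixel: 0 is "pepper", 255 is "salt".
                new_row.append(rng.choice((0, 255)))
            else:
                new_row.append(pixel)
        noisy.append(new_row)
    return noisy


# A clean 4x6 mid-gray patch standing in for a rendered text line.
clean = [[128] * 6 for _ in range(4)]
degraded = add_salt_and_pepper(clean, prob=0.2, seed=42)
```

In practice a synthetic pipeline would likely combine several degradations, such as blur, skew, or ink bleed, but the pattern of rendering clean text and then corrupting it is the same.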
According to CoreTechX, the latest technical results represent a new state of the art (SOTA) for the Arabic language. In comprehensive benchmarks comparing models like Azure, Google-Vision, Claude, and Gemini, the Raqmn.ai engine, referred to as CTX, emerged as a leader in consistency and character accuracy.
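Character accuracy in handwriting recognition is commonly reported as a character error rate (CER): the edit distance between the recognized text and the reference transcription, divided by the reference length. The article does not state which metric the benchmark used, so the sketch below is only an assumption about how such a figure could be computed. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of `hypothesis` against a non-empty `reference`."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# The recognizer dropped one of four letters, so the CER is 1/4.
print(cer("كتاب", "كتب"))
```

A lower CER means higher character accuracy. Comparing engines fairly also requires applying the same text normalization, for example the handling of diacritics, to every system's output.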
Because GCC government and historical institutions demand strict data sovereignty, CoreTechX avoids external API risks by offering a fully on-premise system. This ensures sensitive data remains entirely within the client's infrastructure.
By layering generative AI and vectorized retrieval on top of structured data, archives become interactive knowledge systems. "Our goal is to structure this vast unstructured corpus and make it accessible to everyone: governments, researchers, businesses, and the public," says AlSaud. As the platform evolves into RAQMN, the company expects it to become the backbone of structured Arabic knowledge and to raise employee productivity at least threefold.
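To give a sense of what vectorized retrieval means in practice, the sketch below turns each document into a vector, turns the query into a vector, and ranks documents by cosine similarity. It is a toy illustration under stated assumptions: the word-count vectors stand in for the learned embeddings a production system would use, and the names `embed`, `cosine`, and `search`, as well as the sample documents, are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, docs: list, top_k: int = 1) -> list:
    """Return the `top_k` documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


archive = [
    "land deed registered in riyadh",
    "court ruling on water rights",
    "census of date palm farms",
]
print(search("water rights ruling", archive))
```

In a full system, the retrieved passages would then be handed to a generative model as context for answering the user's question.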
