AI Builds Complete Languages From Scratch: ACL Paper Tops Every Prior Conlang System
Source: TechTimes


For decades, building a complete constructed language, or conlang, meant years of painstaking work by expert linguists. J.R.R. Tolkien spent the better part of a lifetime perfecting Quenya. Marc Okrand gave Klingon a fully functional grammar on commission for Star Trek. As of June 27, 2026, a new AI pipeline called ConlangCrafter can automate much of that work. It generates complete languages from scratch, and in the authors' evaluations its output was more than twice as typologically diverse as that of a naive single-prompt baseline. The finding matters beyond conlanging: it suggests that large language models have internalized portable, abstract models of how language structure works, not just the statistical surface patterns of the text they trained on.

What ConlangCrafter Actually Does

ConlangCrafter is not itself a trained language model. It is an agentic, multi-hop pipeline that orchestrates existing large language models (DeepSeek-R1 and Gemini 2.5 in its published tests) to build complete languages from scratch, one layer at a time, in a fixed sequence: phonology first, then morphology and syntax, then lexicon, then translation. The pipeline keeps a shared record of everything designed so far, which the paper calls a "language sketch." At each stage it prompts the base model with the current sketch, asks it to add the next linguistic layer, and writes the result back into the sketch. Because later stages condition on everything generated before them, the design aims to keep the resulting language self-consistent from its sound system to its sentence structure.
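The stage-by-stage flow can be illustrated with a short Python sketch. This is not the authors' code: `call_model` is a hypothetical placeholder for a request to a base model, and the stage names, prompt wording, and dictionary-based sketch are assumptions made for illustration. What it does show is the core idea of conditioning each layer on the accumulated sketch.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a request to a base model such as DeepSeek-R1 or Gemini 2.5.
# A real pipeline would send `prompt` to an LLM API; here we return a placeholder string.
def call_model(prompt: str) -> str:
    return f"<model output for: {prompt.splitlines()[0]}>"

# Assumed stage order, following the article: phonology, then morphosyntax, then lexicon.
# Translation would run afterwards against the finished sketch.
STAGES: List[str] = ["phonology", "morphology_and_syntax", "lexicon"]

def build_prompt(stage: str, sketch: Dict[str, str]) -> str:
    # Every stage sees all layers generated so far, so later layers depend on earlier ones.
    context = "\n\n".join(f"## {name}\n{text}" for name, text in sketch.items())
    return f"Design the {stage} layer of a new language.\nLanguage sketch so far:\n{context}"

def run_pipeline(model: Callable[[str], str] = call_model) -> Dict[str, str]:
    sketch: Dict[str, str] = {}  # the shared "language sketch"
    for stage in STAGES:
        # Generate the next layer and write it back into the sketch.
        sketch[stage] = model(build_prompt(stage, sketch))
    return sketch

sketch = run_pipeline()
print(list(sketch))
```

Passing the model in as a parameter keeps the orchestration logic separate from any particular LLM provider, which mirrors the article's point that ConlangCrafter wraps existing models rather than training its own.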

Two mechanisms drive the system's performance. The first is randomness injection: before the phonology and grammar stages, the pipeline prompts the model to generate a checklist of ten typological features, such as word order, case marking, and tonal system, each with five possible values. A random number generator then selects the value the language will use for each feature. This forces genuine typological variation across runs instead of the convergent outputs that language models tend to produce when prompted without constraints. The second is a self-refinement loop: a critic instance of the same model reads the generated language sketch and flags contradictions, and an editor instance revises the sketch to resolve them. The cycle repeats until the critic finds no further inconsistencies.
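Both mechanisms fit in a few lines of Python. This is a hedged sketch, not the paper's implementation: in ConlangCrafter the model itself writes the ten-feature checklist, whereas the three features below are hypothetical examples, and the critic and editor are toy functions standing in for LLM calls. The `max_rounds` cap is our own safety assumption; the article only says the loop repeats until no inconsistencies remain.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical checklist: three features with five options each (the paper uses ten,
# generated by the model rather than hard-coded).
CHECKLIST: Dict[str, List[str]] = {
    "basic_word_order": ["SOV", "SVO", "VSO", "VOS", "OVS"],
    "case_marking": ["none", "nominative-accusative", "ergative-absolutive",
                     "tripartite", "split-ergative"],
    "tone": ["none", "two-level", "three-level", "contour", "pitch-accent"],
}

def inject_randomness(checklist: Dict[str, List[str]], rng: random.Random) -> Dict[str, str]:
    # The RNG, not the model, picks each value, countering the model's
    # tendency to converge on the same typological defaults.
    return {feature: rng.choice(options) for feature, options in checklist.items()}

def refine(sketch: str,
           critic: Callable[[str], List[str]],
           editor: Callable[[str, List[str]], str],
           max_rounds: int = 5) -> Tuple[str, int]:
    # The critic lists contradictions and the editor revises; stop once the critic finds none.
    # Returns the final sketch and the number of revision rounds performed.
    for rounds in range(max_rounds):
        issues = critic(sketch)
        if not issues:
            return sketch, rounds
        sketch = editor(sketch, issues)
    return sketch, max_rounds  # cap reached (assumed safeguard)

# Toy critic/editor for illustration: "TODO" marks an unresolved contradiction.
def toy_critic(text: str) -> List[str]:
    return ["unresolved TODO"] if "TODO" in text else []

def toy_editor(text: str, issues: List[str]) -> str:
    return text.replace("TODO", "resolved", 1)

choices = inject_randomness(CHECKLIST, random.Random(7))
final, n = refine("word order: SOV; TODO; TODO", toy_critic, toy_editor)
print(choices)
print(final, n)
```

Seeding the generator (`random.Random(7)` here) makes a run reproducible, while changing the seed yields a different combination of features.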

Together, the two mechanisms produced a striking result. Using Gemini 2.5 Pro as the base model, ConlangCrafter achieved a typological diversity score of 0.59, compared to 0.26 for a naive single-prompt baseline: more than a twofold improvement. The metric, derived from the World Atlas of Language Structures (WALS), is the average proportion of fundamental linguistic features on which a pair of generated languages differ. A score of 0.59 means that, across a sample of twenty generated languages, two languages chosen at random differ on average on roughly six out of every ten basic structural parameters, comparable to the gap between genuinely unrelated natural languages.
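The metric as described, an average over language pairs of the fraction of features that differ, can be written out directly. This sketch assumes each language is summarized as a mapping from WALS-style feature names to values and that pairs are compared only on features both languages specify; the paper's exact feature set and its handling of missing values may differ.

```python
from itertools import combinations
from typing import Dict, List

def pairwise_diversity(languages: List[Dict[str, str]]) -> float:
    """Mean, over all pairs of languages, of the fraction of shared features whose values differ."""
    scores = []
    for a, b in combinations(languages, 2):
        shared = a.keys() & b.keys()  # compare only features both languages specify
        if not shared:
            continue
        differing = sum(a[f] != b[f] for f in shared)
        scores.append(differing / len(shared))
    return sum(scores) / len(scores) if scores else 0.0

langs = [
    {"order": "SOV", "case": "accusative"},
    {"order": "SVO", "case": "accusative"},
    {"order": "SVO", "case": "ergative"},
]
print(round(pairwise_diversity(langs), 3))  # pairs differ on 1/2, 2/2, 1/2 -> 0.667
```

A set of identical languages scores 0.0, and a set in which every pair differs on every feature scores 1.0, so 0.59 versus 0.26 is a direct comparison on the same scale.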

What This Pipeline Reveals About LLM Capabilities

The deeper implication of ConlangCrafter's success is not about conlangs specifically. It concerns what large language models have actually learned.

Critics have long argued that these systems are sophisticated statistical pattern-matchers: they predict the next token from learned correlations in training data but have no genuine understanding of the underlying structure of language. ConlangCrafter's results challenge that framing directly. To produce an internally coherent language system with a sound inventory, morphological paradigms, syntactic rules, and a consistent lexicon, all from scratch and with no matching example in its training data, a base model plausibly needs something more than surface-level pattern memory. On the authors' reading, it must have internalized the abstract structural logic of what makes a language system function: which features depend on which others, what constitutes a contradiction, and which typological combinations are possible.

Gašper Beguš, an associate professor of linguistics at UC Berkeley and one of the paper's co-authors, captured the practical significance: models "are able to imagine or come up with things that we might not, and we can learn so much from that," he told IEEE Spectrum.

The paper, written by lead authors Morris Alper and Moran Yanuka (both completing their doctorates at Tel Aviv University) alongside Raja Giryes and Beguš, builds on a 2025 study by Beguš and colleagues published in IEEE Transactions on Artificial Intelligence. That earlier work documented that large language models possess measurable metalinguistic knowledge: the capacity to reason explicitly about language rather than merely through it. In that framing, ConlangCrafter is a demanding test of metalinguistic knowledge. Pattern-matching on seen languages should not be enough to pass it, because the required output, a new language, has no template in the training data.

LLM Worldbuilding and the Constructive Translation Problem

One of the most technically distinctive elements of ConlangCrafter is what the paper calls "constructive translation." Once the pipeline has generated a language sketch, it can translate arbitrary input sentences into the generated language — but with a critical difference from standard translation: the target language may be underspecified. If the input sentence requires a grammatical structure or word that does not yet exist in the sketch, the pipeline generates it on the spot, writes it back into the sketch, and uses it consistently in all subsequent translations.
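The write-back behavior can be sketched at the lexical level. This is a deliberately simplified illustration, not the paper's method: `coin_word` is a hypothetical stand-in for the base model inventing a word (here a deterministic hash over an assumed syllable inventory), and the sketch ignores grammar, word order, and morphology entirely. It only shows the key property: a gap is filled once, stored, and reused in every later translation.

```python
import hashlib
from typing import Dict, List, Set

def coin_word(gloss: str, syllables: List[str], taken: Set[str]) -> str:
    # Hypothetical stand-in for the LLM coining a new word from the language's syllables.
    digest = hashlib.sha256(gloss.encode("utf-8")).digest()
    form = "".join(syllables[b % len(syllables)] for b in digest[:2])
    i = 2
    while form in taken:  # avoid accidental homophones with existing entries
        form += syllables[digest[i % len(digest)] % len(syllables)]
        i += 1
    return form

def constructive_translate(sentence: str, lexicon: Dict[str, str], syllables: List[str]) -> str:
    out = []
    for word in sentence.lower().split():
        if word not in lexicon:  # the sketch has a gap
            # Invent the missing word and write it back into the lexicon.
            lexicon[word] = coin_word(word, syllables, set(lexicon.values()))
        out.append(lexicon[word])  # reuse the stored form from now on
    return " ".join(out)

lexicon = {"water": "ulo"}
syll = ["ka", "mi", "ro", "te", "su"]
first = constructive_translate("the river", lexicon, syll)
second = constructive_translate("River water", lexicon, syll)
print(first, "|", second, "|", lexicon)
```

Because the lexicon is mutated in place, the word coined for "river" in the first sentence is the same form that appears in the second, which is the consistency property the article describes.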

This inverts the usual stance in low-resource translation, where hallucination is a failure mode to be suppressed. ConlangCrafter deliberately exploits the base model's generative capacity at the lexical and grammatical frontier of the new language, treating invention as a feature rather than a defect. The result is a pipeline that builds a translation corpus while expanding the language's grammar and lexicon, yielding a coherent, growing linguistic system rather than a fixed specification.

That capability has direct applications beyond academic linguistics. The paper identifies procedural worldbuilding in open-world video games as an immediate target: a game engine could use ConlangCrafter to generate a distinct, internally consistent language for each fictional society in the game world, complete with translatable dialogue and grammatical documentation. The same pipeline could assist professional conlangers — the linguists and hobbyists who build languages for fiction and film — by handling the most labor-intensive stages of lexicon generation and grammatical formalization while leaving creative direction to the human designer.

Where the Pipeline Falls Short

The paper is candid about what ConlangCrafter cannot yet do. The phonology and grammar stages are bounded by what the base model knows about linguistic typology — and that knowledge, derived from training data, is skewed toward well-documented, high-resource languages. The system is unlikely to generate languages with features that are rare in the world's documented languages, not because those features are impossible, but because they are underrepresented in the data that shaped the model's metalinguistic knowledge. The pipeline also currently omits semantics, pragmatics, discourse strategies, and orthography — significant aspects of any complete language that would need to be addressed in future work.

The evaluation framework itself carries limitations the authors acknowledge. Measuring typological diversity and internal consistency with automated metrics is an imperfect proxy for the expert assessment that fully validating a constructed language would require. Translation consistency, in particular, remains a work in progress: the paper's own ablation results show that randomness injection on its own lowers translation consistency, and that the self-refinement loop only partially closes the gap.

What Sets This Apart From Prior Conlang Systems

ConlangCrafter is, by the authors' framing, the first system to tackle end-to-end conlang creation — including phonology, grammar, lexicon, and translation — in a single unified pipeline without requiring human linguistic expertise at any stage. Earlier computational approaches addressed fragments of the problem: a grammar formalism here, a lexicon generator there. The multi-hop decomposition, which treats language design the same way multi-hop reasoning pipelines treat complex mathematical problems — breaking them into tractable sub-problems that build on each other — is the specific architectural choice that makes full-pipeline generation feasible.

The work has been accepted for presentation at the 64th Annual Meeting of the Association for Computational Linguistics (ACL) in San Diego, scheduled to open July 2, 2026. ACL is the flagship peer-reviewed conference in computational linguistics and natural language processing. The paper is available now on arXiv.


Frequently Asked Questions

Can AI now replace human conlangers?

Not in any meaningful creative sense. ConlangCrafter automates the most technically demanding and time-consuming parts of language construction — building a grammatically consistent phonology and morphology, generating a constrained lexicon, and translating test sentences. What it cannot do is supply the philosophical intent, aesthetic sensibility, or cultural grounding that makes a conlang like Tolkien's Quenya or David Peterson's Dothraki compelling as a creative artifact. The researchers describe the system as a "computational creativity aid" — a tool that eases laborious tasks while leaving creative direction to the human designer.

What does ConlangCrafter's success actually prove about how language models work?

This is the finding that matters most for AI research. To generate a grammatically self-consistent new language — one with no template in its training data — the pipeline's base model must draw on internalized knowledge of how linguistic systems are structured: which features depend on which others, what counts as a contradiction, what typological combinations are realistically possible. The paper builds on prior work by co-author Gašper Beguš documenting that large language models possess measurable metalinguistic knowledge — the ability to reason about language as a system, not just generate fluent text within one. ConlangCrafter is a demanding test of that capacity, and the results suggest the underlying metalinguistic knowledge is abstract and portable enough to apply to language systems that have never existed in the training data.

How could game developers and writers use this AI language creation tool?

The pipeline is designed to generate complete, internally consistent languages from a high-level description or a set of typological constraints the user specifies. A worldbuilder could ask for a language with a specific word order, tonal system, or grammatical structure — or leave all choices to the pipeline — and receive a complete language sketch with phonological rules, a morphological paradigm, sample sentences with interlinear glosses, and a seed lexicon. For writers and game studios building immersive fictional worlds, that output provides the kind of linguistic authenticity that previously required hiring a professional conlanger or spending years in study.
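One hypothetical way to hold those pieces of a language sketch in code is shown below. The class and field names are our own assumptions, not ConlangCrafter's format (the article does not say how the sketch is stored), and the glossed sentence uses invented words with Leipzig-style abbreviations (LOC for locative, PST for past). The check in `__post_init__` enforces the basic interlinear-gloss rule that every word gets exactly one gloss, and `render` aligns the two lines column by column.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class GlossedSentence:
    text: List[str]    # words in the constructed language, morphemes split by hyphens
    gloss: List[str]   # one gloss per word, e.g. "river-LOC"
    translation: str   # free translation

    def __post_init__(self) -> None:
        if len(self.text) != len(self.gloss):
            raise ValueError("each word needs exactly one gloss")

    def render(self) -> str:
        # Pad each column to the wider of word and gloss so the lines align.
        widths = [max(len(w), len(g)) for w, g in zip(self.text, self.gloss)]
        line1 = " ".join(w.ljust(n) for w, n in zip(self.text, widths)).rstrip()
        line2 = " ".join(g.ljust(n) for g, n in zip(self.gloss, widths)).rstrip()
        return f"{line1}\n{line2}\n'{self.translation}'"

@dataclass
class LanguageSketch:
    phonology_rules: List[str] = field(default_factory=list)
    paradigm: Dict[str, str] = field(default_factory=dict)   # e.g. case label -> suffix
    examples: List[GlossedSentence] = field(default_factory=list)
    lexicon: Dict[str, str] = field(default_factory=dict)    # English gloss -> word

s = GlossedSentence(["kami-ro", "suta"], ["river-LOC", "sleep.PST"],
                    "(someone) slept by the river")
sketch = LanguageSketch(paradigm={"LOC": "-ro"}, examples=[s],
                        lexicon={"river": "kami", "sleep": "suta"})
print(s.render())
```

Keeping glosses as structured data rather than free text makes it easy to check them mechanically, for example that every suffix used in a gloss appears in the paradigm.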

Where does this AI language generation research go next?

The paper points toward several open directions: expanding the pipeline to cover semantics, pragmatics, and orthography; scaling to larger grammars and lexicons; and modeling languages as evolving communication systems that change over time as they are used by agents across different contexts and cultures. There is also a noted application in low-resource language documentation, where the pipeline's emphasis on logical consistency within a formally specified grammar could assist in generating systematic language resources for languages documented primarily through written grammars with limited speaker data.