Dave Plummer, a former core Windows developer, has run ATTN-11, a single-layer, single-head Transformer with just 1,216 parameters, on a 47-year-old PDP-11/44 with a 6 MHz CPU and 64 KB of memory. The model, written in PDP-11 assembly language by Damien Boureille, reached 100% accuracy after about 350 training iterations. The entire run took 3.5 minutes.
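
To give a sense of how small a single-layer, single-head Transformer of this size is, here is a minimal forward-pass sketch in plain Python. The dimensions (a 10-symbol vocabulary, sequence length 8, model width 16) and the layer layout (no biases, no residual connection, no layer normalization) are assumptions chosen only because they add up to 1,216 parameters; the source does not state ATTN-11's actual configuration, and this is not a port of the PDP-11 assembly code. Training is omitted.

```python
import math
import random

# Hypothetical dimensions, NOT taken from the source. They were picked
# only because they happen to total 1,216 parameters.
VOCAB, SEQ_LEN, D_MODEL = 10, 8, 16


def rand_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(seed=0):
    """Build the weight matrices for the assumed architecture."""
    rng = random.Random(seed)
    return {
        "tok_emb": rand_matrix(VOCAB, D_MODEL, rng),    # 10 * 16 = 160
        "pos_emb": rand_matrix(SEQ_LEN, D_MODEL, rng),  #  8 * 16 = 128
        "w_q": rand_matrix(D_MODEL, D_MODEL, rng),      # 16 * 16 = 256
        "w_k": rand_matrix(D_MODEL, D_MODEL, rng),      # 16 * 16 = 256
        "w_v": rand_matrix(D_MODEL, D_MODEL, rng),      # 16 * 16 = 256
        "w_out": rand_matrix(D_MODEL, VOCAB, rng),      # 16 * 10 = 160
    }


def count_params(params):
    """Count every scalar weight across all matrices."""
    return sum(len(m) * len(m[0]) for m in params.values())


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def forward(params, tokens):
    """Single-head self-attention followed by an output projection."""
    # Token embedding plus positional embedding for each position.
    x = [
        [t + p for t, p in zip(params["tok_emb"][tok], params["pos_emb"][i])]
        for i, tok in enumerate(tokens)
    ]
    q = matmul(x, params["w_q"])
    k = matmul(x, params["w_k"])
    v = matmul(x, params["w_v"])
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    scale = 1.0 / math.sqrt(D_MODEL)
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    weights = [softmax(r) for r in scores]
    context = matmul(weights, v)
    # Project each position to one logit per vocabulary symbol.
    return matmul(context, params["w_out"])
```

Calling `count_params(init_params())` on these assumed shapes gives 1,216, and `forward` returns one row of 10 logits for each of the 8 input positions. At one 16-bit word per weight, 1,216 parameters would occupy about 2.4 KB, which is why a model of this size can fit comfortably in 64 KB.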
