As Transformer-based large language models such as BERT and GPT become increasingly prevalent, artificial intelligence has demonstrated remarkable capabilities in comprehension and expression, underscoring its vast potential for boosting productivity. Notably, the attention mechanism, the central computational component of these models, is pivotal to overall performance: its energy efficiency and processing speed largely determine how fast and how economically these models run.
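To make concrete why the attention mechanism dominates the computational cost, the following is a minimal sketch of scaled dot-product attention in NumPy (an illustrative implementation, not taken from any specific model): the intermediate score matrix is seq_len x seq_len, so both compute and memory grow quadratically with sequence length.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention, the core Transformer computation.

    Q, K: (seq_len, d_k); V: (seq_len, d_v).
    The (seq_len, seq_len) score matrix makes cost quadratic in
    sequence length, which is why this stage often dominates the
    energy use and latency of Transformer inference.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (seq_len, seq_len)
    # Numerically stable row-wise softmax over the scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                   # (seq_len, d_v)

# Toy example: 4 tokens with 8-dimensional queries/keys/values
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Because each attention weight row sums to one, the output is a convex combination of the value vectors; hardware accelerators for attention largely target the matrix products and softmax above.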