Ollama has released its own multimodal AI engine, which runs independently of the llama.cpp framework. Written in Go, the engine improves the accuracy of local inference and handles large images more reliably. By combining image-processing metadata, KV cache optimization, and image caching, it improves memory management and resource utilization. It also supports chunked attention and 2D rotary embeddings, allowing it to run complex models such as Llama 4 Scout efficiently.
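
To make the idea of chunked attention concrete, here is a minimal sketch in Go. It is not taken from Ollama's engine; the function name `chunkedCausalMask` and the toy sizes are hypothetical. It assumes the common formulation in which a token attends only to earlier tokens that fall in the same fixed-size chunk, so attention cost and cached state stay bounded by the chunk size rather than the full sequence length.

```go
package main

import "fmt"

// chunkedCausalMask builds an n×n attention mask (hypothetical helper,
// for illustration only). mask[i][j] is true when query position i may
// attend to key position j: j must not be in the future (j <= i) and
// must lie in the same chunk as i. chunkSize must be positive.
func chunkedCausalMask(n, chunkSize int) [][]bool {
	mask := make([][]bool, n)
	for i := range mask {
		mask[i] = make([]bool, n)
		for j := 0; j <= i; j++ {
			// Integer division maps each position to its chunk index.
			mask[i][j] = i/chunkSize == j/chunkSize
		}
	}
	return mask
}

func main() {
	// Toy example: 6 positions split into chunks of 3.
	for _, row := range chunkedCausalMask(6, 3) {
		for _, allowed := range row {
			if allowed {
				fmt.Print("1 ")
			} else {
				fmt.Print(". ")
			}
		}
		fmt.Println()
	}
}
```

In this toy run, position 3 starts a new chunk and therefore cannot see positions 0 through 2, while position 5 can see positions 3 through 5. A real engine would apply such a mask to attention scores on the GPU rather than materialize a boolean matrix.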
