On April 21, a research team from the University of Washington in the United States unveiled VueBuds, a prototype that integrates a compact camera into standard true wireless earbuds and pairs it with a vision language model (VLM). Through voice commands, users can have their surroundings described, objects identified, and text translated in real time. For example, if a user looks at a Korean food package and asks, "Help me translate this," the AI voice in the earbuds promptly responds, "The text above means 'cold noodles.'"
The VueBuds prototype uses low-resolution monochrome cameras. Captured images are sent over Bluetooth to a paired smartphone or other nearby device, where a compact AI model processes them, yielding a response time of about one second. Notably, both image capture and AI inference happen entirely on the user's own hardware, with no images uploaded to the cloud. While the system is active, an indicator light signals that recording is in progress, and users can delete captured images at any time to protect their privacy. The work was demonstrated at the CHI 2026 conference in Barcelona on April 14.
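To make the described pipeline concrete, here is a minimal sketch in Python of how a camera-earbud system of this kind might handle a voice command: capture a low-resolution frame, run a compact on-device model, and speak the answer back, timing the round trip against the roughly one-second target. Every name here (capture_frame, run_local_vlm, speak) is a hypothetical stub for illustration; the actual VueBuds implementation is not detailed in the report.

```python
# Hypothetical sketch of a VueBuds-style on-device pipeline.
# All functions are illustrative stubs, not the UW team's code.

import time
from dataclasses import dataclass


@dataclass
class Frame:
    """A low-resolution monochrome image from the earbud camera."""
    pixels: bytes
    width: int
    height: int


def capture_frame() -> Frame:
    # Stand-in for the earbud camera; an indicator light would
    # turn on at this point to signal that recording is active.
    w, h = 160, 120
    return Frame(pixels=bytes(w * h), width=w, height=h)


def run_local_vlm(frame: Frame, prompt: str) -> str:
    # Stand-in for the compact vision-language model running on the
    # paired smartphone; the frame travels over Bluetooth and never
    # leaves the user's own devices.
    return "The text above means 'cold noodles.'"


def speak(text: str) -> None:
    # Stand-in for text-to-speech playback through the earbuds.
    print(f"[earbud audio] {text}")


def handle_voice_command(prompt: str) -> None:
    start = time.monotonic()
    frame = capture_frame()
    answer = run_local_vlm(frame, prompt)
    speak(answer)
    # The article reports a response time of about one second.
    print(f"response latency: {time.monotonic() - start:.2f}s")


if __name__ == "__main__":
    handle_voice_command("Help me translate this")
```

Keeping capture, inference, and playback in one local loop, as sketched above, is what lets such a design avoid cloud uploads while still meeting the reported latency.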
