Google TalkBack will use Gemini to describe images for blind people
Google announced that Gemini Nano capabilities are coming to its accessibility feature, TalkBack. It’s a strong example of a company using generative AI to open its software to more users.
Gemini Nano is the smallest version of Google’s large-language-model-based platform, designed to run entirely on-device. That means it doesn’t require a network connection. Here the model will be used to generate spoken descriptions of images for blind and low-vision users.
In the demo pop-up, TalkBack describes the article of clothing as, “A close-up of a black and white gingham dress. The dress is short, with a collar and long sleeves. It is tied at the waist with a big bow.”
According to the company, TalkBack users encounter around 90 unlabeled images per day. Using LLMs, the system can describe that content automatically, potentially removing the need for someone to input the information manually.
“This update will help fill in missing information,” Android ecosystem president Sameer Samat noted, “whether it’s more details about what’s in a photo that family or friends sent or the style and cut of clothes when shopping online.”
The feature will arrive on Android later this year. Assuming it works as well as it does in the demo, this could be a game changer for blind and low-vision users.