Google TalkBack will use Gemini to describe images for blind people
Google announced that Gemini Nano capabilities are coming to TalkBack, the company's accessibility feature. This is a great example of a company using generative AI to open its software to more users.
Gemini Nano is the smallest version of Google's large-language-model-based platform, designed to run entirely on-device. That means it doesn't require a network connection. Here, the model will be used to create aural descriptions of objects for blind and low-vision users.
In the pop-up above, TalkBack describes the article of clothing as "A close-up of a black and white gingham dress. The dress is short, with a collar and long sleeves. It is tied at the waist with a big bow."
According to the company, TalkBack users encounter around 90 unlabeled images per day. Using LLMs, the system will be able to describe that content, potentially removing the need for someone to add that information manually.
"This update will help fill in missing information," Sameer Samat, president of the Android ecosystem, noted, "whether it's more details about what's in a photo that family or friends sent or the style and cut of clothes when shopping online."
The feature will arrive on Android later this year. Assuming it works as well as it does in the demo, this could be a game changer for blind people and those with low vision.