AI That Sees Through Cameras, Phones, and Computer Screens: Multimodal Chatbots in the Open World and the Future of Android
Google's vision for the future of AI is strikingly similar to one showcased by OpenAI on Monday. The new interface for ChatGPT can converse via voice and discuss what it sees through a phone camera or on a computer screen. A new artificial intelligence model called GPT-4o powers the upgrade, speaking with a more humanlike voice and emotional tone, and mimicking emotions like surprise and even flirtatiousness.
"Over the next few weeks, we'll be rolling out these capabilities to everyone," OpenAI said during the demo. "It just feels so magical, and that's wonderful."
At another point in the demo, ChatGPT responded to OpenAI researcher Barret Zoph's greeting by asking, "How can I brighten your day today?" When Zoph asked the chatbot to take a picture of him and say what he was feeling, it replied, "I'll put my emotional detective hat on."
In a blog post Monday, OpenAI's CEO, Sam Altman, highlighted the significance of the new interface. "It feels like AI from the movies; and it's still a bit surprising to me that it's real," Altman wrote. Getting to human-level response times and expressiveness, he added, turns out to be a big change.
I had a chance to speak with Google's Dave Burke and Sameer Samat about what's new in the world of Android, as well as the future of the OS. Samat referred to these updates as a "once-in-a-generational opportunity to reimagine what the phone can do, and to rethink all of Android."
In response to spoken instructions, the assistant was able to make sense of objects viewed through a camera and converse about them in natural language. It identified a computer speaker and answered questions about its components, recognized a London neighborhood from the view out of an office window, read and analyzed code from a computer screen, wrote a limerick about pencils, and remembered the location of a pair of glasses.
Demis Hassabis, the CEO of Google DeepMind, said in an interview ahead of today's event that he thinks text-only chatbots will prove to be just a "transitory stage" on the march toward far more sophisticated, and hopefully useful, AI helpers. "This was always the vision behind Gemini," Hassabis added. "That's why we made it multimodal."
Beyond Now on Tap: Google's AI Helps Students With Physics and Math Problems
A decade ago, Google showed off a feature called Now on Tap in what was then the latest version of Android: tap and hold the home button, and the phone would surface contextual information based on what was on the screen. Texting with a friend about a movie? You could get details about the title without leaving the messaging app. Looking at a restaurant on Yelp? The phone could surface OpenTable recommendations with just a tap.
These improvements felt exciting and magical at the time; the ability to sense what was on the screen and predict the actions you might want to take felt future-facing. It was one of my favorite Android features, good in its own right but never as great as it could have been.
At the I/O developer conference in Mountain View, California, today, the new features on display feel like the Now on Tap of old, surfacing contextual information about what's around you to make using your phone a bit easier. Except this time, they're powered by a decade's worth of advancements in large language models.
Samat says Google has received positive feedback from consumers broadly, but Circle to Search's latest feature comes specifically from student feedback: users can now circle a physics or math problem on the screen and get step-by-step instructions for solving it.
According to Samat, it was clear that Gemini wasn't just giving students answers but showing them how to solve the problems. Later this year, Circle to Search will be able to handle more complex problems. It's all powered by Google's LearnLM models, which are fine-tuned for education.
Gemini is Google’s AI assistant that is in many ways eclipsing Google Assistant. If you want to replace the Assistant on most phones, there’s an option to do so with Gemini. I asked Burke if it meant the assistant was heading to the graveyard.