newsweekshowcase.com

Google's next-gen AI model, Gemini 1.5, is almost ready

The Verge: https://www.theverge.com/2024/2/15/24073457/google-gemini-1-5-ai-model-llm

A context window big enough for The Lord of the Rings: what Gemini 1.5's larger context means for Google, and how developers can use it to build better applications

Google is also launching new tools to help developers use Gemini in their applications, including new ways of tapping into the models’ ability to parse video and audio. It’s adding new features to its web-based coding tool, including ways for the AI to help write code.

As he’s explaining this to me, Pichai notes offhandedly that you can fit the entire Lord of The Rings trilogy into that context window. This seems too specific, so I ask him: this has already happened, hasn’t it? Someone in Google is just checking to see if Gemini spots any continuity errors, trying to understand the complicated lineage of Middle-earth, and seeing if maybe AI can finally make sense of Tom Bombadil. “I’m sure it has happened,” Pichai says with a laugh, “or will happen — one of the two.”

The bigger context window will be useful for businesses, Pichai says: it enables use cases where you can supply a large amount of context and information along with a query. Think of it as a dramatically expanded query window. He imagines filmmakers might upload their entire movie and ask Gemini what reviewers might say; he sees companies using Gemini to look over masses of financial records. “I view it as one of the bigger breakthroughs we have done,” he says.

Eventually, Pichai tells me, all these 1.0s and 1.5s and Pros and Ultras and corporate battles won’t really matter to users. “People will just be consuming the experiences,” he says. “It’s like using a smartphone without always paying attention to the processor underneath.” But for now, he acknowledges, we’re still in a phase where people do pay attention to the chip in their phone. “The underlying technology is shifting so fast,” he says. “People do care.”

Gemini Pro 1.5 in action: Google DeepMind demos with Apollo 11 communications transcripts and Buster Keaton movies

The model performs this kind of reasoning on every page, every single word, and it really feels like magic, according to Oriol Vinyals.

In a demo, Google DeepMind showed Gemini Pro 1.5 analyzing a 402-page PDF of the Apollo 11 communications transcript. The model was asked to find amusing portions, like when astronauts said that a communications delay was due to a sandwich break. Another demo showed the model answering questions about specific actions in a Buster Keaton movie. The previous version of Gemini could have answered these questions only for much shorter amounts of text or video. Google hopes that the new capabilities will allow developers to build new kinds of apps on top of the model.

Gemini Pro 1.5 is also more capable—at least for its size—as measured by the model’s score on several popular benchmarks. The new model exploits a technique previously invented by Google researchers, known as mixture-of-experts, to squeeze out more performance without requiring more computing power: by activating only certain parts of the model for a given input, it becomes more efficient to train and run.
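To make the sparse-activation idea concrete, here is a minimal toy sketch of mixture-of-experts style routing in plain Python. Everything here—the expert count, the router weights, the scalar "experts"—is illustrative and has nothing to do with Gemini's actual architecture; the point is only that a router picks a few experts per input, and the unpicked experts are never evaluated, which is where the compute savings come from.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class TinyMoE:
    """Toy mixture-of-experts layer: route each input to top-k experts."""

    def __init__(self, num_experts=4, top_k=1):
        self.num_experts = num_experts
        self.top_k = top_k
        # Each "expert" is just a scalar multiplier standing in for a
        # full feed-forward sub-network in a real model.
        self.experts = [lambda x, w=i + 1: w * x for i in range(num_experts)]
        # Fixed router weights for the demo; a real router is learned.
        self.router_weights = [0.5, -0.2, 0.9, 0.1]

    def forward(self, x):
        # The router scores every expert for this input...
        scores = [w * x for w in self.router_weights]
        probs = softmax(scores)
        # ...but only the top-k experts actually run; the rest are
        # skipped entirely, so compute stays low even as the total
        # number of parameters grows.
        top = sorted(range(self.num_experts),
                     key=lambda i: probs[i], reverse=True)[:self.top_k]
        total = sum(probs[i] for i in top)
        return sum((probs[i] / total) * self.experts[i](x) for i in top)

moe = TinyMoE(num_experts=4, top_k=1)
print(moe.forward(2.0))  # only one of the four experts is evaluated
```

With `top_k=1`, only a quarter of the "network" runs per input here; scaling `num_experts` up adds capacity without adding per-input compute, which is the trade-off the technique exploits.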

Google says that Gemini Pro 1.5 is as capable as its most powerful offering, Gemini Ultra, in many tasks, despite being a significantly smaller model. Google DeepMind CEO Demis Hassabis insists that the same technique can be applied to boost Gemini Ultra.

The rapid pace of progress in generative artificial intelligence has heightened fears about its risks. Google says it has put Gemini Pro 1.5 through extensive testing and that providing limited access offers a way to gather feedback on potential risks. The company has given researchers at the UK’s AI Safety Institute access to its most powerful models so that they can test them.
