Google flashes everyone — The new Gemini 1.5 Flash takes on GPT-4o

Google has launched a new member of its Gemini family of artificial intelligence models. Sitting between the on-device Nano and the cloud-based Pro, Gemini 1.5 Flash is designed for tasks that demand fast responses, such as chat and processing images, video and audio. Announced at the Google I/O developer event, it is a natively multimodal model similar to OpenAI's recently announced GPT-4o, built for speed and well suited to real-time conversation.

As the new model is now available worldwide for developers to use in their own applications, we could soon see a number of third-party live chat apps built with Gemini 1.5 Flash.

We also learned that it will power the Gemini Advanced premium chatbot, alongside an upgrade to Gemini 1.5 Pro, a model first released earlier this year.

Gemini 1.5 Flash sits just above Nano and just below Pro in the size hierarchy; what sets it apart from other AI models, including its siblings, is its combination of speed and agility.

In addition to being fast and impressive in its ability to understand text, images, video and audio, 1.5 Flash is also much cheaper, at least 20 times cheaper than the more expensive Pro. "We know from user feedback that some applications require lower latency and lower cost," said Demis Hassabis, CEO of Google DeepMind. "This has inspired us to continue to innovate," he added, describing Flash as "a model that is lighter weight than 1.5 Pro and designed to be fast and efficient to serve at scale."

It compares well to OpenAI's recently announced GPT-4o, at least in terms of speed: it is very fast, natively multimodal and designed for real-time interaction. That said, Gemini 1.5 Flash appears to be the less capable model in terms of reasoning.

Like the other Gemini 1.5 family models, 1.5 Flash comes with a massive one-million-token context window, which is promised to be fully usable in practice. By comparison, GPT-4o has a 128,000-token context window, while Claude 3 offers 200,000 tokens.
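To make those numbers concrete, here is a quick back-of-envelope comparison. The token counts are the ones quoted above; the words-per-token ratio is a rough rule of thumb for English text, not an official figure for any of these models.

```python
# Rough comparison of the context window sizes mentioned in the article.
# WORDS_PER_TOKEN is an approximation (a common rule of thumb for English),
# not a published figure for any of these models.
WORDS_PER_TOKEN = 0.75

context_windows = {
    "Gemini 1.5 Flash": 1_000_000,
    "GPT-4o": 128_000,
    "Claude 3": 200_000,
}

for model, tokens in context_windows.items():
    words = int(tokens * WORDS_PER_TOKEN)
    print(f"{model}: {tokens:,} tokens ≈ {words:,} words")
```

By that rough measure, 1.5 Flash can hold several novels' worth of text in a single conversation, where GPT-4o holds roughly one.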

What makes large context windows so important is the ability to hold large amounts of information in memory within a single conversation. This is essential when analyzing non-text content: an image is worth a thousand words, and video is worth even more.

1.5 Flash was also trained by its larger sibling, Gemini 1.5 Pro. Hassabis said this was done "through a process called distillation," in which the most essential knowledge and skills are transferred from a larger model into a smaller, more efficient one.
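Google has not published the details of how 1.5 Flash was distilled, but the general technique is well known: the smaller "student" model is trained to match the larger "teacher" model's softened output distribution. A minimal sketch of that standard distillation loss, with purely illustrative logit values (this is the textbook method, not Google's actual training setup):

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution; higher
    temperature flattens the distribution, exposing more of the
    teacher's 'dark knowledge' about near-miss answers."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the softened teacher and student
    distributions; the student is trained to minimize this."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return sum(ti * math.log(ti / si) for ti, si in zip(t, s))

# Identical logits give zero loss; diverging logits give positive loss.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))
print(distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]))
```

In practice the student also sees the original training data, but matching the teacher's full distribution over answers is what lets a small model inherit much of a large model's behaviour.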

"1.5Flash is used to create abstracts, chat applications, image and video captions, data extraction from long documents and tables, etc. "It is very good to have a good time," he said.

Models like these will only become more important as they gain the ability to understand more than just text, and as large context windows come to faster but smaller models like Flash.
