Apple unveils new AI model "ReALM" - potentially faster and smarter than Siri

Apple has introduced a new small language model called ReALM (Reference Resolution As Language Modeling). It is compact enough to run on a phone and is designed to make voice assistants like Siri smarter by helping them understand context and resolve ambiguous references.

The announcement comes ahead of WWDC 2024 in June, where iOS 18 is expected to debut as the big push behind a new "Siri 2.0", although it is not clear whether this model will be integrated into Siri in time.

This is not Apple's first foray into artificial intelligence in recent months: a mix of new models, tools for running AI more efficiently on smaller devices, and rumored partnerships all paint a picture of a company preparing to make AI central to its business.

ReALM is the latest announcement from Apple's rapidly growing AI research team, and the first to focus specifically on making existing models faster, smarter, and more efficient. Apple claims the model outperforms OpenAI's GPT-4 on certain tasks.

Details appeared in a new open research paper from Apple released on Friday and first reported by VentureBeat on Monday. Apple has not yet commented on the research or on whether it will actually be part of iOS 18.

Apple now seems to be taking a "throw everything at the wall and see what sticks" approach to AI. There are rumors of partnerships with Google, Baidu, and even OpenAI, and the company has announced impressive models and tools that make AI easier to run locally.

The iPhone maker has been working on AI research for over a decade, but much of it has been hidden in its apps and services. It wasn't until the release of the latest MacBook models that Apple began using the word AI in its marketing, and that will only increase in the future.

Much of the research is focused on ways to run AI models locally without resorting to sending large amounts of data to be processed in the cloud. This is essential not only to reduce the cost of running AI applications, but also to meet Apple's strict privacy requirements.

ReALM can be much smaller than models like GPT-4 because it does not have to do everything; its purpose is to provide context to an assistant such as Siri.

The model reconstructs the contents of the screen, labeling each entity and its on-screen location. That produces a text-based representation of the visual layout which can be passed to the voice assistant, giving it context clues for the user's request.
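To make the idea concrete, here is a minimal sketch of what such a screen-to-text encoding could look like. The entity structure, field names, and row-bucketing heuristic are illustrative assumptions, not Apple's actual implementation.

```python
# Sketch: turn detected on-screen entities into a plain-text layout.
# ScreenEntity and the bucketing heuristic are hypothetical, for illustration only.

from dataclasses import dataclass

@dataclass
class ScreenEntity:
    text: str   # visible label, e.g. a button title or phone number
    x: float    # left edge of the bounding box (0..1, screen-relative)
    y: float    # top edge of the bounding box (0..1, screen-relative)

def encode_screen(entities: list[ScreenEntity], row_height: float = 0.05) -> str:
    """Group entities into rows by vertical position, then order each row
    left-to-right, producing a text rendering of the screen layout."""
    rows: dict[int, list[ScreenEntity]] = {}
    for e in entities:
        rows.setdefault(int(e.y / row_height), []).append(e)
    lines = []
    for _, row in sorted(rows.items()):
        lines.append("  ".join(e.text for e in sorted(row, key=lambda e: e.x)))
    return "\n".join(lines)

# Example: a contact card with a call button
screen = [
    ScreenEntity("Jane Appleseed", 0.1, 0.10),
    ScreenEntity("+1 555-0100",    0.1, 0.16),
    ScreenEntity("Call",           0.7, 0.16),
]
print(encode_screen(screen))
# Jane Appleseed
# +1 555-0100  Call
```

The resulting text stands in for the screenshot, so a small language model can reason about what is on screen without ever processing pixels.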

In terms of accuracy, Apple states that ReALM is comparable to GPT-4 on many key metrics, despite being smaller and faster.

"We would especially like to emphasize the advantages obtained with the on-screen data set, which is a much more accurate and accurate data set than the GPT-4. We find that our model with the text encoding approach performs nearly as well as GPT-4, even though the latter is provided by screenshots," the authors write.

What this means is that if this version of ReALM, or a future one, makes its way into Siri, the assistant will be better able to understand what users mean when they say things like "open this app" or "tell me what this word in the image means."
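A rough, self-contained illustration of that reference-resolution step follows; the prompt wording and entity labels are invented for the example and do not reflect how Siri or ReALM are actually prompted.

```python
# Sketch: pair a text rendering of the screen with an ambiguous user request
# so a reference-resolution model can work out what "this number" refers to.
# All strings here are made up for illustration.

screen_text = "Jane Appleseed\n+1 555-0100  Call"
user_request = "Call this number"

prompt = (
    "Entities currently on screen:\n"
    f"{screen_text}\n\n"
    f"User request: {user_request}\n"
    "Which on-screen entity does the user refer to?"
)
print(prompt)
# A small model in the role of ReALM would answer with the matching entity
# ("+1 555-0100"), which the assistant can then act on.
```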

It would also give Siri more conversational ability without having to deploy a full-scale large language model like Gemini.

Coupled with other recent Apple research on "one-shot" responses, where the AI can produce an answer from a single prompt, it is a sign that Apple is still investing heavily in its own AI assistant work as well as relying on outside models.