Apple Announces New Ferret-UI LLM - This AI Can Read iPhone Screens

Apple researchers have created an AI model that can understand what is happening on a smartphone screen, the latest in a growing line of AI models from the company.

Called Ferret-UI, this multimodal large language model (MLLM) can perform a wide variety of tasks based on what is displayed on your phone's screen. Apple's new model can, for example, identify the type of an icon, find specific text, or tell you exactly what to do to accomplish a particular task.

These capabilities are documented in a recently published paper that details how this particular MLLM was designed to understand and interact with mobile user interface (UI) screens.

What is not yet known is whether this will become part of the rumored Siri 2.0 or remain an Apple AI research project that never leaves the paper stage.

We currently use our phones to accomplish a variety of tasks, such as looking up information and making appointments. To do so, we look at the screen and tap the buttons that lead us to our goal.

Apple believes that automating this process would make interacting with phones even easier. The researchers also hope that models like Ferret-UI will help with accessibility, app testing, and usability testing.

For such a model to be useful, it has to understand everything happening on the phone's screen while also being able to focus on specific UI elements. On top of that, it has to match instructions given in natural language with what is displayed on the screen.

For example, when Ferret-UI was shown a screenshot of AirPods in the Apple Store app and asked how to purchase them, it correctly replied that the user should tap the "Buy" button.
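To make the grounding idea concrete, here is a minimal sketch. This is not Apple's code: the UIElement structure, the example screen contents, and the word-overlap matcher are all illustrative assumptions, and the real model grounds instructions with learned vision-language features rather than string matching.

```python
from dataclasses import dataclass

@dataclass
class UIElement:
    kind: str    # e.g. "button", "icon", "text"
    label: str   # visible or accessibility label
    bbox: tuple  # (x1, y1, x2, y2) screen coordinates

# Hypothetical screen description, as a UI element detector might produce it.
screen = [
    UIElement("text", "AirPods Pro 2nd generation", (40, 120, 680, 180)),
    UIElement("text", "$249.00", (40, 200, 200, 240)),
    UIElement("button", "Buy", (520, 200, 680, 260)),
]

def ground_instruction(instruction, elements):
    """Toy stand-in for the model's grounding step: pick the element
    whose label shares the most words with the instruction."""
    words = set(instruction.lower().split())
    best, best_score = None, 0
    for el in elements:
        score = len(words & set(el.label.lower().split()))
        if score > best_score:
            best, best_score = el, score
    return best

target = ground_instruction("tap the Buy button", screen)
if target is not None:
    print(f"Tap the '{target.label}' {target.kind} at {target.bbox}")
# Prints: Tap the 'Buy' button at (520, 200, 680, 260)
```

The point of the sketch is the output format: an instruction comes in as text, and the answer comes back tied to a specific on-screen element and its location, which is what lets a model like this say "tap the Buy button" rather than merely describe the screen.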

With most of us carrying smartphones in our pockets, it makes sense that companies are looking at ways to add AI features tailored to these smaller devices.

Research scientists at Meta Reality Labs have already predicted that we will spend an hour or more each day in direct conversation with chatbots, or with LLM processes running in the background to power features such as recommendations.

Meta's chief AI scientist, Yann LeCun, has even said that in the future, AI assistants will mediate our entire digital diet.

So while Apple did not specify exactly what its plans for Ferret-UI are, it is not hard to imagine how such a model could be used to supercharge Siri and improve the iPhone experience, perhaps even within the year.
