Meta wants you to run AI on your phone - here's what we know about MobileLLM

As our reliance on AI grows, it is only a matter of time before people need to access the latest chatbots directly from their phones.

According to research scientists at Meta Reality Labs, as our reliance on large language models (LLMs) grows, we may soon spend an hour or more each day in direct conversation with chatbots, on top of the LLM processes running in the background, such as recommendation systems.

How much of that LLM time we will actually be able to afford, however, is less clear.

All we usually see is ChatGPT responding quickly to our questions, but the energy consumption and carbon emissions required to produce those responses "will cause tremendous environmental problems" if AI continues on its current trajectory, the scientists said in a preprint paper published February 22.

One solution would be to run these language models directly on phones, addressing the portability and computational cost problems at the same time.

Indeed, while it is technically possible to run a model like Meta's Llama 2 directly on an iPhone, the scientists calculated that the battery could handle less than two hours of conversation, which is not realistic for consumers. Memory limitations would also increase the latency of each answer. What is needed is a compact LLM designed for phones, and Meta's team believes it has found the solution in what it calls MobileLLM.

Looking under the hood of an LLM, one of the main features you can observe is the model's size, which is measured by its number of parameters.

The more parameters a model has, the more complex it is and the more data it can process. OpenAI's GPT-4, considered the most powerful model in the field, has over a trillion parameters. As mentioned above, however, running such a heavy model requires more energy and computing power.

Meta's researchers believe they can produce a high-quality LLM with fewer than a billion parameters, still some 174 billion fewer than GPT-3's 175 billion.
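
For a sense of scale, here is a rough sketch of how much memory a model's weights alone occupy. The 350-million-parameter size stands in for a sub-billion model, and the bytes-per-parameter figures (16-bit and 8-bit weights) are illustrative assumptions, not numbers from Meta's paper.

```python
# Back-of-the-envelope memory estimate for storing model weights.
# Parameter counts and precisions below are illustrative assumptions.

def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in gigabytes."""
    return num_params * bytes_per_param / 1e9

models = [("GPT-3 (175B)", 175e9), ("sub-billion model (350M)", 350e6)]
precisions = [("fp16", 2), ("int8", 1)]

for name, params in models:
    for label, nbytes in precisions:
        print(f"{name} @ {label}: {weight_memory_gb(params, nbytes):,.2f} GB")
```

Even at 8-bit precision, the full-size model's weights run to hundreds of gigabytes, far beyond a phone's memory, while a sub-billion model fits comfortably.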

To achieve this, they found that overall performance could be enhanced by prioritizing depth, which supports abstraction and advanced reasoning, over breadth, the ability to handle a wide range of information at once: at this scale, a deeper, thinner network beats a wider, shallower one.
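
As a rough illustration of that trade-off, the sketch below compares two hypothetical transformer shapes at about the same parameter budget, using the standard rough estimate of 12 x d^2 parameters per layer. The specific layer counts and widths are invented, not MobileLLM's actual configuration.

```python
# Illustrative comparison: "deep and thin" vs "wide and shallow" at a
# similar parameter budget. Configurations are invented for this sketch.

def approx_transformer_params(n_layers: int, d_model: int) -> int:
    # Per layer: ~4*d^2 for attention projections + ~8*d^2 for the
    # feed-forward block (the usual 4x expansion), ignoring embeddings.
    return n_layers * 12 * d_model ** 2

deep_thin = approx_transformer_params(n_layers=30, d_model=512)
wide_shallow = approx_transformer_params(n_layers=12, d_model=800)
print(f"deep & thin (30 x 512):    ~{deep_thin / 1e6:.0f}M parameters")
print(f"wide & shallow (12 x 800): ~{wide_shallow / 1e6:.0f}M parameters")
```

Both shapes land near the same size (roughly 94M vs 92M parameters here), so the designer is free to spend the budget on depth, which is what Meta's team found works better at sub-billion scale.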

They also found grouped-query attention useful when memory is limited, as it is on a smartphone. In this technique, several attention "query" heads share a single set of key and value heads, so the model has to keep far less intermediate data in memory while processing a prompt. Again, this means less memory and energy use.
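
A minimal sketch of the idea, with made-up head counts and NumPy for clarity (real implementations batch all heads at once): eight query heads share two key/value heads, so the cached keys and values are four times smaller.

```python
import numpy as np

# Grouped-query attention sketch: every 4 query heads share one KV head.
seq_len, head_dim = 8, 16
n_q_heads, n_kv_heads = 8, 2
group_size = n_q_heads // n_kv_heads

rng = np.random.default_rng(0)
q = rng.standard_normal((n_q_heads, seq_len, head_dim))
k = rng.standard_normal((n_kv_heads, seq_len, head_dim))  # 4x smaller KV cache
v = rng.standard_normal((n_kv_heads, seq_len, head_dim))

outputs = []
for h in range(n_q_heads):
    kv = h // group_size  # map each query head to its shared KV head
    scores = q[h] @ k[kv].T / np.sqrt(head_dim)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    outputs.append(weights @ v[kv])

out = np.stack(outputs)  # (n_q_heads, seq_len, head_dim)
print(out.shape)
```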

To validate the effectiveness of a sub-billion-parameter model in on-device applications, they evaluated its performance on two important on-device tasks: chat and API calling.

To evaluate chat, they used two major benchmarks, AlpacaEval and MT-Bench, and found that the MobileLLM models outperformed other state-of-the-art sub-billion-parameter models.

An API call, on the other hand, is how one piece of software asks another piece of software to perform a task outside its own programming. For example, when you ask your phone for a morning alarm, the assistant sets it in the clock app and replies with a confirmation such as: "The alarm is set for 7:30 AM."
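
A minimal sketch of that pattern, with an invented function name and JSON schema: the model emits a structured call rather than free text, and the host app dispatches it to an on-device function.

```python
import json

# Hypothetical API-calling flow. The "set_alarm" name and the JSON
# schema are invented for illustration, not from Meta's paper.
model_output = '{"api": "set_alarm", "args": {"time": "7:30 AM"}}'

def set_alarm(time: str) -> str:
    # A real implementation would talk to the phone's clock app.
    return f"The alarm is set for {time}."

handlers = {"set_alarm": set_alarm}

call = json.loads(model_output)
print(handlers[call["api"]](**call["args"]))  # -> The alarm is set for 7:30 AM.
```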

At the end of the day, it comes down to finding the right balance. An omniscient phone sounds great, but if the battery only lasts two hours before you start hunting for a power outlet, it becomes far less appealing.

Apple is also actively working on this issue, as a future Siri powered by an LLM will likely require significant on-device processing due to Apple's security requirements.

As companies keep adding AI capabilities to their phones, Meta's research may help answer the question of where and how to strike the right trade-off for on-device LLMs.
