Microsoft's new tiny language model can read images — this is what you can use it for

At Build 2024, Microsoft announced a new version of the company's small language AI model, Phi-3. The new version, Phi-3-vision, is a multimodal model: it can analyze an image and tell the user what is in it.

Multimodal models, a category recently popularized by OpenAI's GPT-4o and Google's Gemini, are AI tools that can process both text and images.

Phi-3-vision is intended for use on mobile devices: at 4.2 billion parameters, it is small enough to run on them. A model's parameter count is shorthand for how complex it is and how much it can learn during training. Microsoft has iterated on the Phi family across versions: Phi-2, for example, built on what Phi-1 learned and added new capabilities, and Phi-3 was likewise trained on Phi-2's foundations with further features added.

Phi-3-vision can perform common visual reasoning tasks, such as analyzing charts and images. Unlike well-known image models such as OpenAI's DALL-E, Phi-3-vision can only "read" images, not generate them.

Microsoft has released several of these small AI models. They are designed to run locally, on a wider range of devices than larger models such as Google's Gemini or OpenAI's ChatGPT, and without requiring an internet connection. Small models also reduce the computational power needed for certain tasks; Microsoft's small Orca-Math model, for example, specializes in solving math problems.

The first iteration of Phi-3 was announced on May 4, when Microsoft released the tiny Phi-3-mini. In benchmark tests, it performed well against larger models such as Meta's Llama 2. The mini model has only 3.8 billion parameters. There are also two other models, Phi-3-small and Phi-3-medium, with 7 billion and 14 billion parameters, respectively.

Phi-3-vision is currently available in preview. The other three Phi-3 models, Phi-3-mini, Phi-3-small, and Phi-3-medium, are accessible from the Azure Machine Learning model catalog. To use them, you need a paid Azure account and an Azure AI Studio hub.
