Microsoft has announced the latest version of its small language model, Phi-3.5. The new version is a significant upgrade over the previous generation and outperforms small models from major players such as Google, OpenAI, Mistral, and Meta on several key metrics.
Phi-3.5 comes in 3.8-billion, 4.15-billion, and 41.9-billion-parameter versions; all three are free to download and can be run with local tools such as Ollama.
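For readers who want to try it, here is a minimal sketch using Ollama's official Python client. It assumes Ollama is running locally and that the 3.8-billion-parameter model has been pulled under the `phi3.5` tag (the tag Ollama listed for this model at launch; tags may change).

```python
# Minimal local chat with Phi-3.5 via Ollama's Python client.
# Assumes Ollama is installed and the model was fetched with:
#   ollama pull phi3.5
import ollama

response = ollama.chat(
    model="phi3.5",  # Ollama's tag for the 3.8B Phi-3.5 mini model
    messages=[
        {"role": "user", "content": "In one sentence, what is a small language model?"},
    ],
)
print(response["message"]["content"])
```

Because the model runs entirely on the local machine, no prompt or response data leaves the device.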
It performed particularly well on reasoning tasks, trailing only GPT-4o-mini among the major small models. It also performed well on math benchmarks, significantly outperforming the small Llama and Gemini models. Small language models like Phi-3.5 demonstrate the efficiency gains being made in AI and lend credence to OpenAI CEO Sam Altman's stated goal of creating intelligence that is "too cheap to meter."
Phi-3.5 also has a vision version that can understand images as well as text, and a mixture-of-experts version that splits the learning task across different subnetworks for more efficient processing.
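To make the mixture-of-experts idea concrete, here is a toy Python sketch of the general routing technique, not Microsoft's actual implementation: a small gating network scores the experts for each token, and only the top-scoring subnetworks are evaluated, so most of the model's parameters sit idle on any given input. The 16-expert, top-2 routing below mirrors the published Phi-3.5-MoE configuration; everything else is illustrative.

```python
# Toy mixture-of-experts routing sketch (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, DIM, TOP_K = 16, 8, 2  # Phi-3.5-MoE routes each token to 2 of 16 experts

# Each "expert" is a small feed-forward subnetwork; here, one weight matrix each.
experts = [rng.normal(size=(DIM, DIM)) for _ in range(NUM_EXPERTS)]
gate = rng.normal(size=(DIM, NUM_EXPERTS))  # gating network scores experts per token

def moe_forward(x: np.ndarray) -> np.ndarray:
    scores = x @ gate                                    # one score per expert
    top = np.argsort(scores)[-TOP_K:]                    # pick the top-k experts
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                             # softmax over chosen experts
    # Only the selected subnetworks run; the other 14 experts stay idle.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=DIM)
print(moe_forward(token).shape)  # (8,) -- same shape as the input
```

This is why a 41.9-billion-parameter model can be comparatively cheap to run: only a small fraction of its weights are active for any one token.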
The mixture-of-experts model beats Gemini 1.5 Flash, the model used in the free version of the Gemini chatbot, on multiple benchmarks. It also has a large context window of 128K tokens, comparable to ChatGPT and Claude, although considerably smaller than Gemini's.
The main advantage of a very small model like the one I installed is that it can be bundled with applications or even embedded in Internet of Things devices such as smart doorbells. That would enable, say, facial recognition without sending data to the cloud.
The smallest model was trained on 3.4 trillion tokens of data over 10 days using 512 Nvidia H100 GPUs. The mixture-of-experts model, made up of 16 experts of 3.8 billion parameters each, was trained on 4.9 trillion tokens and took 23 days.
I installed and ran the smaller 3.8-billion-parameter version of Phi-3.5 on my laptop, and it was not as impressive as the benchmarks suggested. Its responses were redundant, its wording often left me frustrated, and it struggled with some simple tests.
I asked the classic question: "Write a short one-sentence story in which the first letter of each word is the same as the last letter of the previous word." Even after clarification, it failed spectacularly.
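The constraint is simple enough to verify mechanically, which made the failures easy to spot. Here is a small checker for it (my own illustrative helper, not part of any benchmark suite):

```python
# Check the "first letter of each word matches the last letter of the
# previous word" constraint from the test prompt.
import string

def satisfies_chain(sentence: str) -> bool:
    words = [w.strip(string.punctuation).lower() for w in sentence.split()]
    words = [w for w in words if w]  # drop tokens that were pure punctuation
    return all(curr[0] == prev[-1] for prev, curr in zip(words, words[1:]))

print(satisfies_chain("Sally yawned during great tours."))  # True
print(satisfies_chain("The cat sat on the mat."))           # False
```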
I have not tried the larger mixture-of-experts model. Judging from the benchmarks, however, it resolves some of the problems I ran into with the version I tried, and its output should be of similar quality to OpenAI's GPT-4o-mini (the model that powers the free version of ChatGPT).
One area where the mixture-of-experts model seems to excel is STEM and social science topics. Its architecture allows it to maintain efficiency while handling complex AI tasks in different languages.