Apple Takes on Meta with New Open Source AI Model - Here's Why It Matters

Apple is becoming one of the surprise leaders in the open source artificial intelligence movement.

This new model, built by Apple's research department, is unlikely to make its way into Apple products beyond the lessons learned during training. However, it is part of the iPhone maker's commitment to building a broader AI ecosystem, including open data initiatives.

This is the latest release in the DCLM model family, which has outperformed Mistral-7B on benchmarks and is closing in on similarly sized models from Meta and Google.

Vaishaal Shankar of Apple's ML team writes on X that these are the “best performing truly open source models” currently available. What he means by “truly open source” is that all weights, training code, and datasets are publicly released along with the models.
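
As a concrete illustration of what that openness enables, a fully released checkpoint can be downloaded and run locally with standard tooling. Here is a minimal sketch using Hugging Face Transformers; the repository id "apple/DCLM-7B" is an assumed placeholder for illustration, not something confirmed by this article:

```python
# Minimal local-inference sketch for an openly released checkpoint.
# The repo id below is an assumed placeholder; substitute the real one.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/DCLM-7B"  # assumption: the actual hosting location may differ

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Open training data matters because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```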

This comes the same week that Meta is expected to announce its giant GPT-4 competitor, Llama 3 400B. It is unclear whether Apple plans to release a larger DCLM model in the future.

Apple's DCLM (DataComp for Language Models) project involves researchers from Apple, the University of Washington, Tel Aviv University, and the Toyota Research Institute. The goal is to design high-quality datasets for training models.

This is an important move given recent concerns about the data used to train some models and whether all the content in those datasets is properly licensed or approved for AI training.

The team is experimenting with different model architectures, training code, evaluations, and frameworks to explore which data strategies are most effective at producing models that perform well and run efficiently.

This work resulted in DCLM-Baseline, a dataset that was used to train new models in 7 billion and 1.4 billion parameter versions.
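
Because the training data is released alongside the models, it can be inspected directly. Below is a minimal sketch using the Hugging Face datasets library, streaming a few records rather than downloading the full corpus; the repository id and the "text" field name are assumptions:

```python
# Stream a handful of records from the openly released training corpus.
from datasets import load_dataset

# Assumed repository id; check the project's release notes for the real one.
ds = load_dataset("mlfoundations/dclm-baseline-1.0", split="train", streaming=True)

for i, example in enumerate(ds):
    print(example.get("text", "")[:200])  # preview the start of each record
    if i >= 2:
        break
```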

The model is very efficient and completely open source: the 7B version performs as well as other models of the same size while being trained on far fewer tokens of content.

The model cannot be used for long-text summarization due to its fairly small context window of 2,048 tokens, but it achieves 63.7% accuracy (5-shot) on the MMLU evaluation benchmark.
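
A common workaround for such a small window is to split long inputs into chunks that fit the context and process each one separately. Here is a minimal, tokenizer-agnostic sketch; the tokenizer object and its encode/decode methods are assumed, e.g. one loaded via Hugging Face Transformers:

```python
# Split text into chunks that fit a fixed context window, reserving
# headroom for the tokens the model will generate in response.
def chunk_by_tokens(text, tokenizer, max_tokens=2048, reserve=256):
    ids = tokenizer.encode(text)
    step = max_tokens - reserve
    for start in range(0, len(ids), step):
        yield tokenizer.decode(ids[start:start + step])

# Usage: run the model on each chunk independently, then merge the
# partial outputs in a second pass.
```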

Despite its small size and tiny context window, the fact that all weights, training data, and processes are open source makes this one of the most important AI releases of the year.

It will also make it easier for researchers and companies to create their own small AI models that can be integrated into research programs and apps at no per-token cost. Sam Altman, CEO of OpenAI, said of the small GPT-4o mini released last week, “The goal is to create intelligence that is so cheap it cannot be metered.”
