Apple Takes on Meta with New Open Source AI Model - Here's Why It Matters

Apple is becoming one of the surprise leaders in the open source artificial intelligence movement.

This new model, built by Apple's research team, is unlikely to find its way into Apple products beyond the lessons learned during training. It is, however, part of the iPhone maker's commitment to building a broader AI ecosystem, including open data initiatives.

This is the latest release in the DCLM model family, which has outperformed Mistral-7B in benchmarks and is closing in on similarly sized models from Meta and Google.

Vaishaal Shankar of Apple's ML team wrote on X that these are the “best performing truly open source models” currently available. By truly open source, he means that all weights, training code, and datasets are publicly released alongside the models.
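For readers who want to try the released weights themselves, the minimal sketch below loads an openly published checkpoint with the Hugging Face transformers library. The repository identifier is an assumption for illustration; check the actual model card for the correct name and any extra dependencies it may require.

```python
# Minimal sketch: load an openly published checkpoint and generate text.
# The repo id below is an assumed identifier, not verified here.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "apple/DCLM-7B"  # assumption for illustration

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "Open source models matter because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```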

This comes the same week that Meta is expected to announce its giant GPT-4 competitor, Llama 3 400B. It is unclear whether Apple plans to release a larger DCLM model in the future.

Apple's DCLM (DataComp for Language Models) project involves researchers from Apple, the University of Washington, Tel Aviv University, and the Toyota Research Institute. The goal is to design high-quality datasets for training models.

This is an important move given recent concerns about the data used to train some models and whether all the content in the dataset is properly licensed or approved for training AI.

The team is experimenting with different model architectures, training code, evaluations, and frameworks to explore which data strategies are most effective at producing models that perform well and are highly efficient.
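As a rough illustration of what one such data strategy can look like, the toy sketch below scores documents with a stand-in quality function and keeps only those above a threshold. This is a generic example of model-based quality filtering, not Apple's actual DCLM pipeline; the scoring heuristic and threshold are arbitrary assumptions.

```python
# Toy illustration of data curation by quality filtering (not Apple's pipeline).
from typing import Callable


def filter_corpus(docs: list[str],
                  quality_score: Callable[[str], float],
                  threshold: float = 0.5) -> list[str]:
    """Keep only documents whose quality score meets the threshold."""
    return [d for d in docs if quality_score(d) >= threshold]


def heuristic_score(doc: str) -> float:
    """Trivial stand-in for a trained quality classifier."""
    words = doc.split()
    if not words:
        return 0.0
    # Longer documents that end in a period get a slightly higher score.
    return min(1.0, len(words) / 200) + (0.1 if doc.rstrip().endswith(".") else 0.0)


corpus = ["Short fragment", "A longer, complete paragraph about language models. " * 10]
print(len(filter_corpus(corpus, heuristic_score, threshold=0.3)))
```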

This work resulted in DCLM-Baseline, a dataset that was used to train the new models in 7 billion and 1.4 billion parameter versions.

The model is both efficient and completely open source: the 7B version performs as well as other models of the same size while being trained on far fewer tokens.

The model cannot be used for long-document summarization because of its fairly small context window of 2,000 tokens, but it achieves 63.7% accuracy (5-shot) on standard evaluation benchmarks.
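For readers unfamiliar with the term, “5-shot” means the model is shown five worked examples in the prompt before the question it is actually scored on. The sketch below builds such a prompt; the examples are made up for illustration, whereas real benchmarks draw them from the evaluation data itself.

```python
# Sketch of a 5-shot prompt: five example Q/A pairs precede the test question.
examples = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
    ("Which planet is known as the Red Planet?", "Mars"),
    ("What gas do plants absorb from the air?", "Carbon dioxide"),
    ("Who wrote 'Romeo and Juliet'?", "William Shakespeare"),
]
question = "What is the largest ocean on Earth?"

prompt = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in examples) + f"Q: {question}\nA:"
print(prompt)  # fed to the model; its completion is compared against the reference answer
```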

Despite its small size and tiny context window, the fact that the weights, training data, and training process are all open-sourced makes this one of the most important AI releases of the year.

It will also make it easier for researchers and companies to build their own small AI models that can be integrated into research projects and apps at no cost per token. Sam Altman, CEO of OpenAI, said of the small GPT-4o mini released last week, “The goal is to create intelligence that is so cheap it cannot be metered.”
