Meta has publicly released a new family of AI models called Chameleon that rivals commercial tools such as Google's Gemini Pro and OpenAI's GPT-4V.
Meta first detailed the nuts and bolts of the models in a paper showing that Chameleon, which comes in 7 billion and 34 billion parameter versions, can understand and generate both images and text.
Chameleon can also handle combinations of text and images (which may relate to each other) and generate meaningful responses, Meta says.
That is, you can take a picture of the contents of your refrigerator and ask what you could cook with just the ingredients you have. This was not possible with the Llama generation of AI models, and it brings open source closer to the higher-profile vision models from OpenAI and Google.
After publishing the paper, Meta's Fundamental AI Research (FAIR) team released the model for research purposes, with some limitations.
The authors of the paper state that the key to Chameleon's success is its fully token-based architecture. The model learns to reason over images and text jointly, which is not possible for models that use separate encoders for each input.
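To make that idea concrete, here is a minimal, hypothetical sketch of an early-fusion, token-based setup: text tokens and quantized image tokens share one vocabulary and flow through a single transformer, so attention can mix the two modalities freely. This is not Meta's actual implementation; all class names, vocabulary sizes, and dimensions are illustrative assumptions.

```python
# Sketch of the early-fusion, token-based idea described above.
# Names, sizes, and layers are assumptions, not Chameleon's real code.
import torch
import torch.nn as nn

VOCAB_TEXT = 32000       # assumed text vocabulary size
VOCAB_IMAGE = 8192       # assumed image codebook size (e.g. from a VQ image tokenizer)
VOCAB_TOTAL = VOCAB_TEXT + VOCAB_IMAGE

class UnifiedTokenModel(nn.Module):
    """A single transformer that models text and image tokens in one sequence."""
    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_TOTAL, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, VOCAB_TOTAL)

    def forward(self, token_ids):
        # Text and image tokens share one embedding table and one backbone,
        # so attention can relate the two modalities directly.
        h = self.backbone(self.embed(token_ids))
        return self.lm_head(h)

# Interleave placeholder text tokens and image tokens into one sequence.
text_tokens = torch.randint(0, VOCAB_TEXT, (1, 16))
image_tokens = torch.randint(VOCAB_TEXT, VOCAB_TOTAL, (1, 32))  # offset into the image range
mixed = torch.cat([text_tokens, image_tokens, text_tokens], dim=1)

model = UnifiedTokenModel()
logits = model(mixed)    # next-token logits over both vocabularies
print(logits.shape)      # torch.Size([1, 64, 40192])
```

By contrast, a model with separate encoders per modality would fuse the two streams only at a later stage, which is the limitation the Chameleon authors say their unified design avoids.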
The technical challenges the Meta team had to overcome included optimization stability and scaling, which it addressed with new architectural methods and training techniques.
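One of the stability techniques the paper describes is normalizing the query and key projections inside attention so that attention logits stay bounded during large-scale training. The simplified single-head sketch below illustrates that idea under stated assumptions; it is not Chameleon's actual code.

```python
# Simplified sketch of query-key normalization for attention stability.
# Single-head, minimal version; dimensions and structure are assumptions.
import torch
import torch.nn as nn

class QKNormAttention(nn.Module):
    """Attention that layer-normalizes queries and keys before computing
    attention scores, which bounds the logits and helps keep the softmax
    from saturating as training scales up."""
    def __init__(self, d_model=256):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.q_norm = nn.LayerNorm(d_model)
        self.k_norm = nn.LayerNorm(d_model)
        self.scale = d_model ** -0.5

    def forward(self, x):
        q = self.q_norm(self.q_proj(x))   # normalize queries
        k = self.k_norm(self.k_proj(x))   # normalize keys
        v = self.v_proj(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v

x = torch.randn(1, 10, 256)
print(QKNormAttention()(x).shape)  # torch.Size([1, 10, 256])
```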
Ultimately, for users, it means that Chameleon can easily handle prompts that call for output in both text and images.
Users can ask Chameleon to create an itinerary for experiencing the solstice, for example, and the AI model will produce images associated with the text it generates.
Researchers found that, according to human evaluations, when prompts and outputs contain mixed sequences of both images and text, Chameleon's performance matches or exceeds that of models like Gemini Pro and GPT-4V. However, evaluations of infographics and chart interpretation were excluded.
The model Meta has released can only produce text output, and its safety tuning has been deliberately strengthened.
However, on May 5, Armen Aghajanyan, one of the researchers who worked on the project, wrote on X that the model had "completed training 5 months ago" and claimed that "it has made significant progress since then."
For researchers, Chameleon is a source of inspiration for another way to train and design AI models. For the rest of us, it means we are one step closer to having an AI assistant that can better understand the context in which it operates, without having to rely on closed platforms.