TechingToday

Stability AI Releases Stable Audio 2.0: All the New Features

Stability AI has announced the second version of its AI music generation tool. It offers longer tracks, audio-to-audio generation, and stronger copyright protections for creators.

Stable Audio 2.0 lets users enter natural-language prompts such as "beautiful piano arpeggio growing into a beautiful orchestral piece," "lo-fi funk," or "drum solo" to create tracks up to three minutes long in 44.1 kHz stereo. The generated tracks have a structured composition, including an intro, development, and outro, as well as stereo sound effects.

Another new feature in Stable Audio 2.0 is the ability to upload audio files and transform them into "fully produced samples," taking the platform beyond a mere text-to-audio tool. For example, a user can upload a recording of themselves imitating drums with their voice and then prompt the app to turn it into a drum performance.

When using the new audio-to-audio feature, users must refrain from uploading copyrighted material, in accordance with Stability AI's Terms of Service. Stability AI uses content recognition technology to enforce this policy and prevent copyright infringement.

Like Stable Audio 1.0, the second model is trained on AudioSparx's vast library of audio files, consisting of 800,000 music tracks, sound effects, and single-instrument stems, along with text-based metadata. AudioSparx musicians who are unhappy with their work being used to train AI models were given the opportunity to opt out of training.

These copyright-protection and creator opt-out measures come on the heels of the departure of former VP of Audio Ed Newton-Rex. He announced his resignation in November 2023 in a post on X that was heavily critical of the company's approach to protecting creators' rights.

"I have resigned from my role leading the audio team at StabilityAI because I disagree with the company that training generative AI models on copyrighted works is 'fair use,'" he wrote.

He concluded his post by urging others to voice their concerns so that tech companies "realize that exploiting creators is not a long-term solution in generative AI."

In addition to support for longer tracks and audio-to-audio generation, Stable Audio 2.0 has an enhanced architecture that facilitates "generation of complete tracks with a coherent structure." By adapting all components of the system, the company says it has achieved "improved performance over long time scales."

The tool pairs two components. A new, highly compressing autoencoder squeezes raw audio waveforms into much shorter representations. A diffusion transformer, similar to the one used in Stable Diffusion 3, then works on those representations, since this type of model is better suited to manipulating data over long sequences.
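
To see why a shorter representation matters, here is a rough back-of-the-envelope sketch. The track length and sample rate come from the article. The downsampling factor of 2048 is a hypothetical value chosen purely for illustration, not a figure published by Stability AI, and the function names are made up for this example.

```python
# Figures taken from the article.
SAMPLE_RATE_HZ = 44_100      # 44.1 kHz
CHANNELS = 2                 # stereo
TRACK_SECONDS = 3 * 60       # a three-minute track

# ASSUMPTION: illustrative compression factor, not a published value.
ASSUMED_DOWNSAMPLE_FACTOR = 2048


def raw_frames(seconds: int, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Number of time steps (frames) in the raw waveform, per channel."""
    return seconds * sample_rate


def latent_length(frames: int, downsample_factor: int) -> int:
    """Sequence length after compressing time by downsample_factor (rounded up)."""
    return -(-frames // downsample_factor)  # ceiling division


frames = raw_frames(TRACK_SECONDS)
total_samples = frames * CHANNELS
latent = latent_length(frames, ASSUMED_DOWNSAMPLE_FACTOR)

print(f"raw frames per channel: {frames:,}")        # 7,938,000
print(f"raw samples (stereo):   {total_samples:,}")  # 15,876,000
print(f"latent sequence length: {latent:,}")        # 3,876 under the assumed factor
```

Under that assumed factor, a sequence model would handle a few thousand steps instead of several million, which suggests why compressing the waveform first makes three-minute tracks tractable.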

"Combining these two elements results in a model that can recognize and reproduce the large-scale structures that are essential for high-quality music," Stability AI wrote in a blog post.

The tool is free and ready to use.