Up until now, most generative music models have produced mono sound. That means MusicGen doesn't place any sounds or instruments on the left or right side, resulting in a less lively and exciting mix. The reason stereo sound has been largely overlooked so far is that generating stereo is not a trivial task.
As musicians, when we produce stereo signals, we have access to the individual instrument tracks in our mix and can place them wherever we want. MusicGen doesn't generate all instruments separately but instead produces one combined audio signal. Without access to these instrument sources, creating stereo sound is hard. Unfortunately, splitting an audio signal into its individual sources is a difficult problem (I've published a blog post about that) and the technology is still not 100% ready.
Therefore, Meta decided to incorporate stereo generation directly into the MusicGen model. Using a new dataset consisting of stereo music, they trained MusicGen to produce stereo outputs. The researchers claim that generating stereo adds no extra computing cost compared to mono.
Although I feel the stereo procedure is not described very clearly in the paper, my understanding is that it works like this (Figure 3): MusicGen has learned to generate two compressed audio signals (left and right channel) instead of one mono signal. These compressed signals must then be decoded separately before they are combined to build the final stereo output. The reason this process doesn't take twice as long is that MusicGen can now produce two compressed audio signals in roughly the same time it previously took to produce one.
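To make the final combination step concrete, here is a minimal sketch in Python. It assumes the two compressed channel streams have already been decoded into mono waveforms; the function and variable names are my own illustration, not MusicGen's actual API.

```python
import numpy as np

def combine_to_stereo(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Stack two independently decoded mono waveforms into one stereo signal.

    In the simplified scheme described above, the model emits two compressed
    streams in parallel; each is decoded to a mono waveform, and the two
    waveforms become the left and right channels of the output.
    """
    assert left.shape == right.shape, "channels must have equal length"
    # Result shape (num_samples, 2): column 0 = left, column 1 = right
    return np.stack([left, right], axis=-1)

# Toy stand-ins for decoder outputs: two 1-second sine waves at 32 kHz
sr = 32_000
t = np.linspace(0, 1, sr, endpoint=False)
left = np.sin(2 * np.pi * 440 * t)   # pretend decoded left channel
right = np.sin(2 * np.pi * 220 * t)  # pretend decoded right channel

stereo = combine_to_stereo(left, right)
print(stereo.shape)  # (32000, 2)
```

The key point the sketch illustrates is that the stereo-specific work happens only at this cheap stacking step; the expensive part, generating the compressed streams, runs for both channels at roughly the cost of one.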
Being able to produce convincing stereo sound really sets MusicGen apart from other state-of-the-art models like MusicLM or Stable Audio. From my perspective, this "little" addition makes a huge difference in the liveliness of the generated music. Listen for yourselves (might be hard to hear on smartphone speakers):
Mono
Stereo
MusicGen was impressive from the day it was released. However, since then, Meta's FAIR team has been continually improving their product, enabling higher-quality results that sound more authentic. When it comes to text-to-music models generating audio signals (not MIDI etc.), MusicGen is ahead of its competitors from my perspective (as of November 2023).
Further, since MusicGen and all its related products (EnCodec, AudioGen) are open-source, they constitute an incredible source of inspiration and a go-to framework for aspiring AI audio engineers. If we look at the improvements MusicGen has made in only six months, I can only imagine that 2024 will be an exciting year.
Another important point is that with their transparent approach, Meta is also doing foundational work for developers who want to integrate this technology into software for musicians. Generating samples, brainstorming musical ideas, or changing the genre of your existing work: these are some of the exciting applications we're already starting to see. With a sufficient level of transparency, we can make sure we're building a future where AI makes creating music more exciting instead of being solely a threat to human musicianship.