The digital sector is neither the most polluting industry nor the largest emitter of greenhouse gases, but its impact on natural ecosystems and global warming keeps growing, which makes it an area to watch, especially since it now reaches into every part of our daily lives. Any new technology capable of lessening those effects therefore attracts the interest of environmentalists. Such is the case of an audio compression method tested at FAIR (Fundamental AI Research), the research laboratory of none other than Meta.
Unsurprisingly, it was with the metaverse in mind that a small research team embarked on AI-assisted compression, stressing the importance of making the new El Dorado of Facebook's parent company accessible over every connection, even a limited one. The team started with audio compression, a field revolutionized in the 1990s by the Fraunhofer Institute's MP3, which enabled the rise of Napster and its peers, gave birth to the market for digital music players and makes today's streaming platforms possible. Except that with EnCodec, the name given to their audio codec, it is no longer a question of compression but of hypercompression, as the researchers themselves acknowledge.
In this field, encoding and decoding are the key steps: they handle the compression and decompression of the file. Here a third stage comes into play, the “quantizer”, an algorithm capable of deconstructing and reconstructing an audio signal while respecting a given final file size and retaining the most important information. This added complexity makes decompression very difficult, which is why the system relies on AI.
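To make the role of a quantizer more concrete, here is a minimal sketch of a plain nearest-neighbour vector quantizer. It is a toy and not EnCodec's actual quantizer: the codebook, the function names, and the 75 frames per second, 8 codebooks and 1,024 entries used in the bitrate example are illustrative assumptions that do not come from the article.

```python
import math


def nearest_index(vector, codebook):
    """Index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    def squared_distance(entry):
        return sum((a - b) ** 2 for a, b in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))


def quantize(frames, codebook):
    """Replace every frame by the index of its nearest codebook entry."""
    return [nearest_index(frame, codebook) for frame in frames]


def dequantize(indices, codebook):
    """Rebuild an approximation of the frames from the stored indices."""
    return [codebook[i] for i in indices]


def bitrate_bps(frames_per_second, codebook_size, n_codebooks=1):
    """Bits per second needed to transmit the indices alone."""
    return frames_per_second * n_codebooks * math.log2(codebook_size)


# Hypothetical 2-D codebook and frames, chosen only for illustration.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0), (0.0, -1.0)]
frames = [(0.9, 1.2), (0.1, -0.2), (-0.8, 0.7)]

indices = quantize(frames, codebook)
print(indices)                        # [1, 0, 2]
print(dequantize(indices, codebook))  # the lossy reconstruction
print(bitrate_bps(75, 1024, 8))       # 6000.0 bits per second
```

Each frame is reduced to a small integer, so the output size is fixed in advance by the frame rate and the codebook size rather than by the content. This is one simple way a target file size can be respected.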
The idea here is not to keep a signal as detailed as the original; at this level of compression, that is unthinkable. Instead, the role of this intelligent decoding is to rule out changes that would be “perceivable by the human ear”. It does so with discriminators, neural networks trained for this purpose, which compare excerpts of the original file with the compressed results. The system is thereby forced to produce reconstructed excerpts that are perceived as similar to the original.
This technique lets Meta's researchers claim a file about ten times smaller than a 64 kbps MP3, and “without loss of quality”. That would be enough to fit an album on a humble, antediluvian floppy disk. It is a claim that we do not take at face value, of course, especially since this work seems aimed more at encoding voice messages, which are easier to handle than a more complex piece of music.
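The floppy disk comparison can be checked with simple arithmetic, since file size is bitrate multiplied by duration. The sketch below assumes a 3.5-inch high-density floppy of 1,474,560 bytes and ignores container overhead. Both are assumptions made for the example, as are the function names.

```python
# Assumption: capacity of a 3.5-inch high-density ("1.44 MB") floppy disk.
FLOPPY_BYTES = 1_474_560


def file_size_bytes(bitrate_kbps, duration_s):
    """Size of an audio stream in bytes (1 kbps = 1,000 bits per second)."""
    return bitrate_kbps * 1000 * duration_s / 8


def playable_minutes(capacity_bytes, bitrate_kbps):
    """Minutes of audio that fit in `capacity_bytes` at the given bitrate."""
    return capacity_bytes * 8 / (bitrate_kbps * 1000) / 60


mp3_kbps = 64
hyper_kbps = mp3_kbps / 10  # "ten times smaller" than the 64 kbps MP3

for label, kbps in (("64 kbps MP3", mp3_kbps), ("ten times smaller", hyper_kbps)):
    minutes = playable_minutes(FLOPPY_BYTES, kbps)
    print(f"{label}: {minutes:.1f} minutes per floppy")
```

Under these assumptions a floppy holds about three minutes of 64 kbps MP3 and about half an hour of audio at a tenth of that bitrate.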
Nevertheless, the Facebook laboratory strongly believes in this work, which it considers the first of its kind that can be applied to CD-quality audio (16-bit, 48 kHz stereo), the standard for music distribution. Eventually, the company hopes its compression method can be used for videoconferencing as well as for streaming movies or playing virtual reality games online with friends.
Examples to listen to
Obviously, there is the question of the hardware resources needed to decode these files. On this point Meta says little but is reassuring, indicating that decompression runs in real time, at high speed, on a single CPU core. The researchers also note that future progress in processors dedicated to these tasks could make the compression and decompression phases less energy-hungry.
In the blog post presenting this work, a sample compressed at 6 kbps can be played, making it possible to compare an original excerpt with the EnCodec version. Without a doubt, the result is infinitely more listenable than what EVS in 2014 or Opus in 2020 delivered for the same compression target. To go from there to saying that no difference from the original file can be heard is perhaps going a little too fast.
To get a better idea, other examples are offered on Twitter by one of Meta's researchers, Alexandre Défossez, at bitrates ranging from 1.9 to 10.4 kbps. He also points out that the current EnCodec model, at only 12 kbps, gives results that human listeners rate somewhere between MP3 at 64 and 128 kbps. That is a file size reduction of roughly five to ten times, which is impressive.
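The reduction factor follows directly from the ratio of the two bitrates, as this short sketch shows. The function name is hypothetical, and the calculation assumes streams of equal duration.

```python
def size_reduction_factor(reference_kbps, codec_kbps):
    """How many times smaller a stream is than a reference of equal duration."""
    return reference_kbps / codec_kbps


for mp3_kbps in (64, 128):
    factor = size_reduction_factor(mp3_kbps, 12)
    print(f"MP3 at {mp3_kbps} kbps vs EnCodec at 12 kbps: x{factor:.1f}")
```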
Music streaming, not so good for the planet
It does not take much for some to see in this compression method a major breakthrough for the music industry, imagining that it could greatly reduce the environmental footprint of streaming by making the distribution of songs more efficient. All this even though music streaming is itself presented as an ecological boon, in the sense that its carbon footprint is lower than that of any physical distribution medium.
Except that this point needs heavy qualification, because the technology has also come with a massive increase in music consumption. According to the organization Zero Carbon, in 2020, with the Covid-19 pandemic, the use of streaming services rose by 70%, to represent 570 million tonnes of CO2 equivalent released into the atmosphere. Sharon George of Keele University, for her part, calculated that 5 hours of audio streaming equals the 288 g of CO2e of a CD with its case, and that 17 hours equals the 979 g of a vinyl record. Yet several studies put the music streaming consumption of premium subscribers at around 5 hours on average.
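The two equivalences quoted from Sharon George imply an hourly emission rate for streaming, which can be worked out as follows. This is a back-of-the-envelope sketch that uses only the figures above. The function names are hypothetical, and the calculation assumes that emissions grow linearly with listening time.

```python
CD_G_CO2E = 288     # CD with its case, in grams of CO2e (figure quoted above)
VINYL_G_CO2E = 979  # vinyl record, in grams of CO2e (figure quoted above)


def grams_per_hour(total_g_co2e, hours):
    """Streaming emission rate implied by one equivalence."""
    return total_g_co2e / hours


def break_even_hours(physical_g_co2e, rate_g_per_hour):
    """Hours of streaming that emit as much as one physical medium."""
    return physical_g_co2e / rate_g_per_hour


rate_from_cd = grams_per_hour(CD_G_CO2E, 5)
rate_from_vinyl = grams_per_hour(VINYL_G_CO2E, 17)
print(f"Implied rate (CD): {rate_from_cd:.1f} g CO2e per hour")
print(f"Implied rate (vinyl): {rate_from_vinyl:.1f} g CO2e per hour")
print(f"Vinyl break-even at the CD rate: {break_even_hours(VINYL_G_CO2E, rate_from_cd):.1f} hours")
```

Both comparisons imply nearly the same rate, a little under 58 g of CO2e per hour of streaming, so the two figures are consistent with each other.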
Still, better compression will also have the advantage of bringing a whole range of technologies within reach of slow connections, while reducing the discomfort of use on an unstable network. That said, given the quality of MP3 at 320 kbps and at lower bitrates, and the number of detractors the format still has among audiophiles who swear by lossless, it goes without saying that the specialized forums are far from done with the format war.
The details of the post about EnCodec and a link to its code are also available.