The digital data we use today is usually compressed, which lowers the cost of its storage and transfer. For instance, the lossy video compression used while watching movies makes it possible to reduce the original file size even one thousand times.
Data compression is a complicated process, which usually consists of two phases. During the first one some simplifying transformations, e.g. using repetitions of certain sequences of symbols in a file, are applied to data. In the second phase, the resultant set of symbols undergoes entropy encoding, which capitalises on the fact that frequently occurring sequences of symbols carry less information than the rare ones. And so, a symbol whose occurrence probability is 1/2 carries one bit of information, whereas the symbol whose occurrence probability is 1/8 carries as many as three bits. Typically, Huffman’s coding, which directly encodes symbols as sequences of bits (e.g. “a” as “010”) is used in such cases. While computationally inexpensive, it is not optimal for symbols the probability of which is not a power of 1/2. For example, although a symbol with 0.99 probability carries only 0.014 bit information, Huffman’s coding must use as least one bit to encode it. This suboptimality problem is mended by another type of coding – the arithmetic coding, which handles fractional bits by accumulating information in a kind of buffer. Yet, it is many times more expensive than Huffman’s coding.
Thanks to Dr Jarosław Duda from the Jagiellonian University Institute of Computer Science and Computer Mathematics, the compromise between the computational cost and the compression level is no longer necessary. During the years 2006-2014 Dr Duda introduced a new family of entropy encodings based on Asymmetric Numeral Systems (ANS). They generalise standard positional systems, such as the binary or decimal notation. This results in a type of coding which is similar to arithmetic coding in terms of the compression level and whose computational cost is close to that of Huffman’s coding.
Compressors using ANS started to be used in 2014 and in the middle of the following year Apple introduced LZFSE compressor, which became a default compression option for iPhone and Mac. In August 2016, Facebook demonstrated its open compressor Zstandard, which also uses ANS and may replace the commonly used gzip compressor due to a better speed and compression level. ANS were also applied in the beta version of a new video compressor developed by the Alliance for Open Media (including Adobe, Amazon, AMD, ARM, Cisco, Google, Intel, Microsoft, Mozilla, Netflix, and Nvidia). In the future it can be used for instance by YouTube. This type of coding has also been adopted by the most widespread DNA compression: CRAM 3.0, which is part of the popular genetic data processing package SAMtools.
More information available at: