What Is Audio Compression?

Audio compression technology refers to applying appropriate digital signal processing to an original digital audio stream (PCM encoded) in order to reduce (compress) its bit rate while losing no useful information, or introducing only negligible loss. This process is called compression coding, and it must have a corresponding inverse transform, called decompression or decoding. An audio signal may pick up some noise and distortion after passing through such a codec system.

The emergence and early application of audio compression technology

The advantages of digital audio are obvious, but so are the corresponding costs: greater storage requirements, and greater channel capacity requirements during transmission. Take the CD as an example: its sampling rate is 44.1 kHz and its quantization precision is 16 bits, so one minute of stereo audio needs about 10 MB of storage, meaning a disc holds only about an hour of audio. The problem is even more acute for digital video, whose bandwidth is far higher. Are all of these bits really necessary? Research has found that directly storing and transmitting a PCM stream involves a very large degree of redundancy. In fact, audio can be compressed at least 4:1 without loss, that is, all the information can be retained using only 25% of the data, and compression ratios in the video field can reach into the hundreds. To make good use of limited resources, compression technology has therefore received wide attention from the beginning.

Research on and application of audio compression have a long history. A-law and μ-law coding, for example, are simple quasi-instantaneous companding techniques that have been applied in ISDN voice transmission. Research on speech signals developed early, matured, and has been widely applied, in technologies such as adaptive differential PCM (ADPCM) and linear predictive coding (LPC). In the broadcasting field, audio compression is used in systems such as NICAM (Near Instantaneously Companded Audio Multiplex).
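As a quick sanity check on the figures above, the raw PCM data rate of CD audio can be computed directly; a minimal Python sketch:

    sample_rate = 44_100          # Hz, per channel
    bit_depth = 16                # bits per sample
    channels = 2                  # stereo

    rate_bps = sample_rate * bit_depth * channels
    print(f"data rate: {rate_bps / 1000:.1f} kbit/s")       # 1411.2 kbit/s
    print(f"one minute: {rate_bps * 60 / 8 / 1e6:.1f} MB")  # about 10.6 MB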

Redundant information in audio signals

Digital audio compression coding compresses the audio data as much as possible on the condition that the signal suffers no audible distortion. It works by removing redundant components from the sound signal. The so-called redundant components are parts of the audio that the human ear cannot perceive; they contribute nothing to the perceived timbre, pitch, or other attributes of the sound. Redundant signals include audio outside the range of human hearing and audio that is masked. For example, the human ear can perceive frequencies from 20 Hz to 20 kHz, so content outside this range can be regarded as redundant. In addition, according to the physiology and psychoacoustics of human hearing, when a strong signal and a weak signal occur together, the weak signal is masked by the strong one and cannot be heard, so it too can be treated as redundant and need not be transmitted. This is the masking effect of the human ear, which manifests mainly as spectrum masking and time-domain masking, introduced below.

Spectrum masking effect

If the sound energy at a given frequency falls below a certain threshold, the human ear cannot hear it; this threshold is called the minimum audible threshold (the threshold in quiet). When another, louder sound is present, the threshold near that sound's frequency rises considerably. This is the masking effect.
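As an illustration, the threshold in quiet is often approximated with Terhardt's formula; this is only one common approximation (the true curve varies between listeners), sketched here in Python:

    import math

    def threshold_in_quiet_db(f_hz):
        # Terhardt's approximation of the absolute threshold of hearing (dB SPL)
        f = f_hz / 1000.0  # kHz
        return (3.64 * f ** -0.8
                - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
                + 1e-3 * f ** 4)

    for f in (100, 1000, 4000, 10000):
        print(f"{f:>6} Hz: {threshold_in_quiet_db(f):6.1f} dB SPL")

The curve dips lowest around 3 to 4 kHz, where the ear is most sensitive; a coder can discard any spectral component whose level falls below this threshold.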

Time-domain masking effect

A masking effect also occurs in the time domain: when a strong and a weak signal occur very close together in time, the weak one can be masked. Time-domain masking is divided into three parts: pre-masking, simultaneous masking, and post-masking. Pre-masking means that a weak signal present in the short interval just before the ear hears a strong signal is masked and cannot be heard. Simultaneous masking means that when a strong signal and a weak signal are present at the same time, the weak one is masked by the strong one. Post-masking means that after a strong signal disappears, some time must pass before a weak signal can be heard again. All of these masked weak signals can be regarded as redundant.
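A coder can exploit post-masking by spending fewer bits on content shortly after a loud event. The sketch below is purely illustrative: the window length and threshold are assumed values, since real masking durations depend on level, frequency, and listener.

    import numpy as np

    FS = 44_100
    POST_MASK_MS = 100   # assumed post-masking window; real durations vary

    def post_masked(x, threshold=0.5):
        # Mark samples inside the assumed window after any loud sample;
        # a coder could quantize these regions more coarsely.
        masked = np.zeros(len(x), dtype=bool)
        window = int(FS * POST_MASK_MS / 1000)
        for i in np.flatnonzero(np.abs(x) > threshold):
            masked[i + 1 : i + 1 + window] = True
        return masked

    x = np.zeros(FS)
    x[1000] = 0.9        # a single loud click
    print(post_masked(x).sum(), "samples eligible for coarser coding")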

Compression coding methods

According to the compression principle used, audio coding can be divided into waveform coding, parametric coding, and hybrid coding that combines several techniques.
(1) Waveform coding samples the time-domain or frequency-domain waveform of the audio signal directly at a certain rate, quantizes the amplitude samples into levels, and converts them into digital codes; the decoder then reconstructs a signal from the waveform data whose shape matches the original waveform as closely as possible, preserving fine signal variations and transition characteristics.
(2) Parametric coding first establishes a feature model for each class of source signal, such as speech or natural sounds, then extracts and encodes characteristic parameters, trying to keep the reconstructed signal as faithful as possible to the meaning of the original even though its waveform may differ considerably from the original waveform. Commonly used characteristic parameters include formants, linear prediction coefficients, and band-splitting filter parameters, which enable low-rate coding: the bit rate can be compressed to 2 to 4.8 kbit/s, but the quality is at best medium, with notably low naturalness, making it suitable only for transmitting speech (a minimal linear-prediction sketch follows this list).
(3) Hybrid coding combines waveform coding and parametric coding, overcoming the weaknesses of each and striving to keep the high quality of waveform coding at something close to the low rate of parametric coding; at 4 to 16 kbit/s it can produce high-quality synthesized sound. Hybrid coding is based on linear predictive coding (LPC); common variants include multi-pulse-excited LPC (MPLPC), regular-pulse-excited LPC (RPE-LPC), and code-excited linear prediction (CELP).
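To make the linear-prediction idea behind parametric and hybrid coding concrete, here is a minimal sketch of LPC coefficient estimation by the autocorrelation method (Levinson-Durbin recursion). Real speech codecs add framing, windowing, and parameter quantization on top of this.

    import numpy as np

    def lpc(x, order):
        # Autocorrelation method + Levinson-Durbin recursion. Returns
        # a = [1, a1, ..., ap] with x[n] ~ -(a1*x[n-1] + ... + ap*x[n-p]).
        x = np.asarray(x, dtype=float)
        r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
        a = np.zeros(order + 1)
        a[0] = 1.0
        err = r[0]
        for i in range(1, order + 1):
            k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
            a[1:i] = a[1:i] + k * a[i - 1:0:-1]
            a[i] = k
            err *= 1.0 - k * k
        return a, err

    # Demo: generate a 2nd-order autoregressive signal, recover its coefficients.
    rng = np.random.default_rng(0)
    e = rng.standard_normal(4000)
    x = np.zeros_like(e)
    for n in range(2, len(x)):
        x[n] = 1.5 * x[n - 1] - 0.7 * x[n - 2] + e[n]

    print(lpc(x, order=2)[0])   # expect approximately [1, -1.5, 0.7]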

Lossy and lossless compression

In the field of audio compression there are two approaches: lossy compression and lossless compression. The MP3, WMA, and OGG formats we commonly see are lossy; as the name suggests, lossy compression reduces the audio sampling frequency and bit rate, producing an output file smaller than the original. The other approach is lossless compression, which reduces the size of an audio file while preserving all of the original data: when the compressed file is restored, it is identical in size and bit rate to the source file. Lossless formats include APE, FLAC, WavPack, LPAC, WMA Lossless, Apple Lossless, La, OptimFROG, and Shorten, of which only APE and FLAC are common and mainstream. [1]
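The principle behind lossless coders such as FLAC (predict, then entropy-code the residual) can be demonstrated with a toy round trip. This sketch uses a first-order predictor and zlib in place of the adaptive predictors and Rice coding a real codec would use.

    import zlib
    import numpy as np

    def encode(samples):
        # Predict each sample from the previous one, entropy-code the residual.
        x = np.asarray(samples, dtype=np.int16)
        residual = np.diff(x, prepend=np.int16(0))   # stays int16, wraps mod 2**16
        return zlib.compress(residual.tobytes())

    def decode(blob):
        residual = np.frombuffer(zlib.decompress(blob), dtype=np.int16)
        return np.cumsum(residual.astype(np.int64)).astype(np.int16)

    t = np.arange(44_100)
    x = (3000 * np.sin(2 * np.pi * 440 * t / 44_100)).astype(np.int16)
    blob = encode(x)
    assert np.array_equal(decode(blob), x)   # bit-exact round trip: lossless
    print(f"compressed to {len(blob) / x.nbytes:.0%} of the original size")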

Main classifications and typical representatives of audio compression algorithms

Generally speaking, audio compression techniques can be divided into two broad categories, lossless and lossy compression; by compression scheme, they can be divided into time-domain compression, transform compression, subband compression, and hybrids of several techniques. These techniques differ greatly in algorithmic complexity (both time and space), audio quality, coding efficiency (compression ratio), and codec delay, and their applications differ accordingly.

Time-domain compression (waveform coding) techniques

These techniques process the samples of the audio PCM stream directly, compressing it by means such as silence detection, non-linear quantization, and differencing. Their common characteristics are low algorithmic complexity, average sound quality, a small compression ratio (CD quality requires more than 400 kbit/s), and the shortest codec delay of the techniques discussed here. They are generally used for speech compression and other low-bit-rate applications where the source bandwidth is small. Time-domain compression technologies mainly include G.711, ADPCM, LPC, and CELP, along with block companding technologies developed from them, such as NICAM and sub-band ADPCM (SB-ADPCM).
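As an example of the non-linear quantization mentioned above, the sketch below implements the continuous μ-law companding curve. G.711 proper uses a segmented piecewise-linear approximation of this curve, so this shows the idea rather than the bit-exact standard.

    import numpy as np

    MU = 255.0   # mu value used by G.711 mu-law (North America / Japan)

    def mu_compress(x):
        # Continuous mu-law companding curve for x in [-1, 1].
        return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

    def mu_expand(y):
        # Inverse of mu_compress.
        return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

    x = np.linspace(-1.0, 1.0, 9)
    q = np.round((mu_compress(x) + 1) / 2 * 255) / 255 * 2 - 1  # 8-bit quantization
    print(np.max(np.abs(mu_expand(q) - x)))  # small; resolution is finest near zero

Companding spends the 8 bits non-uniformly, giving quiet samples finer resolution at the expense of loud ones, which roughly matches the ear's sensitivity.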

Subband compression techniques

Subband coding theory was first proposed by Crochiere in 1976. The basic idea is to decompose the signal into a sum of components in several subbands and then apply a different compression strategy to each subband component according to its distribution characteristics, thereby reducing the bit rate. Both subband compression and the transform compression technique introduced below determine the quantization step sizes and other parameters for subband or frequency-domain samples by analyzing the signal spectrum against a model of human hearing (a psychoacoustic model), so both can also be called perceptual compression coding. These two methods are considerably more complex than time-domain compression, with much better coding efficiency and sound quality, and correspondingly longer coding delay. Generally speaking, subband coding is slightly less complex than transform coding, and its coding delay is relatively short.
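A toy two-band example makes the idea concrete: split the signal with a simple Haar analysis filter bank, spend more bits on one band than the other, and resynthesize. Real codecs such as MPEG-1 use 32-band polyphase filter banks with psychoacoustic bit allocation; the band count and bit choices below are purely illustrative.

    import numpy as np

    def analyze(x):
        # Two-band Haar analysis filter bank; each output runs at half rate.
        x = np.asarray(x, dtype=float)
        x = x[: len(x) // 2 * 2]
        return (x[0::2] + x[1::2]) / 2.0, (x[0::2] - x[1::2]) / 2.0

    def synthesize(low, high):
        # Inverts analyze() exactly when no quantization is applied.
        x = np.empty(2 * len(low))
        x[0::2] = low + high
        x[1::2] = low - high
        return x

    def quantize(band, bits, peak=1.0):
        # Uniform quantizer with 2**bits levels over [-peak, peak].
        step = 2.0 * peak / (2 ** bits)
        return np.round(band / step) * step

    fs = 8000
    t = np.arange(4096) / fs
    x = np.sin(2 * np.pi * 300 * t) + 0.1 * np.sin(2 * np.pi * 3300 * t)

    low, high = analyze(x)
    # Crude bit allocation: 8 bits for the strong low band, 3 for the weak high band.
    xq = synthesize(quantize(low, 8), quantize(high, 3))
    print(f"RMS error: {np.sqrt(np.mean((x - xq) ** 2)):.4f}")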

Standardization of audio compression technology: MPEG-1

Because digital audio compression has a wide range of applications and good market prospects, some research institutions and companies have spared no effort to develop their own patented technologies and products, and standardizing these compression technologies became very important. A great success in the standardization of audio compression is MPEG-1 audio (ISO/IEC 11172-3). MPEG-1 specifies three modes of audio compression: Layer I, Layer II (MUSICAM, also known as MP2), and Layer III (also known as MP3). Because many compression techniques were carefully weighed when the standard was drafted, and practical application conditions and algorithmic implementability (complexity) were fully considered, all three modes have been widely used. The audio compression scheme used in VCD is MPEG-1 Layer I. MUSICAM, thanks to its appropriate complexity and excellent sound quality, is widely used in the production, exchange, storage, and transmission of digital programs in digital studios, DAB, and DVB. MP3 is a hybrid compression technique built on the strengths of MUSICAM and ASPEC; under the technical conditions of the time its complexity was relatively high and it was poorly suited to real-time encoding, but its superior sound quality made it the darling of software decoding and Internet audio. It is fair to say that the way the MPEG-1 audio standard was formulated determined its success, and this approach also influenced the later MPEG-2 and MPEG-4 audio standards.
