What is Lossy Compression?
Lossy compression exploits the fact that humans are insensitive to certain frequency components in images and sound, and allows some information to be discarded during compression. The original data cannot be fully recovered, but the lost part has little effect on how the image or sound is perceived, and in exchange the compression ratio is much higher.
- Lossy compression is widely used in the compression of voice, image and video data.
Lossy compression overview
- Lossy compression is also known as destructive data compression in Taiwan, Hong Kong, and Macau.
- Most common audio, image, and video compression is lossy.
- In multimedia applications, common compression methods are: PCM (pulse code modulation
- MP3, DivX, Xvid, JPEG, RM, RMVB, WMA, WMV, and so on are all lossy compression formats.
- Lossy data compression is a method in which the data obtained after compressing and decompressing differs from the original but is very close to it. Also known as destructive compression, it discards secondary information, sacrificing some quality to reduce the amount of data and raise the compression ratio. It is widely used on the Internet, especially for streaming media and telephony, where the algorithm is usually called a codec. It is the counterpart of lossless data compression. Depending on the design of the format, lossy compression suffers from generation loss: each repeated compression and decompression of a file degrades its quality a little further.
- Defects caused by lossy compression that the human eye or ear can perceive are called compression artifacts.
Lossless compression
- Lossless compression compresses the file itself. As with the compression of other data files, it optimizes the way the file's data is stored, using an algorithm to represent repeated information compactly. The file can be restored completely, with no effect on its content. For images, this means no image detail is lost.
- The basic principle is that identical color information only needs to be saved once. The software that compresses an image first determines which areas of the image are the same and which are different. Areas of repeated data (such as a blue sky) can be compressed by recording only where the blue sky starts and ends. But the blue may vary in depth, and the sky may be partly covered by trees, peaks, or other objects, and these have to be recorded separately. In essence, lossless compression removes duplicate data and can greatly reduce the size of the image saved on disk. However, it does not reduce the memory the image occupies once opened, because when the image is read from disk the software fills the omitted pixels back in with the appropriate color information. To reduce the amount of data further, a lossy compression method is needed.
- Lossy compression is characterized by maintaining gradual changes in color and removing sudden changes in color in the image.
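- The "save identical color information only once" idea described above is essentially run-length encoding. The following is a minimal sketch in Python, not the method of any particular image format; the function names `rle_encode` and `rle_decode` and the sample row are assumptions made for illustration.

```python
def rle_encode(pixels):
    """Collapse consecutive identical values into (value, run_length) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([p, 1])       # start a new run
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    pixels = []
    for value, count in runs:
        pixels.extend([value] * count)
    return pixels


# Hypothetical scanline: blue sky interrupted by a tree top.
row = ["blue"] * 6 + ["green"] * 2 + ["blue"] * 3
encoded = rle_encode(row)
print(encoded)                        # [('blue', 6), ('green', 2), ('blue', 3)]
print(rle_decode(encoded) == row)     # True: nothing was lost
```

- The decoded row has the same length as the original, which is why the stored size shrinks while the opened image does not.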
Types of lossy compression
Lossy transform codecs
- The image or sound is first sampled and cut into small blocks; each block is transformed into a new space and quantized, and the quantized values are then entropy coded.
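- The steps described above (block, transform, quantize) can be sketched as follows. This is a minimal illustration under stated assumptions: an orthonormal DCT-II is used as the transform, the block size of 8 and quantizer step of 0.5 are arbitrary, all function names are hypothetical, and the final entropy-coding stage is left out.

```python
import math


def dct(block):
    """Orthonormal DCT-II of one block (the 'transform to a new space' step)."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(block)))
    return out


def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        out.append(sum(
            math.sqrt((1 if k == 0 else 2) / n) * c
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, c in enumerate(coeffs)))
    return out


def encode(signal, block=8, step=0.5):
    """Cut into blocks, transform, and quantize to integers (the lossy step)."""
    quantized = []
    for start in range(0, len(signal), block):
        coeffs = dct(signal[start:start + block])
        quantized.append([round(c / step) for c in coeffs])
    return quantized  # a real codec would entropy-code these integers


def decode(quantized, step=0.5):
    """Rescale the integers and invert the transform block by block."""
    signal = []
    for q in quantized:
        signal.extend(idct([v * step for v in q]))
    return signal


signal = [math.sin(2 * math.pi * t / 64) + 0.5 * math.cos(2 * math.pi * t / 16)
          for t in range(64)]
restored = decode(encode(signal))
print(max(abs(a - b) for a, b in zip(signal, restored)))  # small but nonzero
```

- A larger `step` gives smaller integers, which compress better, at the cost of a larger reconstruction error.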
Lossy predictive codecs
- Previous and subsequent decoded data are used to predict the current sample or frame.
- In some systems the two techniques are combined: a transform codec is used to compress the error signal generated by the prediction step.
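- The predictive approach can be illustrated with a minimal DPCM-style sketch. It is an assumption-laden example, not any standard codec: the predictor is simply the previously decoded sample, the step size of 4 is arbitrary, and the names `dpcm_encode` and `dpcm_decode` are hypothetical.

```python
def dpcm_encode(samples, step=4):
    """Quantize the error between each sample and its prediction."""
    codes = []
    prediction = 0
    for s in samples:
        code = round((s - prediction) / step)  # quantized prediction error
        codes.append(code)
        prediction += code * step              # mirror what the decoder rebuilds
    return codes


def dpcm_decode(codes, step=4):
    """Rebuild the samples by accumulating the dequantized errors."""
    samples = []
    prediction = 0
    for code in codes:
        prediction += code * step
        samples.append(prediction)
    return samples


samples = [0, 3, 9, 14, 20, 23, 21, 15]
codes = dpcm_encode(samples)
restored = dpcm_decode(codes)
print(codes)
print(restored)
```

- The encoder predicts from the decoded value rather than from the original sample, so the quantization error stays within half a step and does not accumulate.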
The advantages and disadvantages of lossy compression
- One advantage of lossy methods is that in some cases they produce a much smaller file than any known lossless method while still meeting the needs of the system. When a user receives a lossy compressed file, for example to save download time, the decompressed file may differ greatly from the original at the bit level, yet for most practical purposes the human ear or eye cannot tell the difference between the two.
- Lossy methods are often used to compress sound, images, and video.
- Lossy video codecs almost always achieve much better compression ratios than audio or still-image codecs (the compression ratio is the ratio of the uncompressed size to the compressed size).
- Audio can reach a compression ratio of 10:1 without noticeable quality loss, and video can reach very large ratios such as 300:1 with only slight visible quality loss.
- A lossy compressed image keeps gradual changes in color and removes sudden changes in color. Numerous biological experiments have shown that the human brain fills in a missing color with the color closest to its neighbors. For example, for a white cloud against a blue sky, a lossy method deletes some of the color information at the edges of the cloud. When the picture is viewed on screen, the brain fills in the missing parts using the colors it sees nearby. With lossy compression, some data is deleted intentionally, and the deleted data cannot be recovered.
- Lossy still image compression often reaches one tenth of the original size, as audio does, but
- Some methods take human physiology into account: for example, the human eye can only see light within a certain frequency range. A psychoacoustic model describes how sound can be compressed as far as possible without lowering its perceived quality.
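- As a small worked example of the 10:1 audio figure above, the sketch below computes a compression ratio, taking the ratio as uncompressed size over compressed size (the convention that the 10:1 figure implies). The CD parameters (44.1 kHz, 16 bits, stereo) and the 128 kbps target are assumptions chosen for illustration, and the function names are hypothetical.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(uncompressed, compressed):
    """Uncompressed size (or rate) divided by compressed size (or rate)."""
    return uncompressed / compressed


cd_rate = pcm_bitrate(44100, 16, 2)      # 1411200 bits per second
lossy_rate = 128000                      # assumed lossy target bit rate
print(cd_rate)
print(round(compression_ratio(cd_rate, lossy_rate), 3))  # about 11, i.e. roughly 11:1
```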
Common formats for lossy compression
- MP3 (MP3PRO / MP3SURROUND), AAC (*.3gp / *.mp4 / *.m4a), ATRAC3 / ATRAC3+ (*.aa3).
- Let's first understand the principle of audio compression: use the psychoacoustic characteristics of human hearing (spectrum masking feature
Some basic psychoacoustic concepts behind lossy compression
- 1. Equal-loudness curve
- The sensitivity of human hearing changes with frequency. That is, two tones with the same power but different frequencies do not sound equally loud. From the equal-loudness curve we can see that the human ear is most sensitive at around 4 kHz: a sound pressure level that is just audible at 4 kHz may be inaudible at other frequencies. This leaves room for distortion at the less sensitive frequencies.
- 2. Masking
- We learned about masking in high school physics: a strong sound signal covers up a weak one, so that the weak one cannot be perceived. The closer the two sounds are in time and frequency, the stronger the masking effect. The encoder can therefore leave the masked part out when encoding or transmitting. Sound quality still suffers no major loss, and the omission is hard for the human ear to detect.
- 3. Critical bands
- For human hearing, the perceptual characteristics of sound do not follow a linear frequency scale (human hearing is not that precise), but can be described by a limited series of frequency bands called critical bands. Put simply, the whole frequency range is divided into several segments, and within each segment the ear's perception, that is, its psychoacoustic behavior, is the same.
- Back to the main topic: the essence of an encoding is its algorithm.
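- The concepts above can be made a little more concrete with two widely used textbook approximations: a formula for the threshold of hearing in quiet (commonly attributed to Terhardt) and a formula for mapping frequency to the Bark critical-band scale (commonly attributed to Zwicker and Terhardt). Neither formula appears in the original text; both are brought in here as assumptions, and the function names are hypothetical. This is a rough sketch, not a full psychoacoustic model.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate threshold of hearing in dB SPL at a given frequency."""
    f = freq_hz / 1000.0  # the approximation is written in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def hz_to_bark(freq_hz):
    """Approximate position on the Bark critical-band scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def is_audible(freq_hz, level_db_spl):
    """True if a lone tone at this level lies above the threshold in quiet."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)


for freq in (100, 1000, 4000, 15000):
    print(freq, round(threshold_in_quiet_db(freq), 1), round(hz_to_bark(freq), 1))

# The same 5 dB SPL tone is audible near 4 kHz but not at 100 Hz.
print(is_audible(4000, 5), is_audible(100, 5))
```

- An encoder can use such a threshold to decide which components need not be coded at all, and the Bark scale to group spectral lines into bands that are treated alike.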
Mainstream lossy encodings and their algorithms
- 1. MP3 (MP3PRO / MP3SURROUND)
- MP3 is probably the most widely used lossy compressed digital audio format. Its full name is MPEG (Moving Picture Experts Group) Audio Layer-3. It was developed by the Fraunhofer research institute in Germany in 1987 and patented in 1989. At first it was far from perfect; it was more of a coding-standard framework left for others to improve. In 1992 the technology was incorporated into the MPEG specification and officially became known as MP3.
- MP3 files are composed of frames, and the frame is the smallest constituent unit of an MP3 file. What is a frame? Remember how early animation was made? A sequence of different pictures is switched rapidly to create motion, and each picture is a "frame". The difference is that MP3 frames record audio data rather than picture data. Each MPEG-1 Layer 3 frame holds 1152 samples, so at a 44.1 kHz sample rate there are about 38 frames per second.
- Each frame consists of a frame header and frame data. The frame header records the basic information of the frame, including the bit rate index and the sample rate index (this is important for understanding the ABR and VBR encoding methods). The frame data, as the name implies, holds the actual audio data.
- All of the above are the basics of MP3 encoding. In practice, the early encoders were very imperfect, their compression algorithms were crude, and the sound quality was unsatisfactory. MP3 sound quality made two leaps: the introduction of the psychoacoustic model (perceptual model) and the application of VBR technology.
- PS: VBR stands for variable bit rate. When an MP3 file is compressed, the encoder automatically raises the bit rate for passages with more sound detail and lowers it where the demand is low. The aim is to reduce file size, which speeds up online playback and lowers the system resources used by a local player, without substantially harming sound quality. The approach was developed by Xing: complex parts are encoded at a high bit rate and simple parts at a low bit rate. The idea is good, but the VBR algorithm of the Xing encoder is very poor and its sound quality falls far short of CBR. Fortunately, LAME greatly improved the VBR algorithm and made it the best encoding mode for MP3. It puts quality first and file size second, and is the recommended encoding mode.
- MP3 has survived to this day, and its development has not stopped. On June 14, 2001, Thomson of France and RCA of the United States jointly launched a new compression format: MP3PRO. MP3PRO is an improvement based on MP3 technology. It uses a codec enhancement technology developed by Coding Technologies called SBR (Spectral Band Replication). When creating an MP3PRO file, the encoder divides the audio into two parts. One part is the separated low-frequency portion of the audio data.
- The PSP supports MP3PRO, and a lot of format conversion software supporting MP3PRO can be found online. If you are interested, give it a try; it is definitely better than MP3.
- Thomson officially announced in early December 2004 that MP3, the world's most popular music compression format, was entering the multi-channel era. MP3SURROUND was jointly developed by Fraunhofer IIS and Agere. It uses Binaural Cue Coding (BCC) for psychoacoustic encoding, which delivers multi-channel surround sound while keeping file size in check. Agere Systems is mainly responsible for promoting the multi-channel MP3 format, MP3SURROUND. MP3SURROUND provides high-quality 5.1-channel surround audio with a wide range of applications: online music distribution, broadcasting systems, PC audiovisual applications, game sound effects, consumer electronics, and car audio. Although multiple channels are included, Thomson said that MP3SURROUND files are not much larger than ordinary MP3 files at the same sample rate, and only about half the size of files in other multi-channel surround formats. More importantly, MP3SURROUND offers good compatibility and can be played normally on existing MP3 software and MP3 players.
- 2. AAC (*.3gp / *.mp4 / *.m4a)
- AAC stands for Advanced Audio Coding, which was jointly developed by the Fraunhofer research institute, Dolby, and AT&T. AAC is part of the MPEG-2 specification and is suited to a wide range of encoding, from 8 kbps mono telephone quality to 160 kbps multi-channel ultra-high-quality audio. Compared with MP3, AAC adds features that MP3 lacks, such as accurate stereo reproduction, streaming effect scanning, multimedia control, and noise reduction optimization, making it possible to reproduce CD sound quality after compression. It also supports up to 48 audio tracks, 15 low-frequency audio tracks, more sample rates and bit rates, compatibility with multiple languages, and higher decoding efficiency. In short, AAC can provide better sound quality while being 30% smaller than MP3 files.
- Some of the AAC encoder modules are explained below:
- Gain control
- The gain control module is used in a variable sampling rate configuration. It consists of a polyphase quadrature filter (PQF), a gain detector, and a gain modifier. This module separates the input signal into four equal bandwidth bands. There is also a gain control module in the decoder, which obtains a low sampling rate output signal by ignoring the high subband signal of the PQF.
- Filter Bank
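- The filter bank is a conversion module that transforms the input signal from the time domain to the frequency domain, and it is the basic module of the MPEG-2 AAC system. This module uses the modified discrete cosine transform (MDCT), a linear orthogonal lapped transform that relies on a technique called time domain aliasing cancellation (TDAC). The MDCT uses a KBD (Kaiser-Bessel derived) window or a sine window. The forward MDCT can be expressed using the following formula:
- $X_{i,k} = 2 \sum_{n=0}^{N-1} z_{i,n} \cos\left(\frac{2\pi}{N}\,(n + n_0)\left(k + \frac{1}{2}\right)\right), \quad 0 \le k < N/2$
- The inverse MDCT can be expressed using the following formula:
- $x_{i,n} = \frac{2}{N} \sum_{k=0}^{N/2-1} X_{i,k} \cos\left(\frac{2\pi}{N}\,(n + n_0)\left(k + \frac{1}{2}\right)\right), \quad 0 \le n < N$
- where
- $z_{i,n}$ = windowed input sample, $x_{i,n}$ = output sample before overlap-add,
- n = sample number,
- N = transform block length,
- i = block number,
- k = spectral coefficient number,
- $n_0 = (N/2 + 1)/2$.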
- The two discrete cosine transform formulas above are covered in detail in textbooks on discrete mathematics and mathematical equations. They are given here only for interested readers and need not be studied in depth.
- Temporal noise shaping (TNS)
- In perceptual audio coding, the TNS module controls the temporal shape of the quantization noise, which addresses the mismatch between the masking threshold and the quantization noise. The basic idea is that a signal that is tonal in the time domain shows sharp peaks in the frequency domain, and vice versa. TNS uses this duality to extend known predictive coding techniques and keep the quantization noise below the actual signal, avoiding the mismatch.
- Joint stereo coding
- Joint stereo coding is a spatial coding technique whose purpose is to remove spatially redundant information. The MPEG-2 AAC system includes two such techniques: M/S coding (Mid/Side encoding) and intensity/coupling coding. M/S coding uses matrix operations, so it is also called matrixed stereo coding. M/S coding does not transmit the left and right channel signals themselves, but the normalized "sum" and "difference" signals, the former for the middle (M) channel and the latter for the side (S) channel.
- Prediction
- This is a technique commonly used in speech coding systems, which is mainly used to reduce the redundancy of stationary signals.
- Quantizer
- A non-uniform quantizer is used.
- Noiseless coding
- Noiseless coding is in fact Huffman coding, which encodes the quantized spectral coefficients, scale factors, and directional information.
- PS: I personally like AAC, so I have written about it in more detail. Everyone may want to try it; it is definitely better than MP3. You can use iTunes 6 to convert to AAC (*.m4a). Using AAC files from iTunes 6 is very simple: copy the AAC files (*.3gp / *.mp4 / *.m4a) directly to [MUSIC] and play them.
- It can be said that AAC is currently the best lossy compression method.
- At the highest quality setting it is generally indistinguishable from lossless (to the ear).
- 3. ATRAC3 / ATRAC3+ (*.aa3)
- Anyone who used MiniDisc (MD) players in the early years knows that Sony's ATRAC audio format, tailored for MD, is widely used in Sony's Network Walkman and other portable audio devices. "ATRAC3plus" stands for "Adaptive Transform Acoustic Coding 3 plus", an audio compression technology based on psychoacoustic principles. It was developed from the ATRAC3 format and matured in 2002. This technology is the theoretical basis for shrinking the MD Walkman to a very small size. [1]
- To analyze ATRAC3 / ATRAC3+, we must first talk about its big brother, the ATRAC algorithm. Compressing digital audio data usually introduces a certain amount of quantization noise into the signal. To keep this noise from being perceived by the human ear, the usual approach is to decompose the signal into a set of units, each corresponding to a specific time-frequency range. The encoder analyzes them according to the psychoacoustic principles described above and encodes the important units with high precision, while the less sensitive units can carry some quantization noise without affecting the perceived quality. When decoding, the quantized spectrum is rebuilt from the bit allocation, and the audio signal is then synthesized.
- ATRAC is no exception, but it adds some improvements. ATRAC also applies sub-band coding and transform coding, and the input signal is split non-uniformly in frequency so as to emphasize the important low-frequency regions. In addition, ATRAC transforms the input signal with a variable block length, which keeps coding efficient in stationary passages without sacrificing time resolution in transient passages. Specifically, the input signal is divided into three frequency bands at 5.5125 kHz and 11.025 kHz. The sub-band decomposition is done with QMF (Quadrature Mirror Filters). The three bands are then converted to spectral values by the MDCT (Modified Discrete Cosine Transform, similar to the usual fast Fourier transform; textbooks on advanced mathematics and mathematical equations introduce it). The MDCT allows up to 50% overlap between blocks, making it possible to improve frequency resolution while maintaining critical sampling. The block length can be changed according to the type of signal. This is the adaptive part of ATRAC (it is mainly used so that the initial quantization noise is hidden by masking).
- After ten years of development, the ATRAC algorithm could no longer meet the needs of the market, and in August 2002 Sony introduced a new algorithm: ATRAC3 / ATRAC3+. Its core algorithm has no substantial changes compared with ATRAC; it only uses improved band-splitting filters and MDCT, together with techniques such as gain adjustment, tonal component separation, and joint stereo, to further reduce the size of the compressed audio data.
- 4. AAL (ATRAC Advanced Lossless)
- AAL stands for ATRAC Advanced Lossless. It is a new audio compression format developed by Sony that compresses without losing any audio information. A CD can be compressed to 30%-80% of its original size.
- 5. Ogg
- Its full name is Ogg Vorbis. It is a new audio compression format, similar to existing music formats such as MP3, with one difference: it is completely free, open, and patent-free. An outstanding feature of Ogg Vorbis is its support for multiple channels. As it spreads, listening to DTS-style multi-channel works will no longer be just a dream.
- Vorbis is the name of this audio compression mechanism, and Ogg is the name of a project intended to design a completely open multimedia system.
- Ogg Vorbis files use the extension .ogg. The file format is designed to be forward-looking: an OGG file, once created, can be played on any player, so the format can keep improving in file size and sound quality without affecting older encoders or players.
- Compared with AAC, it has a slight advantage at low frequency and is inferior to AAC at high frequency.
- At the highest quality setting it is generally indistinguishable from lossless (to the ear).
- At its highest quality setting, Q10, an Ogg Vorbis file is almost twice the size of an AAC file encoded with FAAC at its highest quality setting, Q500.
- The encoder is open source.
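- As a closing illustration, the MDCT with time domain aliasing cancellation described in the AAC and ATRAC sections above can be sketched in a few lines. This is a slow, direct-summation sketch under stated assumptions, not the implementation of any real codec: it uses a sine window, a scaling of 2 in the forward transform and 2/N in the inverse (one common convention), a tiny block length of 8, and hypothetical function names.

```python
import math


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + N/2]^2 = 1, as TDAC requires."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame):
    """Forward MDCT: N windowed samples in, N/2 coefficients out."""
    N = len(frame)
    M = N // 2
    n0 = (M + 1) / 2
    return [2.0 * sum(frame[n] * math.cos(2 * math.pi / N * (n + n0) * (k + 0.5))
                      for n in range(N))
            for k in range(M)]


def imdct(coeffs):
    """Inverse MDCT: N/2 coefficients in, N time-aliased samples out."""
    M = len(coeffs)
    N = 2 * M
    n0 = (M + 1) / 2
    return [(2.0 / N) * sum(coeffs[k] * math.cos(2 * math.pi / N * (n + n0) * (k + 0.5))
                            for k in range(M))
            for n in range(N)]


def roundtrip(signal, N=8):
    """Analyze 50%-overlapped blocks, then overlap-add the inverse blocks.

    Assumes len(signal) is a multiple of N // 2.
    """
    M = N // 2
    w = sine_window(N)
    padded = [0.0] * M + list(signal) + [0.0] * M
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - N + 1, M):
        frame = [w[n] * padded[start + n] for n in range(N)]
        rec = imdct(mdct(frame))
        for n in range(N):
            out[start + n] += w[n] * rec[n]   # aliasing cancels in the overlap
    return out[M:M + len(signal)]


signal = [math.sin(0.3 * t) + 0.1 * t for t in range(16)]
restored = roundtrip(signal)
print(max(abs(a - b) for a, b in zip(signal, restored)))  # close to zero
```

- Each block of N samples yields only N/2 coefficients, yet the overlapped blocks reconstruct the signal, which is what "critical sampling" with 50% overlap means. In a codec the loss comes from quantizing the coefficients, not from the transform itself.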