What Is Speech Compression?
Speech compression is a technique for reducing the bit rate of digitally encoded speech so that it can be transmitted more efficiently over a communication network and stored more compactly.
- Compressing the encoded digital voice is what makes efficient transmission and storage possible in practice. In mobile communications, for example, voice is the most important service.
- G-series standards of the International Telecommunication Union (ITU)
- Recommendation G.711 is the standard 64 kb/s PCM (pulse-code modulation) speech coding formulated in 1972 by the CCITT (the predecessor of the ITU-T). It is widely used in digital communications, digital switches, and digital voice interfaces. 64 kb/s PCM is a typical waveform coder. Recommendation G.721 is the 32 kb/s speech coding standard established by the CCITT in 1984, known as adaptive differential PCM (ADPCM). It matches PCM in reproduced speech quality and tolerates bit errors better than PCM. It has been widely used in satellite links, submarine cables, digital speech interpolation (DSI) equipment, and variable-rate coders.
- Recommendation G.728 is a 16 kb/s low-delay code-excited linear prediction (LD-CELP) speech coding scheme proposed by AT&T and adopted by the CCITT in 1992. The distinguishing feature of LD-CELP is that the short-term and long-term spectral prediction, the gain prediction, and the other parameters are not extracted from the input speech. Instead they are obtained by a 50th-order backward-adaptive predictor, that is, one whose coefficients are derived from previously decoded speech. Only the excitation vector is transmitted, which lowers the transmission bit rate. LD-CELP can be used in videophone audio, store-and-forward systems, digital mobile radio systems, digital speech interpolation (DSI) equipment, voice message recording, packetized voice, and similar applications.
- Recommendation G.729 is the 8 kb/s conjugate-structure algebraic-code-excited linear prediction (CS-ACELP) coding standard adopted by the ITU in 1995. CS-ACELP is based on the code-excited linear prediction (CELP) coding model. The frame length is 10 ms (80 samples at 8 kHz sampling). The encoder analyzes the speech signal to extract the parameters of the CELP model (LPC parameters, adaptive- and fixed-codebook indices, and gains), which are then encoded and transmitted. The decoder uses these parameters to recover the excitation signal and reconstruct the speech. The short-term synthesis filter is a 10th-order linear prediction filter. The long-term (pitch) synthesis filter is implemented using the adaptive-codebook approach. Finally, a post-filter enhances the quality of the reconstructed speech. G.729 is mainly used in personal mobile communications, satellite communications, packet voice, and digital leased lines.
- G.723.1 is a coding standard proposed by the ITU in 1996. It has two rates, 5.3 kb/s and 6.3 kb/s, and both are mandatory parts of the encoder and decoder. The encoder uses linear-prediction analysis-by-synthesis coding to encode speech and other audio signals. The higher-rate (6.3 kb/s) coder quantizes the excitation with multipulse maximum likelihood quantization (MP-MLQ); the lower-rate (5.3 kb/s) coder uses algebraic code-excited linear prediction (ACELP). The frame length is 30 ms and the look-ahead is 7.5 ms, so the algorithmic delay is 37.5 ms. Having two rates makes the system more flexible: the coder can switch between them at any 30 ms frame boundary according to channel conditions. G.723.1 is mainly used in low-bit-rate multimedia systems [1].
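The rate, frame-length, and delay figures quoted for G.729 and G.723.1 above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the `CodecMode` class and its method names are hypothetical, the 8 kHz sampling rate is an assumption consistent with "10 ms (80 samples)", and the bits-per-frame values are nominal numbers derived from the quoted rates, so the exact bitstream packing defined in the Recommendations may differ slightly.

```python
from dataclasses import dataclass

# Assumption: narrowband telephone speech sampled at 8 kHz.
SAMPLE_RATE_HZ = 8000


@dataclass(frozen=True)
class CodecMode:
    """Hypothetical container for the figures quoted in the text."""
    name: str
    bit_rate_bps: float        # nominal bit rate
    frame_ms: float            # frame length
    lookahead_ms: float = 0.0  # look-ahead, where the text states one

    def samples_per_frame(self) -> int:
        return round(SAMPLE_RATE_HZ * self.frame_ms / 1000)

    def nominal_bits_per_frame(self) -> float:
        return self.bit_rate_bps * self.frame_ms / 1000

    def algorithmic_delay_ms(self) -> float:
        # One full frame must be buffered, plus any look-ahead.
        return self.frame_ms + self.lookahead_ms

    def compression_vs_pcm(self, pcm_bps: float = 64_000) -> float:
        # Ratio relative to 64 kb/s G.711 PCM.
        return pcm_bps / self.bit_rate_bps


g729 = CodecMode("G.729", 8_000, 10)
g7231_high = CodecMode("G.723.1 (6.3 kb/s)", 6_300, 30, lookahead_ms=7.5)
g7231_low = CodecMode("G.723.1 (5.3 kb/s)", 5_300, 30, lookahead_ms=7.5)

if __name__ == "__main__":
    for mode in (g729, g7231_high, g7231_low):
        print(
            f"{mode.name}: {mode.samples_per_frame()} samples/frame, "
            f"{mode.nominal_bits_per_frame():g} bits/frame (nominal), "
            f"{mode.compression_vs_pcm():.1f}x smaller than 64 kb/s PCM"
        )
    print(f"G.723.1 algorithmic delay: {g7231_high.algorithmic_delay_ms()} ms")
```

Because both G.723.1 modes share the same 30 ms frame, a rate switch at a frame boundary changes only the number of bits per frame, not the framing itself.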
- Speech compression standards in digital mobile communications
- Pan-European digital cellular mobile communication (GSM) standard RPE/LTP. Regular pulse excitation with long-term prediction (RPE/LTP) is the 13 kb/s voice compression standard used in GSM, the European digital cellular mobile communication system. It models the excitation source as a sequence of regularly spaced pulses (one every three samples), determines the pulse positions from the amplitudes of the linear-prediction residual signal, and obtains the pulse amplitudes by feedback quantization that includes long-term prediction (LTP). The algorithm also belongs to the analysis-by-synthesis (AbS) family.
- North American digital cellular mobile communications (ADC) standard VSELP. The Cellular Telecommunications Industry Association (CTIA) adopted the IS-54 standard for North American digital cellular mobile communications. The speech coder in this standard is called vector-sum excited linear prediction (VSELP), a form of CELP with a data rate of 8 kb/s. In IS-95, the North American CDMA standard, the speech coder is QCELP, proposed by Qualcomm.
- In the Japanese digital mobile communication (JDC) standard, the speech coder is also vector-sum excited linear prediction (VSELP), at a rate of 6.7 kb/s.
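The regular-pulse idea described for RPE/LTP above, keeping one residual sample out of every three and choosing where the pulse grid starts, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the bit-exact GSM algorithm: it selects the grid offset by maximum energy, and it omits the weighting filter, the long-term predictor, and the quantization of the pulse amplitudes. The function names `select_rpe_grid` and `rebuild_excitation` are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_rpe_grid(
    residual: Sequence[float], spacing: int = 3
) -> Tuple[int, List[float]]:
    """Return (offset, pulses) for the candidate grid with the most energy.

    Each candidate grid keeps every `spacing`-th residual sample,
    starting at a different offset in 0 .. spacing - 1.
    """
    if spacing < 1:
        raise ValueError("spacing must be at least 1")
    best_offset, best_pulses, best_energy = 0, [], -1.0
    for offset in range(spacing):
        pulses = list(residual[offset::spacing])
        energy = sum(p * p for p in pulses)
        if energy > best_energy:
            best_offset, best_pulses, best_energy = offset, pulses, energy
    return best_offset, best_pulses


def rebuild_excitation(
    offset: int, pulses: Sequence[float], length: int, spacing: int = 3
) -> List[float]:
    """Place the pulses back on their grid; all other samples are zero."""
    excitation = [0.0] * length
    for i, pulse in enumerate(pulses):
        excitation[offset + i * spacing] = pulse
    return excitation


if __name__ == "__main__":
    # Made-up residual values, for illustration only.
    residual = [0.1, 0.9, -0.2, 0.0, -0.8, 0.1, 0.2, 0.7, 0.0]
    offset, pulses = select_rpe_grid(residual)
    print("grid offset:", offset)
    print("pulses:", pulses)
    print("excitation:", rebuild_excitation(offset, pulses, len(residual)))
```

Only the grid offset and the pulse amplitudes need to be sent, since the decoder can rebuild the zero-padded excitation from them. That is where the bit-rate saving comes from in this simplified picture.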
- NSA's FS-1015 and FS-1016 standards