How Do I Choose the Best Encoder?

Because the transmission environment changes drastically, it is difficult for a mobile communication system to always operate at the optimal source and channel coding rates. In a traditional GSM system, for example, the source and channel coding rates are fixed (full rate, FR, is 13 kbit/s; half rate, HR, is 5.6 kbit/s; enhanced full rate, EFR, is 12.2 kbit/s) and do not depend on channel quality. Under bad channel conditions, the redundant bits in the channel coding are not sufficient to correct transmission errors. In that case, the number of redundant channel coding bits should be increased and the number of source coding bits reduced to improve speech quality.

Therefore, the UMTS system (WCDMA and TD-SCDMA) uses the adaptive multi-rate (AMR) speech encoder for voice coding. The basic idea is to jointly adjust the source and channel coding modes to adapt to the current channel conditions and traffic volume; that is, the actual speech coding rate depends on the channel conditions and is a function of channel quality. The AMR encoder uses an adaptive algorithm to select the best speech coding rate.
Chinese name: Adaptive multi-rate speech coding
Foreign name: Adaptive Multi-Rate
English abbreviation: AMR
Nature: Speech compression coding
AMR (Adaptive Multi-Rate) is a speech compression codec developed by 3GPP and used in third-generation W-CDMA mobile communication systems. It intelligently allocates bit rate between source and channel coding, making the allocation and use of wireless resources more flexible and efficient. AMR supports eight rates: 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, and 4.75 kbit/s. In addition, it includes a low-rate (1.80 kbit/s) background noise coding mode.
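To make the eight rates concrete, the following minimal sketch converts each one into the number of speech bits it produces per frame. It assumes the 20 ms frame duration that this article gives for AMR; the function name `bits_per_frame` is only illustrative.

```python
# AMR mode rates in kbit/s, as listed in the text.
AMR_RATES_KBPS = [12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75]
FRAME_MS = 20  # assumed frame duration (the article states 20 ms frames)


def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """kbit/s multiplied by ms gives bits; round() absorbs float error."""
    return round(rate_kbps * frame_ms)


for rate in AMR_RATES_KBPS:
    print(f"{rate:5.2f} kbit/s -> {bits_per_frame(rate)} bits per frame")
```

The highest mode therefore carries 244 speech bits per frame and the lowest 95, which is the budget the encoder has for all model parameters of one frame.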

Introduction to Adaptive Multi-Rate Speech Coding

Unlike the traditional fixed FR, HR, and EFR speech coding rates in the GSM system, AMR provides a set of coding rates. Once AMR is enabled, the speech coding rate is adjusted according to the quality of the radio environment. When the radio environment is good, the transmission bit error rate is low, so the speech coding rate is increased (more bits are used for speech coding) to obtain high-quality speech. When the radio environment is poor, the transmission bit error rate is high, so the speech coding rate is reduced and more bits are allocated to channel coding for error correction, which gives more reliable error control and improves speech quality.
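The trade-off described above can be sketched as a simple threshold lookup. This is only an illustration, not the actual AMR link adaptation algorithm: the threshold values are hypothetical, and the fixed gross rate of 22.8 kbit/s is an assumption borrowed from a GSM full-rate traffic channel to show how bits freed by a lower speech rate become available for channel coding.

```python
import bisect

# AMR mode rates from the text, in ascending order (kbit/s).
AMR_RATES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]

# HYPOTHETICAL switching thresholds (dB of some channel-quality metric).
# There is one threshold between each pair of neighbouring modes.
QUALITY_THRESHOLDS_DB = [4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]

# ASSUMPTION: fixed gross bit rate, as on a GSM full-rate traffic channel.
GROSS_RATE_KBPS = 22.8


def select_rate(quality_db):
    """Pick a higher speech rate as the measured channel quality improves."""
    index = bisect.bisect_right(QUALITY_THRESHOLDS_DB, quality_db)
    return AMR_RATES_KBPS[index]


def channel_coding_kbps(source_rate_kbps):
    """Bits per second left over for error protection at this speech rate."""
    return round(GROSS_RATE_KBPS - source_rate_kbps, 2)


for quality in (2.0, 9.0, 20.0):
    rate = select_rate(quality)
    print(f"{quality:4.1f} dB -> speech {rate} kbit/s, "
          f"channel coding {channel_coding_kbps(rate)} kbit/s")
```

A poor channel selects a low speech rate and leaves most of the gross rate for redundancy, while a good channel does the opposite.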
The AMR speech codec can encode and decode speech at eight bit rates. It is based on the algebraic code-excited linear prediction (ACELP) coding model. The encoder input is linear PCM sampled at 8 kHz with 16-bit quantization, and encoding operates on 20 ms speech frames, that is, 160 samples per frame. The encoder at the transmitter extracts the ACELP model parameters (linear prediction coefficients, adaptive and fixed codebook indexes and gains) for transmission, and the decoder at the receiver reconstructs the speech signal from the excitation signal formed from these parameters. The principle of the AMR speech codec is described in detail below.
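The frame size follows directly from the sampling rate and frame duration. The sketch below derives it and splits a PCM sample sequence into frames; zero-padding the last partial frame is an assumption of this example, not something the text specifies.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
FRAME_SAMPLES = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 8000 * 0.020 = 160


def split_into_frames(samples):
    """Yield consecutive 160-sample frames.

    A trailing partial frame is zero-padded (an assumption of this sketch).
    """
    for start in range(0, len(samples), FRAME_SAMPLES):
        frame = list(samples[start:start + FRAME_SAMPLES])
        frame += [0] * (FRAME_SAMPLES - len(frame))
        yield frame


pcm = [0] * 400  # 50 ms of silence as stand-in 16-bit PCM input
frames = list(split_into_frames(pcm))
print(FRAME_SAMPLES, len(frames))  # prints: 160 3
```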

Principle of adaptive multi-rate speech coding

AMR speech coding uses algebraic code-excited linear prediction (ACELP), which is based on code-excited linear prediction (CELP). Functionally, AMR speech coding can be divided into three major parts: LPC analysis, pitch search, and algebraic codebook search. LPC analysis obtains the 10 coefficients of a 10th-order LPC filter, converts them into line spectral frequency (LSF) parameters, and quantizes the LSFs. Pitch search consists of open-loop pitch analysis and closed-loop pitch analysis and yields two parameters, the pitch delay and the pitch gain. Algebraic codebook search obtains the algebraic codebook index and the algebraic codebook gain, and also includes quantization of the codebook gains. The signal flow of the AMR encoder is shown in the AMR coding principle block diagram. The AMR speech encoder comprises nine functions: preprocessing; linear prediction analysis and quantization; open-loop pitch analysis; impulse response computation; target signal computation; adaptive codebook search and gain control; algebraic codebook structure and search; quantization of the adaptive and fixed codebook gains; and memory update.
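As an illustration of the LPC analysis step, the sketch below computes the coefficients of a 10th-order predictor with the textbook autocorrelation method and the Levinson-Durbin recursion. It is a simplified sketch under stated assumptions: the Hamming window, the synthetic input signal, and the sign convention are choices made for this example, and the LSF conversion and quantization steps are omitted.

```python
import math
import random

LPC_ORDER = 10  # the text states a 10th-order LPC filter


def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n-k], for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order=LPC_ORDER):
    """Solve the LPC normal equations; returns (a, err) with a[0] == 1.

    Convention (an assumption of this sketch): A(z) = 1 + sum a[i] z^-i,
    so sample n is predicted as -sum a[i] * x[n-i].
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # degenerate (e.g. all-zero) frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


# Stand-in input: one 160-sample frame of a synthetic first-order AR signal.
rng = random.Random(0)
frame, last = [], 0.0
for _ in range(160):
    last = 0.9 * last + rng.gauss(0.0, 1.0)
    frame.append(last)

# Hamming window before the autocorrelation (a textbook choice).
windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * n / 159))
            for n, s in enumerate(frame)]
coeffs, residual_energy = levinson_durbin(autocorrelation(windowed, LPC_ORDER))
print([round(c, 3) for c in coeffs])
```

The returned list holds `a[0] = 1` followed by the 10 predictor coefficients; `residual_energy` is the prediction error energy after the final recursion step.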

Principle of adaptive multi-rate speech encoding and decoding

The decoder functions include parameter decoding (LP coefficients, adaptive codebook vector, adaptive codebook gain, fixed codebook gain) and speech synthesis to obtain the reconstructed speech. The reconstructed speech is then passed through a post filter and scaled up. The signal flow of the AMR decoder is shown in the AMR decoding principle block diagram.
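The synthesis step can be sketched as follows, using the generic CELP structure: the excitation is a gain-weighted sum of the adaptive and fixed codebook vectors, and it drives an all-pole filter built from the LP coefficients. The function names, the sign convention A(z) = 1 + sum a[i] z^-i, and the toy values are assumptions of this example; post filtering and scaling are not shown.

```python
def build_excitation(adaptive_vec, adaptive_gain, fixed_vec, fixed_gain):
    """Excitation u[n] = g_p * v[n] + g_c * c[n] (generic CELP form)."""
    return [adaptive_gain * v + fixed_gain * c
            for v, c in zip(adaptive_vec, fixed_vec)]


def synthesize(excitation, a, memory=None):
    """All-pole synthesis 1/A(z): s[n] = u[n] - sum_{i=1..p} a[i] * s[n-i].

    `a` starts with a[0] == 1; `memory` holds past outputs, newest first.
    """
    order = len(a) - 1
    history = list(memory) if memory else [0.0] * order
    out = []
    for u in excitation:
        s = u - sum(a[i] * history[i - 1] for i in range(1, order + 1))
        out.append(s)
        history = [s] + history[:-1]
    return out


# Toy example: a single pulse from the fixed codebook, first-order filter.
excitation = build_excitation([0.0] * 4, 0.5, [1.0, 0.0, 0.0, 0.0], 1.0)
speech = synthesize(excitation, [1.0, -0.9])
print([round(s, 3) for s in speech])  # prints: [1.0, 0.9, 0.81, 0.729]
```

In a real decoder the filter memory is carried from one subframe to the next, which is what the optional `memory` argument stands in for.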

Key technologies used in adaptive multi-rate speech coding

In principle, AMR is still a form of code-excited linear prediction (CELP) coding, specifically variable-rate speech compression coding. New research has focused on the "variable" aspect and has introduced several related techniques: voice activity detection (VAD), which detects whether speech is present during a call; a rate decision algorithm (RDA), which adapts the coding rate and embodies the "variable" aspect; error concealment units (ECU), which mitigate the negative effects of lost speech frames; and comfort noise generation (CNA, comfort noise aspects), which masks discontinuities in the background noise. With these techniques, the quality of speech synthesized by variable-rate coding is barely degraded. [1]
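To illustrate the idea behind voice activity detection, here is a deliberately simple energy-threshold detector operating on one 160-sample frame. It is not the VAD used by AMR, which is considerably more elaborate; the 30 dB threshold and the function names are hypothetical values chosen for this sketch.

```python
import math


def frame_energy_db(frame):
    """Mean-square energy of one frame in dB (0 dB = unit mean square)."""
    mean_square = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(mean_square + 1e-12)  # epsilon avoids log10(0)


def is_speech(frame, threshold_db=30.0):
    """HYPOTHETICAL decision rule: active if energy exceeds the threshold."""
    return frame_energy_db(frame) > threshold_db


silence = [0] * 160
tone = [int(1000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(160)]
print(is_speech(silence), is_speech(tone))  # prints: False True
```

Frames classified as inactive are the ones for which a codec can switch to a low-rate background noise description instead of full speech coding.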
