What Does LPC Do?

Linear predictive coding (LPC) is a tool used mainly in audio signal processing and speech processing to represent the spectral envelope of a digital speech signal in compressed form, based on the information of a linear prediction model. It is one of the most effective speech analysis techniques and one of the most useful methods for encoding good-quality speech at a low bit rate, and it provides highly accurate estimates of speech parameters.

Name: linear predictive coding (LPC)
Time of origin: 1966
Use: transmission of spectral envelope information
Applied discipline: communications

Overview of Linear Predictive Coding

Linear predictive coding starts from the assumption that a speech signal (for voiced sounds) is produced by a buzzer at the end of a tube, occasionally with added hissing and popping sounds (sibilants and plosives). Although this may seem primitive, the model is actually quite close to the real process of speech production. The glottis (the opening between the vocal cords) produces the buzz, with varying intensity (volume) and frequency (pitch), while the throat and mouth form the resonant tube; the hissing and popping sounds are produced by the action of the tongue, lips, and throat.
Linear predictive coding analyzes a speech signal by estimating the formants, removing their effect from the signal, and then estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the signal left over after this subtraction is called the residual.
The numbers describing the intensity and frequency of the buzz, the formants, and the residual signal can be stored or transmitted elsewhere. Linear predictive coding synthesizes the speech signal by reversing the process: the buzz parameters and the residual are used to create a source signal, the formants are used to create a filter representing the vocal tract, and the source signal is passed through the filter to produce speech.
Because the speech signal varies over time, this process is carried out on short frames of the signal; 30 to 50 frames per second usually gives intelligible speech with good compression.
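As a rough illustration of this frame-by-frame analysis and resynthesis, the following is a minimal Python sketch (not the procedure of any particular codec). It assumes NumPy and SciPy are available; the helper name lpc_autocorr, the synthetic test signal, and the frame length are illustrative choices, not values from the text.

    import numpy as np
    from scipy.linalg import solve_toeplitz
    from scipy.signal import lfilter

    def lpc_autocorr(frame, order):
        """Estimate LPC coefficients for one frame with the autocorrelation method."""
        r = np.array([np.dot(frame[:len(frame) - m], frame[m:]) for m in range(order + 1)])
        # Solve the Toeplitz normal equations for the predictor weights.
        return solve_toeplitz(r[:-1], r[1:])

    fs = 8000
    t = np.arange(fs) / fs
    speech = np.sin(2 * np.pi * 150 * t) + 0.1 * np.random.randn(fs)  # stand-in for a speech signal

    order, frame_len = 10, 240          # roughly 30 ms frames at 8 kHz
    frames_out = []
    for start in range(0, len(speech) - frame_len + 1, frame_len):
        frame = speech[start:start + frame_len]
        a = lpc_autocorr(frame, order)
        A = np.concatenate(([1.0], -a))                   # analysis (inverse) filter A(z)
        residual = lfilter(A, [1.0], frame)               # inverse filtering removes the formants
        frames_out.append(lfilter([1.0], A, residual))    # all-pole resynthesis from the residual
    speech_hat = np.concatenate(frames_out)               # trailing partial frame is skipped

In a real coder the residual is not transmitted directly; it is replaced by a parametric buzz/noise source or quantized much more coarsely, which is where the compression comes from.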

Linear predictive coding principle

The output samples of a discrete-time linear system can be approximated by a linear combination of its past input samples and past output samples; this combination is the linear prediction value. By minimizing the mean square value of the difference between the actual output and the linear prediction, a unique set of predictor coefficients (the weights used in the linear combination) can be determined. In effect this models the system, and the general model is a pole-zero model. It has two special cases: the all-pole model, also known as the autoregressive model, in which the predictor uses only past output samples, and the all-zero model, also known as the moving-average model, in which the predictor uses only input samples. By far the most commonly used is the all-pole model, for several reasons: its parameters are the easiest to compute; the input is unknown in most practical cases; and, for speech, if nasals and some fricatives are ignored, the transfer function of the vocal tract is an all-pole function.
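As a minimal numeric illustration of this principle for the all-pole case, the Python sketch below predicts each sample from its p most recent past samples and chooses the weights that minimize the mean squared prediction error by ordinary least squares; the example signal and the order p = 4 are arbitrary assumptions made only for the demonstration.

    import numpy as np

    # Predict s[n] from its p past samples:
    #   s_hat[n] = a[0]*s[n-1] + a[1]*s[n-2] + ... + a[p-1]*s[n-p]
    # and pick the weights a that minimize sum_n (s[n] - s_hat[n])^2.

    rng = np.random.default_rng(0)
    s = np.cumsum(rng.standard_normal(1000))   # arbitrary example signal
    p = 4

    # Each row of X holds the p past samples that precede one target sample in y.
    X = np.column_stack([s[p - 1 - k : len(s) - 1 - k] for k in range(p)])
    y = s[p:]

    a, *_ = np.linalg.lstsq(X, y, rcond=None)  # predictor (all-pole model) coefficients
    mse = np.mean((y - X @ a) ** 2)            # minimized mean squared prediction error
    print(a, mse)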
Under the all-pole model there are two methods for estimating the model parameters, the autocorrelation method and the covariance method, which are suited to stationary and non-stationary signals respectively. The basic form of the model parameters is the set of linear prediction coefficients, but many equivalent representations exist. Different forms of the coefficients differ in the resulting inverse filter structure, in the stability of the system, and in the number of bits required for quantization. A representation now generally recognized as better is the reflection coefficients: the corresponding filter has a lattice structure, stability is easy to guarantee, and the number of bits needed for quantization is small.
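For the autocorrelation method, the normal equations are usually solved with the Levinson-Durbin recursion, which yields the reflection coefficients as a by-product. The NumPy sketch below is one illustrative implementation (the function name and the random test frame are assumptions made for the example); the all-pole filter is stable as long as every reflection coefficient has magnitude below 1, which is what makes these coefficients convenient to quantize and to realize as a lattice filter.

    import numpy as np

    def levinson_durbin(r, order):
        """Solve the autocorrelation normal equations; return (lpc, reflection) coefficients."""
        a = np.zeros(order)            # predictor coefficients a_1..a_p
        k = np.zeros(order)            # reflection (PARCOR) coefficients
        err = r[0]                     # prediction error energy
        for i in range(order):
            # partial correlation of the next lag given the current predictor
            acc = r[i + 1] - np.dot(a[:i], r[1:i + 1][::-1])
            k[i] = acc / err
            a_prev = a[:i].copy()
            a[i] = k[i]
            a[:i] = a_prev - k[i] * a_prev[::-1]
            err *= 1.0 - k[i] ** 2
        return a, k

    frame = np.random.randn(240)       # stand-in for one windowed speech frame
    r = np.array([np.dot(frame[:len(frame) - m], frame[m:]) for m in range(11)])
    lpc, refl = levinson_durbin(r, 10)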

Early history of linear predictive coding

According to Robert M. Gray of Stanford University, linear predictive coding originated in 1966, when S. Saito and F. Itakura of NTT described an automatic phoneme recognition method that made the first use of maximum likelihood estimation in speech coding. In 1967, John Burg outlined the maximum entropy method. In 1969, Itakura and Saito proposed the concept of partial correlation, Glen Culler proposed real-time speech encoding in May of that year, and B. S. Atal presented an LPC speech coder at the annual meeting of the Acoustical Society of America. In 1971, Philco-Ford demonstrated real-time LPC using 16-bit LPC hardware and sold four units.
In 1972, Bob Kahn of ARPA, together with Jim Forgie of Lincoln Laboratory (LL) and Dave Walden of BBN Technologies, started the first developments in packetized voice, which would eventually lead to Voice over IP technology. According to informal Lincoln Laboratory histories, Ed Hofstetter implemented the first real-time 2400 bit/s LPC in 1973. In 1974, the first two-way real-time LPC packet voice communication was carried out between Culler-Harrison and Lincoln Laboratory over the ARPANET at 3500 bit/s. In 1976, the first LPC conference call took place over the ARPANET using the Network Voice Protocol, between Culler-Harrison, ISI, SRI, and LL, at 3500 bit/s. Finally, in 1978, Vishwanath et al. of BBN developed the first variable-rate LPC algorithm.

Representation of linear predictive coding coefficients

Linear predictive coding is often used to transmit spectral envelope information, so the representation must tolerate transmission errors. Transmitting the filter coefficients directly (see linear prediction for the definition of the coefficients) is undesirable, because they are very sensitive to error: a very small error can distort the whole spectrum, or worse, a small error may make the prediction filter unstable.
There are more advanced representations, such as log area ratios (LAR), line spectral pairs (LSP) decomposition, and reflection coefficients. Of these, LSP decomposition has become widely used because it guarantees the stability of the predictor and because spectral errors caused by small coefficient deviations remain local.
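LSP decomposition is comparatively involved, so as a simpler illustration of an error-tolerant representation, the sketch below converts reflection coefficients to log area ratios and back, using the common definition LAR = log((1 + k) / (1 - k)). Because the inverse mapping always lands strictly inside (-1, 1), the reconstructed predictor stays stable even after coarse quantization of the LARs. The function names and example values are assumptions made for the example.

    import numpy as np

    def reflection_to_lar(k):
        """Log area ratios from reflection coefficients (requires |k| < 1)."""
        k = np.asarray(k, dtype=float)
        return np.log((1.0 + k) / (1.0 - k))

    def lar_to_reflection(lar):
        """Inverse mapping; the result is always strictly inside (-1, 1)."""
        return np.tanh(np.asarray(lar, dtype=float) / 2.0)

    k = np.array([0.9, -0.55, 0.3, -0.1])
    lar = reflection_to_lar(k)
    k_back = lar_to_reflection(np.round(lar, 1))   # crude "quantization" of the LARs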

Linear predictive coding applications

Linear predictive coding is often used for speech resynthesis. It is used as a form of voice compression by telephone companies, for example in the GSM standard. It is also used in secure wireless communication, where the voice must be digitized, encrypted, and sent over a narrow voice channel.
LPC synthesis can also be used to construct voice synthesizers in which a musical instrument serves as the excitation signal for a time-varying filter estimated from a singer's voice; this kind of cross-synthesis has become popular in electronic music.
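A minimal sketch of this kind of cross-synthesis is given below, assuming NumPy and SciPy; the stand-in "voice" and "instrument" signals, the frame length, and the order are illustrative assumptions only. Each frame of the instrument is filtered through the all-pole filter estimated from the corresponding frame of the voice.

    import numpy as np
    from scipy.linalg import solve_toeplitz
    from scipy.signal import lfilter

    def lpc_autocorr(frame, order):
        """Autocorrelation-method LPC coefficients for one frame."""
        r = np.array([np.dot(frame[:len(frame) - m], frame[m:]) for m in range(order + 1)])
        return solve_toeplitz(r[:-1], r[1:])

    fs, frame_len, order = 16000, 480, 16
    t = np.arange(fs) / fs
    voice = np.random.randn(fs)                        # stand-in for a recorded voice
    instrument = np.sign(np.sin(2 * np.pi * 110 * t))  # stand-in for an instrument (square wave)

    out = np.zeros_like(instrument)
    for start in range(0, fs - frame_len + 1, frame_len):
        seg = slice(start, start + frame_len)
        a = lpc_autocorr(voice[seg], order)            # "vocal tract" filter for this frame
        A = np.concatenate(([1.0], -a))
        out[seg] = lfilter([1.0], A, instrument[seg])  # instrument drives the voice-derived filter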
The popular Speak & Spell educational toy of the late 1970s and early 1980s used 10th-order linear predictive coding.
Linear predictors of order 0 to 4 are used in the FLAC lossless audio codec. [1]
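These low-order predictors in FLAC are simple polynomial predictors whose integer coefficients come from repeated differencing of the signal. The sketch below shows one way such order-0-to-4 predictors can be applied; the coefficient table follows the published FLAC format description, but the framing and residual coding of the real codec are omitted, and the helper names are illustrative.

    import numpy as np

    # Fixed (polynomial) predictor coefficients for orders 0..4.
    FIXED_PREDICTORS = {
        0: [],
        1: [1],
        2: [2, -1],
        3: [3, -3, 1],
        4: [4, -6, 4, -1],
    }

    def fixed_predict_residual(samples, order):
        """Residual left after subtracting the fixed prediction of the given order."""
        res = np.array(samples[order:], dtype=np.int64)
        for j, c in enumerate(FIXED_PREDICTORS[order], start=1):
            res -= c * np.asarray(samples[order - j : len(samples) - j], dtype=np.int64)
        return res

    samples = np.array([0, 3, 7, 12, 18, 25, 33, 42], dtype=np.int64)
    # An encoder can pick, per block, the order whose residual is cheapest to code;
    # here the order-3 predictor leaves an all-zero residual.
    best = min(range(5), key=lambda p: np.abs(fixed_predict_residual(samples, p)).sum())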
