What is Speech Synthesis?
Speech synthesis is the technology of producing artificial speech by mechanical or electronic means. Text-to-speech (TTS) technology is a branch of speech synthesis: it converts text, whether generated by the computer itself or supplied from outside, into intelligible, fluent spoken Chinese.
- Research on speech synthesis has a history of more than 200 years, but modern speech synthesis of practical significance developed alongside computer technology and digital signal processing. Its main goal is to make computers produce highly intelligible, highly natural continuous speech. Early research mainly used parametric synthesis methods; later, as computer technology advanced, waveform synthesis methods appeared.
- The theoretical basis of speech synthesis is a mathematical model of speech production. In this model, an excitation signal drives the vocal tract: the resulting sound wave passes through the vocal tract cavity and is radiated from the mouth or nose. The vocal tract parameters and the tract's resonance characteristics have therefore been the focus of research. By convention, the poles of the vocal tract's frequency response are called formants, and the distribution of a voice's formant frequencies (pole frequencies) determines its timbre.
- Voices with different timbres have different formant patterns. Each formant frequency and its bandwidth can therefore serve as the parameters of a formant filter. Several such filters are combined to model the transfer characteristic (frequency response) of the vocal tract; this combination shapes the signal from the excitation source, and passing the result through a radiation model yields the synthesized speech. This is the basic principle of formant synthesis. Three practical model structures are built on formant theory: cascade, parallel, and hybrid (cascade-parallel). [2]
- Chinese is a tonal language, and its prosody is very complex. In both classical and modern Chinese, the same syllable has different prosodic parameters in different contexts. A synthesizer stores a limited set of basic Chinese phonetic units in a sound bank and builds an unlimited vocabulary from them to form continuous Chinese sentences. To do so, the prosodic parameters of each sound-bank unit must be adjusted according to prosodic rules so that the unit fits its current context; performing this adjustment is the job of the speech synthesizer.
- When a Chinese speech synthesis system is implemented on a DSP, the synthesis algorithm must deliver clarity, intelligibility, and naturalness while also having low computational complexity, and the speech library must be as small as possible to conserve the DSP's limited storage. [2]
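To make the link between poles and formants concrete, here is a minimal pure-Python sketch. It builds an all-pole vocal tract filter 1/A(z) from hypothetical (frequency, bandwidth) pairs, using the standard mapping in which a pole at radius r and angle θ gives a resonance at F = θ·fs/(2π) with a bandwidth of about -ln(r)·fs/π, and then scans the magnitude response for peaks. The function names, the 8 kHz sampling rate, and the formant values are illustrative assumptions, not taken from the source.

```python
import cmath
import math

def all_pole_denominator(formants, fs):
    """Return real coefficients a[0..p] of A(z) for an all-pole filter 1/A(z)
    with one conjugate pole pair per (frequency_hz, bandwidth_hz) pair."""
    a = [1.0 + 0j]
    for freq, bw in formants:
        r = math.exp(-math.pi * bw / fs)   # radius: narrower bandwidth -> closer to unit circle
        theta = 2 * math.pi * freq / fs    # angle: sets the resonance frequency
        for pole in (cmath.rect(r, theta), cmath.rect(r, -theta)):
            # Multiply A(z) by (1 - pole * z^-1).
            a = [cur - pole * prev for cur, prev in zip(a + [0], [0] + a)]
    return [c.real for c in a]  # conjugate pairs make the imaginary parts vanish

def magnitude_db(a, freq, fs):
    """Magnitude of 1/A(e^{jw}) in dB at a single frequency."""
    w = 2 * math.pi * freq / fs
    A = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
    return -20 * math.log10(abs(A))

def find_peaks(a, fs, step_hz=10):
    """Scan 0..fs/2 and return the grid frequencies where the response has a local maximum."""
    freqs = list(range(0, fs // 2 + 1, step_hz))
    mags = [magnitude_db(a, f, fs) for f in freqs]
    return [freqs[i] for i in range(1, len(freqs) - 1)
            if mags[i] > mags[i - 1] and mags[i] > mags[i + 1]]

# Hypothetical formant pattern: (frequency, bandwidth) in Hz.
FS = 8000
FORMANTS = [(500, 60), (1500, 90), (2500, 120)]
a = all_pole_denominator(FORMANTS, FS)
print(find_peaks(a, FS))  # expected to land near 500, 1500 and 2500 Hz
```

The spectral peaks sit at (or within a few hertz of) the pole frequencies, which is why formants and poles are treated as the same thing in the model described above.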
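The formant-synthesis principle described above (formant filters shaping an excitation, followed by a radiation model) can be sketched in pure Python as a cascade of second-order resonators. Each resonator takes one formant frequency and bandwidth and uses the coefficient formulas from Klatt's cascade/parallel formant synthesizer; that resonator design is a well-known substitute chosen here, since the source does not specify one. The excitation is simplified to an impulse train at the pitch F0, and the radiation model is approximated by a first difference. All function names and formant values are illustrative assumptions.

```python
import math

def resonator_coeffs(freq, bw, fs):
    """Klatt-style two-pole resonator: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    T = 1.0 / fs
    C = -math.exp(-2 * math.pi * bw * T)
    B = 2 * math.exp(-math.pi * bw * T) * math.cos(2 * math.pi * freq * T)
    A = 1.0 - B - C  # normalizes the gain at 0 Hz to 1
    return A, B, C

def resonate(x, freq, bw, fs):
    """Filter signal x through one formant resonator."""
    A, B, C = resonator_coeffs(freq, bw, fs)
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = A * s + B * y1 + C * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def impulse_train(f0, duration_s, fs):
    """Simplified voiced excitation: one unit impulse per pitch period."""
    period = int(round(fs / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(int(duration_s * fs))]

def radiate(x):
    """First difference as a crude lip-radiation model (+6 dB/octave tilt)."""
    out, prev = [], 0.0
    for s in x:
        out.append(s - prev)
        prev = s
    return out

def synthesize_vowel(formants, f0=120.0, duration_s=0.5, fs=8000):
    signal = impulse_train(f0, duration_s, fs)
    for freq, bw in formants:  # cascade: each resonator feeds the next
        signal = resonate(signal, freq, bw, fs)
    return radiate(signal)

# Hypothetical formant values in Hz; change them to change the timbre.
speech = synthesize_vowel([(730, 90), (1090, 110), (2440, 170)])
```

Only the `formants` list determines the timbre here, which mirrors the point above that voices with different timbres correspond to different formant patterns.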
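The rule-based adjustment of sound-bank units described above can be illustrated with a minimal sketch. It assumes each stored unit carries a per-frame F0 (pitch) contour and applies one hypothetical prosody rule: stretch the contour to a target number of frames (changing duration), then scale it to a target mean pitch. Scaling rather than replacing the contour keeps the unit's tone shape, which matters in a tonal language. The contour values, frame counts, and function names are illustrative; a real synthesizer would apply richer rules and would also modify the waveform itself, which this sketch does not do.

```python
def resample_contour(contour, n_frames):
    """Linearly interpolate a per-frame F0 contour to n_frames frames."""
    if n_frames < 1 or not contour:
        raise ValueError("need a non-empty contour and n_frames >= 1")
    if len(contour) == 1 or n_frames == 1:
        return [float(contour[0])] * n_frames
    step = (len(contour) - 1) / (n_frames - 1)
    out = []
    for i in range(n_frames):
        pos = i * step
        k = min(int(pos), len(contour) - 2)  # index of the left neighbour
        frac = pos - k
        out.append(contour[k] * (1 - frac) + contour[k + 1] * frac)
    return out

def adjust_unit(contour, target_frames, target_mean_f0):
    """Hypothetical prosody rule: stretch to the target duration, then rescale
    pitch so the mean F0 matches the target while the tone shape is kept."""
    stretched = resample_contour(contour, target_frames)
    scale = target_mean_f0 / (sum(stretched) / len(stretched))
    return [f0 * scale for f0 in stretched]

# A stored unit with a falling contour (hypothetical values in Hz), lengthened
# from 4 to 7 frames and lowered to a mean pitch of 150 Hz for its new context.
stored = [220.0, 200.0, 170.0, 140.0]
adjusted = adjust_unit(stored, target_frames=7, target_mean_f0=150.0)
print([round(f, 1) for f in adjusted])
```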
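The storage constraint on a DSP can be made concrete with back-of-the-envelope arithmetic. The sketch below compares a raw waveform unit library with a frame-based parametric library (for example, one storing formant parameters). Every number in it (1,300 units, 0.3 s per unit, 8 kHz 16-bit audio, 10 ms frames, 8 parameters per frame) is an assumption chosen for illustration, not a figure from the source.

```python
def waveform_library_bytes(n_units, avg_duration_s, sample_rate_hz, bits_per_sample):
    """Raw (uncompressed) size of a library that stores each unit as samples."""
    return n_units * avg_duration_s * sample_rate_hz * bits_per_sample / 8

def parametric_library_bytes(n_units, avg_duration_s, frame_s, params_per_frame, bits_per_param):
    """Size of a library that stores a few synthesis parameters per frame instead."""
    frames = round(avg_duration_s / frame_s)
    return n_units * frames * params_per_frame * bits_per_param / 8

# Illustrative assumptions only.
wave = waveform_library_bytes(1300, 0.3, 8000, 16)
param = parametric_library_bytes(1300, 0.3, 0.010, 8, 16)
print(f"waveform: {wave / 1e6:.2f} MB, parametric: {param / 1e6:.3f} MB")
```

Under these assumptions the parametric library is ten times smaller, at the cost of computing every output sample from the stored parameters at run time.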