What Is Speech Analytics?

Speech analytics (SpeechAnalytics) is a technology that converts unstructured speech into a structured index through core technologies such as speech recognition, enabling knowledge mining and rapid retrieval across massive collections of call recordings and audio files.

Chinese name: Speech analysis
Foreign name: SpeechAnalytics
Core function: Semantic parsing
Value: Xunfei (iFlytek) speech
Applied discipline: Communication

Definition of speech analysis

Speech analytics (SpeechAnalytics) refers to converting unstructured speech information into a structured index by means of core technologies such as speech recognition, so that massive recording and audio files can be mined for knowledge and retrieved quickly.
Call centers hold large volumes of customer service recordings. These recordings contain a wealth of valuable information such as customer needs, complaints, satisfaction levels, suggestions, and competitive intelligence. However, because the recordings are so numerous and so inconvenient to search, they are currently used almost exclusively for quality inspection.
The Xunfei (iFlytek) VoiceInsight speech analysis system, built on leading speech analysis core technology, can analyze recorded data according to the actual business needs of a customer service center and extract useful information. It lets users get a handle on massive customer service recordings and assist quality inspection, further improving service quality and customer satisfaction. At the same time, user behavior data mined through the system supports timely and accurate market decisions.
Parameters unique to speech analysis include the formant amplitudes and frequencies. Formants are regions of energy concentration in the short-term power spectrum of speech; the center frequency of each region is called a formant frequency. Speech generally exhibits three to five formants. The amplitudes of these formant frequency components are called the formant amplitudes, and the bandwidth over which the amplitude falls 3 dB from its peak is called the formant bandwidth. The formant parameters determine the properties of the vowels in an utterance.
Speech parameters obtained in the time domain by linear prediction are called linear prediction parameters. Linear prediction parameters are time-domain analysis parameters of speech and can accurately capture the transfer characteristics of the vocal tract; the formant parameters can be derived from them through the relationship between time-domain and frequency-domain parameters. From the linear prediction parameters a further set of parameters, the reflection coefficients, can be obtained; these have better numerical stability than the linear prediction parameters themselves. Yet another set of coefficients, the line spectrum pair (LSP) parameters, can also be derived; they retain the convenience of time-domain computation while directly reflecting the frequency characteristics of the formants.
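To make the relationship between linear prediction parameters and formants concrete, here is a minimal Python sketch, not taken from any particular system: it computes LPC coefficients with the autocorrelation method and Levinson-Durbin recursion (the stage-by-stage k values are exactly the reflection coefficients mentioned above), then estimates formant frequencies and bandwidths from the roots of the prediction polynomial. The frame, sampling rate, and order of 12 are illustrative assumptions.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin recursion.
    Returns coefficients [1, a1, ..., a_order] of A(z)."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient of stage i
        a_prev = a.copy()
        a[i] = k
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        err *= (1.0 - k * k)                # remaining prediction error
    return a

def formants(frame, fs, order=12):
    """Estimate formant frequencies (Hz) and -3 dB bandwidths (Hz)."""
    a = lpc(frame * np.hamming(len(frame)), order)
    # Keep one root of each complex-conjugate pair (positive frequencies)
    roots = np.array([z for z in np.roots(a) if z.imag > 0])
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi
    idx = np.argsort(freqs)
    return freqs[idx], bws[idx]

# Example: for a 30 ms vowel frame at fs = 8000 Hz, the three to five
# lowest-frequency, narrow-bandwidth peaks correspond to the formants.
# f, b = formants(frame, fs=8000)
```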
Applying homomorphic signal analysis to the speech signal yields a set of cepstrum parameters. Cepstrum parameters are generally considered one of the most suitable parameter sets for speech recognition.
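As a rough illustration of homomorphic analysis, the following sketch computes the real cepstrum of one windowed frame as the inverse FFT of the log magnitude spectrum; the frame name and FFT size are assumptions.

```python
import numpy as np

def real_cepstrum(frame, n_fft=512):
    spectrum = np.fft.rfft(frame, n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-10)  # small offset avoids log(0)
    return np.fft.irfft(log_mag, n_fft)         # cepstrum: IFFT of log magnitude

# The low-quefrency coefficients describe the vocal-tract envelope and are
# the cepstrum parameters typically used as recognition features, e.g.:
# ceps = real_cepstrum(frame)[:13]
```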
Speech analysis technology is widely used in speech coding and compression, where it has produced a variety of medium- and low-bit-rate coding schemes, such as subband coding, transform coding, adaptive predictive coding, multi-pulse excited linear predictive coding, and code-excited linear prediction (CELP). Speech recognition is likewise based on the results of speech analysis: the extracted parameters are classified and recognized, and different choices of parameters lead to different recognition results. Speech analysis technology can also be used to design and build pronunciation-correction devices for treating disorders of the vocal organs or for the speech training of deaf people.
The instrument traditionally used for speech analysis is the sound spectrograph, which records the dynamic spectrum of speech; the real-time digital spectrograph is a newer variant. A more common approach today is to use a general-purpose computer with speech-processing hardware and carry out the analysis in software.
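As a sketch of that software-based approach, the following assumes SciPy and a WAV file of your own (the filename is hypothetical); it computes the kind of dynamic spectrum a spectrograph displays.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

fs, x = wavfile.read("speech.wav")       # hypothetical input file
if x.ndim > 1:
    x = x.mean(axis=1)                   # mix down to mono
# Short-time analysis: 256-sample frames with 75% overlap
f, t, Sxx = spectrogram(x.astype(float), fs=fs, nperseg=256, noverlap=192)
log_Sxx = 10 * np.log10(Sxx + 1e-12)     # dB scale, as on a spectrograph
# f: frequency bins (Hz), t: frame times (s), log_Sxx: dynamic spectrum
```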
[Figure 1: Schematic diagram of speech recognition technology]

Speech understanding

Speech understanding applies artificial intelligence techniques such as knowledge representation and knowledge organization to automatic sentence recognition and semantic interpretation. Its main difference from speech recognition is the full use of grammatical and semantic knowledge.
Speech understanding originated in the United States. In 1971, the Advanced Research Projects Agency (ARPA) funded a large research project whose goal was called the speech understanding system. Because people have extensive knowledge of speech and can to some extent anticipate what will be said, they are able to perceive and analyze speech. Drawing on this broad knowledge of language and content, and using knowledge to improve a computer's ability to understand spoken language, is the core of speech understanding research.
Understanding ability improves system performance: it can suppress noise and interfering sounds, use the meaning of context to correct errors and resolve ambiguous semantics, and handle ungrammatical or incomplete sentences. The aim of studying speech understanding, then, is to be more effective than building a system that merely tries to recognize every word with great care.
In addition to the components required for ordinary speech recognition, a speech understanding system must add a knowledge-processing component. Knowledge processing includes the automatic acquisition of knowledge, the construction of a knowledge base, and reasoning over and verification of knowledge; ideally the system can also correct its knowledge automatically. Speech understanding can therefore be regarded as the product of combining signal processing with knowledge processing. The knowledge involved includes phonemic, phonetic, prosodic, lexical, syntactic, semantic, and pragmatic knowledge, spanning interdisciplinary fields such as experimental phonetics, Chinese grammar, natural language understanding, and knowledge search.
An early speech understanding system was the HEARSAY system. It uses a shared "blackboard" as its knowledge base, surrounded by a series of expert systems that extract and search various kinds of knowledge about phonemes, phonetic variation, and so on. A later system that came closer to the project's goals was the HARPY system, which uses a finite-state model of language to combine the separate knowledge sources into a single unified network; the component that builds this network is called a knowledge compiler. Different understanding systems differ in their strategies and organization for exploiting knowledge.
A perfect speech understanding system remains a research ideal that cannot be fully achieved in the short term. Task-oriented speech understanding systems, however, which involve only a limited vocabulary, commonly used sentence patterns, and a defined group of users, already have practical value in certain automated applications, such as airline ticket pre-sale systems, banking services, and hotel registration and inquiry systems.

Speech recognition

Speech recognition is the general term for technology that uses a computer to automatically recognize the phonemes, syllables, or words of a speech signal. Speech recognition is the basis for automatic speech control.
Speech recognition originated with the "dictation typewriter" dream of the 1950s. Once the variation of vowel formants and the acoustic characteristics of consonants were understood, scientists believed that the conversion from speech to text could be carried out by machines, that is, that ordinary speech could be converted into written text. Theoretical research on speech recognition spans more than 40 years, but it moved into practical application only after the development of digital technology and integrated circuits, and many practical results have now been achieved.
Speech recognition generally proceeds through the following steps (a minimal alignment sketch follows this list):
1. Speech preprocessing, including amplitude normalization, frequency response correction, framing, windowing, and endpoint detection.
2. Analysis of acoustic parameters, including formant frequencies and amplitudes, as well as linear prediction parameters and cepstrum parameters.
3. Parameter normalization, mainly along the time axis; common methods include dynamic time warping (DTW) and dynamic programming (DP).
4. Pattern matching, which may use distance criteria, probabilistic rules, or syntactic classification.
5. Recognition decision, in which the final discriminant function produces the recognition result.
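The time-axis normalization step can be illustrated with a minimal DTW sketch. This is the generic textbook formulation, not any particular product's implementation; the feature sequences are assumed to be, for example, frames of cepstrum parameters.

```python
import numpy as np

def dtw_distance(X, Y):
    """X: (n, d) and Y: (m, d) feature sequences; returns alignment cost."""
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(X[i - 1] - Y[j - 1])  # local frame distance
            D[i, j] = cost + min(D[i - 1, j],       # insertion
                                 D[i, j - 1],       # deletion
                                 D[i - 1, j - 1])   # match
    return D[n, m]

# Template-matching recognition: pick the reference word whose template has
# the smallest DTW distance to the unknown utterance, e.g.:
# best = min(templates, key=lambda w: dtw_distance(features, templates[w]))
```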
Speech recognition can be classified by recognition content: phoneme recognition, syllable recognition, and word or phrase recognition. It can also be classified by vocabulary size: small vocabulary (fewer than 50 words), medium vocabulary (50 to 500 words), large vocabulary (more than 500 words), and very large vocabulary (tens of thousands of words). Classified by speaking style, it divides into the recognition of isolated, connected, and continuous speech. Classified by speaker requirements, it divides into speaker-dependent recognition, which works only for a specific speaker, and speaker-independent recognition, which works regardless of who is speaking. Clearly, the most difficult case is large-vocabulary, continuous, speaker-independent speech recognition.

Core functions of speech analysis

The core functions of a speech analysis system are:
1. Semantic analysis
Xunfei's semantic analysis technology can automatically mine, analyze, classify, and display the user's natural language, providing support for operational analysis and decision-making.
2. Scene segmentation
Scene segmentation technology automatically separates the user's voice from the agent's voice in a call recording, making it possible to inspect and analyze each side with a different focus; it is a key supporting technology for efficient speech analysis applications. Xunfei's scene segmentation technology offers industry-leading accuracy, which makes it convenient for users to design statistics and analyses for the different roles.
3. Emotion detection
The Xunfei speech analysis system can automatically detect and judge the mood of the user or agent during a call; once an abnormality is found, it can be recorded or flagged in time. Combined with Xunfei's strong results in speech and language technology, the emotion detection offers high accuracy and timeliness.
4. Speech rate detection
The system can automatically measure the speech rate of the separated user or agent audio. If speech is too fast, the user may have difficulty hearing clearly, which affects service quality; if it is too slow, the agent may lack proficiency or be in poor working form.
5. Talk-over detection
The system can automatically detect whether the agent interrupted or talked over the caller during a call, and record and tally such events.
6. Silence detection
The system can automatically detect periods of prolonged silence ("dead air", when neither the user nor the agent is speaking) in a recording; the qualifying silence duration can be flexibly configured in the system (a minimal energy-based sketch follows this list).
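As a rough illustration of silence detection, here is a minimal energy-based sketch; the frame length, threshold, and minimum duration are illustrative assumptions, standing in for the configurable settings the text describes.

```python
import numpy as np

def find_silences(x, fs, frame_ms=30, db_threshold=-45.0, min_silence_s=3.0):
    """Return (start_s, end_s) spans where the signal stays below threshold."""
    hop = int(fs * frame_ms / 1000)
    n_frames = len(x) // hop
    frames = x[:n_frames * hop].reshape(n_frames, hop).astype(float)
    rms = np.sqrt((frames ** 2).mean(axis=1)) + 1e-12
    # Frame level in dB relative to the recording's peak amplitude
    level_db = 20 * np.log10(rms / (np.abs(x).max() + 1e-12))
    silent = level_db < db_threshold
    spans, start = [], None
    for i, s in enumerate(silent):
        if s and start is None:
            start = i
        elif not s and start is not None:
            if (i - start) * frame_ms / 1000 >= min_silence_s:
                spans.append((start * frame_ms / 1000, i * frame_ms / 1000))
            start = None
    if start is not None and (n_frames - start) * frame_ms / 1000 >= min_silence_s:
        spans.append((start * frame_ms / 1000, n_frames * frame_ms / 1000))
    return spans
```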

Application value of speech analysis

Xunfei (iFlytek) has gathered the requirements of speech analysis applications across many key industries and designed a speech analysis application system accordingly, helping users speed up deployment and realize benefits sooner. In addition, Xunfei's experienced project teams carry out custom development for customers' individual needs, refining the application system's functions and reports so that the system keeps pace with the development of the customer's business.
