What Is the Connection Between Speech Synthesis and Recognition?
Speech recognition technology, also known as Automatic Speech Recognition (ASR), aims to convert the lexical content of human speech into computer-readable input, such as keystrokes, binary codes, or character sequences. It differs from speaker recognition and speaker verification, which attempt to identify or confirm who is speaking rather than the words being spoken.
- With advances in data processing technology and the rapid spread of the mobile Internet, computer technology has come into use in every area of society and has generated massive amounts of data. Among these data, voice data has received growing attention. Speech recognition is an interdisciplinary subject. Over nearly two decades, speech recognition technology has made significant progress and has begun to move from the laboratory to the market. It is expected that within the next 10 years it will enter fields such as industry, home appliances, communications, automotive electronics, medical care, home services, and consumer electronics. The American press rated the application of speech recognition dictation machines in some fields as one of the ten major events in computer development in 1997. Many experts regard speech recognition as one of the ten most important technologies in information technology from 2000 to 2010. The fields involved in speech recognition technology include: signal processing, pattern recognition,
- Speech recognition involves psychology, physiology, acoustics, linguistics, information theory, signal processing, computer science,
- There are four commonly used approaches to speech recognition: (1) methods based on linguistics and acoustics, (2) stochastic model methods, (3) methods using artificial neural networks, and (4) probabilistic parsing. The most popular is the stochastic model method. [3]
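As a rough illustration of the stochastic model approach, the sketch below decodes a toy hidden Markov model (HMM) with the Viterbi algorithm. The text does not name a specific model, so the choice of an HMM is an assumption, as are the two states, the "low"/"high" energy symbols, and every probability value; none of them come from a real recognizer.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log probability, most likely state path) for a discrete HMM."""
    # best[t][s] = (log prob of the best path ending in state s at time t, previous state)
    first = observations[0]
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            score, prev = max(
                (best[-1][p][0] + math.log(trans_p[p][s]), p) for p in states
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return best[-1][last][0], path

# Hypothetical toy model: silence vs. speech, observed through quantized energy.
STATES = ["silence", "speech"]
START_P = {"silence": 0.8, "speech": 0.2}
TRANS_P = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
EMIT_P = {
    "silence": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
}

log_prob, path = viterbi(["low", "high", "high", "low"],
                         STATES, START_P, TRANS_P, EMIT_P)
print(path)  # ['silence', 'speech', 'speech', 'silence']
```

A real recognizer would use acoustic feature vectors and phone-level states in place of these two symbols, but the decoding step has the same shape: find the hidden state sequence that best explains the observed signal.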
- Speech recognition systems can be classified based on restrictions on input speech.
- Considering the system's dependence on the speaker
- Recognition systems can be divided into three categories: (1) speaker-dependent (specific person) systems, which consider only the speech of one specific person; (2) speaker-independent (non-specific person) systems, in which recognition does not depend on who is speaking and which are usually trained on a speech database collected from a large number of different people; (3) multi-speaker systems, which can recognize the speech of a particular group of people and need to be trained only on the speech of that group.
- Considering the way of speaking
- Recognition systems can also be divided into three categories: (1) isolated word systems, which require a pause after each word; (2) connected word systems, which require each word to be pronounced clearly, although some liaison between words begins to appear; (3) continuous speech systems, which accept natural, fluent speech in which extensive liaison and sound changes occur.
- Considering the vocabulary of the recognition system
- Recognition systems can also be divided into three categories: (1) small vocabulary systems, which usually include dozens of words; (2) medium vocabulary systems, which typically include hundreds to thousands of words; (3) large vocabulary systems, which typically include thousands to tens of thousands of words. With the computer and
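The three classification axes above can be written down as a small data structure. This is only a sketch: the class and function names are hypothetical, and the numeric cutoffs (100 and 5000 words) are assumptions chosen to fit the rough ranges in the text, which gives no exact boundaries.

```python
from dataclasses import dataclass
from enum import Enum

class SpeakerScope(Enum):
    SPEAKER_DEPENDENT = "specific person"
    SPEAKER_INDEPENDENT = "non-specific person"
    MULTI_SPEAKER = "multi-person"

class SpeakingMode(Enum):
    ISOLATED_WORD = "isolated word"
    CONNECTED_WORD = "connected word"
    CONTINUOUS = "continuous speech"

class VocabularySize(Enum):
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"

def vocabulary_class(word_count, medium_from=100, large_from=5000):
    """Map a word count to a vocabulary class using assumed cutoffs."""
    if word_count < medium_from:
        return VocabularySize.SMALL
    if word_count < large_from:
        return VocabularySize.MEDIUM
    return VocabularySize.LARGE

@dataclass(frozen=True)
class RecognizerProfile:
    """One point in the three-axis taxonomy."""
    scope: SpeakerScope
    mode: SpeakingMode
    vocabulary: VocabularySize

# Example: a voice-dialing feature trained on many speakers, a few dozen names.
dialer = RecognizerProfile(
    scope=SpeakerScope.SPEAKER_INDEPENDENT,
    mode=SpeakingMode.ISOLATED_WORD,
    vocabulary=vocabulary_class(40),
)
print(dialer.vocabulary.value)  # small
```

The axes are independent of one another, so any combination is possible in principle, for example a speaker-dependent, continuous, large vocabulary dictation system.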
- Bill Gates once said: "Voice technology will make computers lose the mouse and keyboard." As computers shrink, the keyboard and mouse have become a major obstacle to their development. Computers have evolved from very large machines to microcomputers that now occupy less than 1 square meter. Future computers may become smaller than anyone expects, which makes keyboards and mice even more of an obstacle, and speech recognition will then be needed to carry out commands. Some scientists have also said, "The next generation of computer revolution is from graphical interfaces to voice user interfaces." This suggests that the development of speech recognition technology is changing people's lives. In some areas, the telephone is gradually evolving from a simple conversation tool into a service provider. Through the telephone, people can use voice to obtain the information they want, which improves their work efficiency. [3]
- Speech recognition technology has gradually become a key part of the human-machine interface. It is a highly competitive emerging industry whose market is developing rapidly. From 1999 to 2005, the speech recognition technology market grew at 31% annually. Voice assistants are now a standard feature of iPhones and other smartphones, bringing considerable convenience to users. People can also use the phone and the Internet to order air tickets, train tickets, and even travel services. Speech recognition technology therefore has increasingly broad development prospects and applications in daily life. [3]
- In telephone and communication systems, the intelligent voice interface is changing the telephone from a pure service tool into a service provider and life partner; over the telephone and communication network, people can easily use voice commands to query remote database systems and extract relevant information. With the miniaturization of computers, the keyboard has become a big obstacle for mobile platforms. Imagine a mobile phone only the size of a watch: dialing on a keyboard would be impossible. Speech recognition is gradually becoming the key technology of the man-machine interface in information technology. Combining speech recognition with speech synthesis lets people dispense with the keyboard and operate devices by voice command. The application of voice technology has become a competitive emerging high-tech industry. [4]
- Speech recognition technology has developed to the point where recognition accuracy exceeds 98% for small and medium vocabulary speaker-independent (non-specific person) systems, and is even higher for speaker-dependent (specific person) systems. These technologies can meet the requirements of common applications. Thanks to the development of large-scale integrated circuit technology, these complex speech recognition systems can also be built into special-purpose chips and mass-produced. In developed Western countries, a large number of speech recognition products have entered the market and the service sector. Some telephones and mobile phones already include voice dialing, and products such as voice notepads and smart voice toys include both speech recognition and speech synthesis. Over the telephone network, people can use a spoken dialogue system with speech recognition to inquire about air tickets, travel, and bank information, with very good results. Survey statistics show that as many as 85% of people are satisfied with the performance of speech recognition information query services. [4]
- Google launches speech recognition technology