Sound Retrieval Service

A sound retrieval service is a service that retrieves music programs and other audio information on demand (that is, initiated by the user).

Multimedia retrieval technology is a comprehensive technology that uses digital processing to handle the carriers of text, sound, image, and other information. At present, multimedia retrieval services fall into three categories by the content being retrieved: image retrieval, video retrieval, and sound retrieval.
Content-based sound retrieval includes both locating general sounds by identifier and retrieving sounds that match a given sample. Its common methods fall into two groups: feature description methods, which include natural-language description and sound annotation; and content retrieval methods, which include attribute search, query by example, browsing, and retrieval based on speech recognition and synthesis.
Much previous research has addressed the processing of speech signals, for example speech recognition and speaker identification. Machines can recognize isolated words fairly reliably, as in dedicated dictation and telephone applications, while continuous speech recognition is more difficult and error-prone, although breakthrough progress has been made in this area. These results provide a strong foundation for audio information retrieval.
As an information carrier, audio can be divided into three types.
Waveform sound is a digital audio signal obtained by digitizing analog sound; it can represent speech, music, natural sounds, and synthesized sounds.
Speech, with morphemes such as words and grammar, is a highly abstract medium for conveying concepts. Speech can be converted into text after recognition, and text can be regarded as the scripted form of speech.
Music, with elements such as notes, rhythm, and melody, is the third type, distinguished by its structured tonal content.
Different audio types carry different inherent content, but in general audio content divides into three levels: the lowest physical-sample level, the middle acoustic-feature level, and the highest semantic level. From low to high, the content is abstracted level by level and its representation becomes progressively more general.

At the physical-sample level, audio content is presented as streaming media, and users can retrieve or recall audio sample data by time scale, as in a common audio recording-and-playback interface.

The middle level is the acoustic-feature level. Acoustic features are extracted automatically from the audio data. Some auditory features express how users perceive audio and can be used directly for retrieval; others support speech recognition or detection and underpin higher-level content representation.

The highest level is the semantic level, a conceptual description of audio content and audio objects. At this level, the content of audio comprises the results of speech recognition, detection, and discrimination; descriptions of musical melody and narrative; and descriptions of audio objects and concepts.

The last two levels support content-based audio retrieval: users can submit conceptual queries or query by auditory perception. The auditory nature of audio makes its query model different from that of conventional information retrieval systems. A content-based query is a similarity query: it retrieves all sounds that closely resemble the requirements the user specifies, and the query can set the number of sounds returned or the required degree of similarity.
In addition, individual feature components can be emphasized or switched off (ignored), and even a logical "not" (or a fuzzy "less" matching relation) can be applied to the search conditions, for example to find sounds that lack, or have little of, some feature component (such as specifying no "sharpness" or little "sharpness"). A given set of sounds can also be sorted by an acoustic characteristic, such as by noise level.
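The weighted similarity matching described above can be sketched as follows. This is a minimal illustration, not an actual retrieval system: the feature vectors, sound names, and weights are invented for the example, and a real system would extract features such as loudness, brightness, and sharpness from the audio itself. A weight of 0 switches a feature component off, and a larger weight emphasizes it.

```python
import math

def weighted_similarity(query, candidate, weights):
    """Cosine similarity between two feature vectors after per-component
    weighting; a weight of 0 ignores that feature entirely."""
    q = [f * w for f, w in zip(query, weights)]
    c = [f * w for f, w in zip(candidate, weights)]
    dot = sum(a * b for a, b in zip(q, c))
    nq = math.sqrt(sum(a * a for a in q))
    nc = math.sqrt(sum(b * b for b in c))
    if nq == 0 or nc == 0:
        return 0.0
    return dot / (nq * nc)

def query_sounds(query_vec, library, weights, top_k=3):
    """Rank (name, feature-vector) entries by weighted similarity and
    return the top_k most similar names, as a similarity query does."""
    ranked = sorted(
        library,
        key=lambda item: weighted_similarity(query_vec, item[1], weights),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]

# Hypothetical library with features (loudness, brightness, sharpness):
library = [
    ("jet engine", [0.95, 0.85, 0.2]),
    ("whisper",    [0.1,  0.2,  0.05]),
    ("bee",        [0.5,  0.9,  0.8]),
]
matches = query_sounds([0.9, 0.8, 0.1], library, [1.0, 1.0, 1.0])
```

Setting the third weight to 0.0 would reproduce the "ignore sharpness" case from the text: that component then contributes nothing to the ranking.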
On the query interface, users can express queries in the following forms:
Example method. The user selects a sound example to express the query and finds all sounds similar to it in certain characteristics, such as querying for all sounds similar to the roar of an aircraft.
Metaphorical method. The user describes the query by choosing acoustic or perceptual physical characteristics such as brightness, pitch, and volume. This approach is analogous to sketch-based queries in visual retrieval.
Onomatopoeia method. The user makes a sound similar to the sound being sought; for example, the user can make a buzzing sound to look for bees or electrical noise.
Subjective feature method. The user describes the sound in personal, descriptive language. This requires training the system to understand the meaning of such descriptive terms; for example, a user may look for a "happy" sound.
Browse method. This is an important means of information discovery, especially for time-based media such as audio.

Given the division of audio media above, speech, music, and other sounds have markedly different characteristics, so current processing methods fall into three corresponding types: speech is handled separately from non-speech audio, and within non-speech audio, music is treated separately again. In other words, the first uses automatic speech recognition technology, while the latter two use more general audio analysis suited to a wider range of audio media, such as music and sound effects, and of course digitized voice signals as well.
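The three-way split between speech, music, and other sounds can be caricatured with a single frame-level feature. The sketch below uses only the zero-crossing rate; the thresholds are illustrative assumptions, and real systems combine many features with trained classifiers rather than fixed rules.

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose sign changes."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / max(len(samples) - 1, 1)

def classify_frame(samples, zcr_speech=0.1, zcr_noise=0.3):
    """Toy three-way split; threshold values are assumptions.
    High ZCR suggests noisy / unvoiced sound, moderate ZCR suggests
    speech, low ZCR suggests sustained tonal (musical) content."""
    zcr = zero_crossing_rate(samples)
    if zcr > zcr_noise:
        return "other"
    if zcr > zcr_speech:
        return "speech"
    return "music"
```

For example, a rapidly alternating frame classifies as "other", while a slowly varying frame classifies as "music"; the point is only that distinct audio types separate along measurable acoustic features.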
Generally speaking, audio information retrieval is divided into speech-based retrieval, audio retrieval, and music retrieval.
Speech retrieval means retrieval centered on speech, using processing technologies such as speech recognition. Speech-based retrieval includes retrieval using large-vocabulary speech recognition: automatic speech recognition (ASR) technology converts speech into text, so that text retrieval techniques can then be applied. Although a good continuous speech recognition system can exceed 90% word accuracy under careful conditions, in practical settings such as telephone conversations and news broadcasts the recognition rate is considerably lower.
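Once ASR has turned recordings into transcripts, retrieval reduces to standard text search. The sketch below assumes the transcripts already exist (an ASR step is not shown) and builds a minimal inverted index over them; the recording ids and transcript texts are invented for illustration.

```python
from collections import defaultdict

def build_index(transcripts):
    """Map each word to the set of recording ids whose (ASR-produced)
    transcript contains it. `transcripts` is {recording_id: text}."""
    index = defaultdict(set)
    for rec_id, text in transcripts.items():
        for word in text.lower().split():
            index[word].add(rec_id)
    return index

def search(index, query):
    """Return ids of recordings whose transcripts contain every
    query word (conjunctive text retrieval over ASR output)."""
    words = query.lower().split()
    if not words:
        return set()
    result = index.get(words[0], set()).copy()
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical ASR transcripts keyed by recording id:
idx = build_index({
    "a": "the weather forecast for monday",
    "b": "monday traffic report",
    "c": "evening news",
})
```

Note that this inherits ASR's weaknesses: a misrecognized word simply never enters the index, which is why the sub-word techniques below matter.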
Retrieval based on sub-word units: when a speech recognition system processes broad, open-topic speech data, recognition performance deteriorates, especially when specialized words (such as personal names and place names) are not in the system lexicon. A workaround is to index sub-word units. When a query is executed, it is first decomposed into sub-word units, and the features of these units are then matched against features pre-computed for the library.
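A minimal sketch of sub-word matching follows. For simplicity it represents a sub-word unit sequence as a list of labels (characters stand in for phone labels here, which is an assumption of the example) and scores a query against a document by the Jaccard overlap of their n-gram sets, so out-of-vocabulary words like names can still match.

```python
def ngrams(seq, n=3):
    """All overlapping n-grams of a sequence of sub-word units."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

def subword_score(query_units, doc_units, n=3):
    """Jaccard overlap between the n-gram sets of the query's and
    the document's sub-word unit sequences (e.g. phone labels)."""
    q, d = ngrams(query_units, n), ngrams(doc_units, n)
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)
```

A query for an out-of-lexicon name such as "barcelona" scores highly against any unit stream containing that name's sub-word n-grams, even though no word-level recognizer ever produced the word itself.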
Retrieval based on keyword recognition and segmentation based on speaker identification are further techniques for audio information retrieval.
Audio retrieval takes waveform sound as its object. The audio may be a car engine, rain, or birdsong, and it may also be voice or music; all are retrieved uniformly by acoustic features. Audio retrieval includes sound training and classification, auditory retrieval, and audio segmentation.
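The audio segmentation step mentioned above can be sketched as silence-gap splitting: divide the waveform into fixed-length frames and start a new segment wherever frame energy crosses a threshold. The frame length and threshold here are illustrative assumptions; practical systems adapt them to the signal.

```python
def segment_by_energy(samples, frame_len=100, threshold=0.05):
    """Split a waveform into (start, end) sample ranges of activity,
    treating frames whose RMS energy falls below `threshold` as
    silence that separates segments."""
    segments = []
    start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = (sum(s * s for s in frame) / len(frame)) ** 0.5
        active = rms >= threshold
        if active and start is None:
            start = i                       # a segment begins
        elif not active and start is not None:
            segments.append((start, i))     # a segment ends at silence
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments
```

Each resulting segment can then be classified or indexed by acoustic features individually, rather than treating a recording as one undifferentiated stream.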
Music retrieval is based on musical characteristics such as musical notes and melody. [1]
