What is Computational Linguistics?
Computational linguistics is a discipline that builds formal mathematical models to analyze and process natural language, and implements that analysis and processing as computer programs. Its aim is to have a machine simulate part, or even all, of human language ability.
- Computer language is the language used for communication between people and computers. Languages fall into two categories: natural language and artificial language. Natural language is the language that humans formed in the course of their own development, and it is the medium for transmitting information between people.
- Artificial language is language that people design for a particular purpose. Computer language is one kind of artificial language: it is the medium for transmitting information between people and computers. The most distinctive feature of a computer system is that instructions are conveyed to the machine through a language.
- For an electronic computer to carry out its various tasks, there must be a set of digits, characters, and grammatical rules for writing computer programs. These characters and grammatical rules make up the various computer instructions (or statements), and together they form the languages that a computer can accept.
- Computational linguistics is the branch of linguistics that specifically uses electronic computers for language research.
- Computational linguistics is sometimes called quantitative linguistics,
- Shortly after the advent of the electronic computer, people began to consider using it for non-numerical work and chose
- Natural language processing (NLP) began in the United States in the early 1950s. At that time, the United States feared falling behind in the space race and needed to translate a large number of Russian scientific documents, so it was developed
- The development of computational linguistics to date can be summarized under three headings, according to the nature and complexity of the work:
  - Automatic orchestration: This is what computers do best and is the most mature part of computational linguistics. It covers counting, classifying, and sorting language material; compiling vocabularies, indexes, and dictionaries; and building corpora and terminology databases, all of which are widely used. Because these techniques are already quite mature, ready-made software packages are available.
  - Automatic analysis: This is a more complex kind of automatic language processing. Such a system works from specific language information stored in the computer in advance, and its purpose is to reach a predetermined conclusion, for example by having the computer look up a dictionary or test a grammar. If the conclusion is wrong, the dictionary or grammar is shown to be incomplete, and the original data or rules need to be revised or supplemented. Such systems are generally still at the experimental research stage.
  - Automatic research: This is an even more complex kind of automatic language processing. Such a system works from general language information stored in the computer and draws its own inferences by means of statistics, comparison, and analogy. Some in artificial intelligence research
- Computational linguistics can be described as the product of combining computers with linguistics. The combination has been fruitful, not only in the applied topics mentioned above but also in its impact on linguistic theory and method. The definition of language has expanded: language is no longer only an important tool of communication among humans, but also a tool of communication between humans and computers. To meet the requirements of computer processing, the most distinctive feature of computational linguistics is the formalization of language, because only what has been formalized can be turned into algorithms and automated. On this basis, a series of automatic analysis methods for language information processing were developed, including
- The core problems of research in computational linguistics and natural language information processing are language understanding and language generation. Understanding starts from the string of word symbols at the surface of a sentence, identifies the sentence's syntactic structure, determines the semantic relationships between its components, and finally arrives at the meaning of the sentence. Generation works in the opposite direction: it starts from the meaning to be expressed, selects words, builds the semantic structure and the syntactic structure of each component according to the semantic relationships between the words, and finally produces sentences that are grammatical and logical.
- Like other disciplines, computational linguistics is studied at two levels: scientific research and technical research. The purpose of scientific research is to discover the inherent laws of language, to explore computational methods for language understanding and generation, and to build the basic resources for language information processing. Technical research is driven by application goals: it designs and develops practical language information processing systems according to the actual needs of society.
- The application goal of natural language information processing is to enable people and computers to communicate in natural language. Specifically, this means building computer application software systems that process natural language, such as machine translation, natural language understanding, automatic speech recognition and synthesis, automatic text recognition, computer-assisted teaching, information retrieval, automatic text classification, automatic summarization, information extraction from text, intelligent search on the Internet, and various electronic dictionaries and terminology databases.
- With the spread of the Internet, the social demand for language information processing keeps growing, and people urgently need automated means of processing massive amounts of language information. However, because theory in the discipline is still underdeveloped and Chinese itself is complex, research on the theories and methods of computational linguistics in China has not yet provided sufficient support for the development of Chinese information processing application systems. One characteristic of computational linguistics and natural language processing in China over the years is that applied research and practical system development have had relatively clear goals and relatively large investment and have achieved some results, while research on basic theory and methods has been relatively weak. The state of research and its trends from 1998 to 2002 followed the same pattern. Among the application goals mentioned above, research effort has been most concentrated on text information retrieval, automatic document classification, automatic summarization, automatic speech recognition and synthesis, machine translation, and text information extraction and filtering. In addition, the construction of language resources and corpus-based methods of linguistic analysis have received special attention and have made rapid progress. The following first briefly describes representative basic research in computational linguistics and language information processing, then introduces application-oriented research and the development of practical systems, then discusses the construction of language resources, and finally introduces relevant academic conferences, periodicals, and works.
All of this work takes written language as its object of study; research on spoken language will be covered in a separate article on speech recognition and synthesis [2].
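The most mature class of work described earlier, counting and sorting language material and compiling indexes, can be illustrated with a minimal Python sketch. The function names (`tokenize`, `word_frequencies`, `build_index`) and the two-line sample corpus are assumptions made for this illustration, not part of any particular software package; real systems would need language-specific tokenization, especially for Chinese.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens (ASCII letters only)."""
    return re.findall(r"[a-z]+", text.lower())

def word_frequencies(lines):
    """Count how often each word occurs across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts

def build_index(lines):
    """Map each word to the sorted 1-based line numbers where it occurs.

    The returned dictionary lists its words in alphabetical order.
    """
    index = defaultdict(set)
    for number, line in enumerate(lines, start=1):
        for word in tokenize(line):
            index[word].add(number)
    return {word: sorted(numbers) for word, numbers in sorted(index.items())}

# Hypothetical sample corpus, used only for illustration.
corpus = [
    "Natural language is formed by humans.",
    "Computer language is designed by humans.",
]

freqs = word_frequencies(corpus)
index = build_index(corpus)

print(freqs["language"])   # frequency of one word
print(index["language"])   # lines on which that word occurs
```

The same two structures, a frequency table and a word-to-location index, underlie simple vocabulary lists and concordances.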
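The two directions described earlier, understanding (from a word string to syntactic structure to meaning) and generation (from meaning back to a sentence), can be sketched on a toy scale. The grammar below (S → NP VP, NP → Det N, VP → V NP), the small lexicon, and the role names `agent`, `action`, and `patient` are all assumptions chosen for illustration; this is not a description of how any real system works, only a hedged example of what formalizing a fragment of language can look like.

```python
# Hypothetical toy lexicon: each word is assigned one part of speech.
LEXICON = {
    "the": "Det", "a": "Det",
    "computer": "N", "linguist": "N", "sentence": "N",
    "parses": "V", "writes": "V",
}

def parse_np(tokens, pos):
    """Parse NP -> Det N starting at pos; return (tree, next_pos) or None."""
    if (pos + 1 < len(tokens)
            and LEXICON.get(tokens[pos]) == "Det"
            and LEXICON.get(tokens[pos + 1]) == "N"):
        tree = ("NP", ("Det", tokens[pos]), ("N", tokens[pos + 1]))
        return tree, pos + 2
    return None

def parse_sentence(text):
    """Parse S -> NP VP with VP -> V NP; return a tree, or None if the parse fails."""
    tokens = text.lower().split()
    subject = parse_np(tokens, 0)
    if subject is None:
        return None
    np1, pos = subject
    if pos >= len(tokens) or LEXICON.get(tokens[pos]) != "V":
        return None
    verb = ("V", tokens[pos])
    obj = parse_np(tokens, pos + 1)
    if obj is None:
        return None
    np2, end = obj
    if end != len(tokens):  # reject trailing words the grammar does not cover
        return None
    return ("S", np1, ("VP", verb, np2))

def understand(tree):
    """Read semantic roles off the tree: subject noun = agent, object noun = patient."""
    _, np1, (_, verb, np2) = tree
    return {"agent": np1[2][1], "action": verb[1], "patient": np2[2][1]}

def generate(meaning):
    """Select words for each role and arrange them in NP V NP order."""
    return f"the {meaning['agent']} {meaning['action']} the {meaning['patient']}"

tree = parse_sentence("The computer parses a sentence")
meaning = understand(tree)
print(meaning)
print(generate(meaning))
```

A sentence outside the grammar, such as one with the words in the wrong order, makes `parse_sentence` return `None`, which mirrors the point that a failed analysis exposes gaps in the stored dictionary or rules.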