What Is a Convolutional Neural Network?
Convolutional Neural Networks (CNN) is a type of feedforward neural networks with deep structure that includes convolutional calculations. It is one of the representative algorithms of deep learning [1-2] . Convolutional neural networks have the capability of representation learning, which can shift-invariant classification the input information according to its hierarchical structure. Therefore, it is also called Shift-Invariant Artificial Neural Network Neural Networks, SIANN) " [3] .
Convolutional neural network computer vision
- Image classification
- See also: Image recognition
- Bird recognition based on convolutional neural network [83]
- CNN-RNN for character recognition and sequence labeling [88]
- Object recognition
- Convolutional neural networks can perform object recognition through three types of methods: sliding window, selective search, and YOLO (You Only Look Once) [2] . Sliding windows were the earliest and were used for problems such as gesture recognition [19] , but due to the large amount of calculation, they have been eliminated by the latter two [2] . Selectively search the corresponding region-based CNN. The algorithm first determines whether a window may have a target object through general steps, and further inputs it into a complex recognizer [93] . The YOLO algorithm defines object recognition as the regression problem of the occurrence probability of each target in the segmentation box in the image, and uses the same convolutional neural network for all segmentation boxes to output the probability, center coordinates and box size of each target [94] . Object recognition based on convolutional neural networks has been applied to autonomous driving [95] and real-time traffic monitoring systems [96] .
- In addition, convolutional neural networks are also applied to issues such as image semantic segmentation [29] [97] , scene labeling [98-99], and image saliency detection [100] . Its performance has been proven to exceed that of many classification systems using feature engineering.
- Action recognition
- In the study of image-based behavioral cognition, image features extracted by convolutional neural networks are applied to action classification [101-102] . In the behavioral cognitive problem of video, convolutional neural networks can maintain their two-dimensional structure and learn by stacking features of continuous time segments [103] , build 3D convolutional neural networks that change along the time axis [104] , or step by step The features are extracted from the frames and input to the recurrent neural network [105] . All three can show good results in specific problems.
- Pose estimation
- Pose estimation outputs the human's pose in the form of coordinates in the image. The earliest convolutional neural network used in pose estimation is DeepPose. The structure of DeepPose is similar to AlexNet. It takes the complete picture as the output and trains it in a supervised learning manner. Output the coordinate point [106] . In addition, there are also researches on the application of convolutional neural networks for local pose estimation [107] . For video data, there have been studies on frame-by-frame pose estimation using convolutional neural networks with sliding windows [108] .
- Neural style conversion: content (bottom left), style (top left), output (right) [109]
- Neural style transfer is a special application of convolutional neural network. Its function is to create a third image based on two given images and make its content and style as close as possible to the given image [110 ] .
- Neural style transfer is not essentially a machine learning problem, but an application of convolutional neural network representation learning capabilities. Specifically, neural style transfer extracts high-level representations in a pre-learned convolutional neural network, defines content loss and style loss through representations, and in the third image (usually initialized as white noise) Perform grid-by-grid optimization on linear combinations of content and style to output results [110] .
- In addition to artistic creation, neural style transfer is also used for post-processing of photos [111] and super-resolution image generation [112] .
Convolutional neural network natural language processing
- In general, due to the limitation of the size of the window or the convolution kernel, the long-distance dependence and structured grammatical features of natural language data cannot be well learned. The convolutional neural network in Natural Language Processing (NLP) The application is less than the recurrent neural network, and in many problems it will be designed on the framework of the recurrent neural network, but there are also some convolutional neural network algorithms that have been successful in multiple NLP topics [2] .
- In the field of speech processing, the performance of convolutional neural networks has proven to be superior to Hidden Markov Model (HMM), Gaussian Mixture Model (GMM), and some other deep algorithms [113- 114] . Some studies use a hybrid model of convolutional neural network and HMM for speech processing. The model uses a small convolution kernel and replaces the pooling layer with a fully connected layer to improve its learning ability [115] . Convolutional neural networks can also be used for speech synthesis and language modeling. For example, WaveNet uses convolutional neural networks to generate conditional probabilities of output speech generated by models and samples synthesized speech [63] . The convolutional neural network combined with the Long Short Term Memory model (LSTM) can complement the input sentence well [116] . Other related work includes genCNN, ByteNet, etc. [31] [117] .
Convolutional neural network
- physics
- Using CNN to extract jet map features [118]
- Remote sensing science
- Convolutional neural networks have been applied in remote sensing science, especially satellite remote sensing, and are considered to have advantages in computational efficiency and classification accuracy when analyzing the geometric, texture, and spatial distribution characteristics of remotely sensed images [124] . According to the source and purpose of remote sensing images, convolutional neural networks are used for land use / land cover change research [67] [125] and physical quantities such as sea-ice concentration ) Remote sensing inversion [126] . In addition, convolutional neural networks are used for object recognition of remote sensing images [127] and image semantic segmentation [128] . The latter two are direct computer vision problems, and will not be repeated here.
- Atmospheric Science
- Research on downscaling based on stacked SRCNN [129]
- Programming module containing convolutional neural network
- Modern mainstream machine learning libraries and interfaces, including TensorFlow, Keras, Thenao, Microsoft-CNTK, etc. can run convolutional neural network algorithms. In addition, some commercial numerical calculation software, such as MATLAB, also have a construction tool for convolutional neural networks [133] .