What Is a Convolutional Neural Network?

A convolutional neural network (CNN) is a type of feedforward neural network with a deep structure that includes convolution computations. It is one of the representative algorithms of deep learning [1-2]. Convolutional neural networks are capable of representation learning and can classify input information in a shift-invariant manner according to their hierarchical structure. They are therefore also called shift-invariant artificial neural networks (SIANN) [3].
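The shift-invariance property can be illustrated with a minimal sketch in plain Python. It assumes a one-dimensional input, a hypothetical two-tap kernel, and global max pooling as the readout; none of these specifics come from the cited sources.

```python
def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` (no padding, stride 1) and return the responses."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def global_max(values):
    """Global max pooling: keep only the strongest response, discarding its position."""
    return max(values)


kernel = [1.0, -1.0]                 # hypothetical difference (edge) kernel
signal = [0, 0, 1, 3, 1, 0, 0, 0]    # a small pattern near the left
shifted = [0, 0, 0, 0, 1, 3, 1, 0]   # the same pattern moved two steps right

features = conv1d_valid(signal, kernel)
features_shifted = conv1d_valid(shifted, kernel)

# The feature map moves with the input; the pooled value does not change.
print(features)
print(features_shifted)
print(global_max(features), global_max(features_shifted))
```

Because the same kernel weights are applied at every position, shifting the input only shifts the feature map, and a pooling step that ignores position then yields the same value for both inputs.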

Convolutional neural networks in computer vision

Image classification
See also: Image recognition
Figure: Bird recognition based on a convolutional neural network [83]
Convolutional neural networks have long been one of the core algorithms in image recognition and perform stably when sufficient training data is available [84]. For general large-scale image classification problems, convolutional neural networks can be used to construct hierarchical classifiers [85]. They can also be used in fine-grained recognition to extract discriminative image features for other classifiers to learn from [86]. In the latter case, features can be extracted by manually feeding different parts of the image into the convolutional neural network [83], or the network can extract them itself through unsupervised learning [87].
Figure: CNN-RNN for character recognition and sequence labeling [88]
In text detection and text recognition (optical character recognition), convolutional neural networks are used to determine whether an input image contains characters and to crop the valid character fragments from it [89-90]. A convolutional neural network that classifies directly with multiple normalized exponential (softmax) functions has been used for house number recognition in Google Street View images [91], a convolutional neural network combined with a conditional random field (CRF) graphical model can identify words in an image [92], and a combination of a convolutional neural network and a recurrent neural network (RNN) can extract character features from an image and perform sequence labeling [88].
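The idea of classifying with several normalized exponential (softmax) functions at once can be sketched as follows. This is only an illustration: it assumes one softmax head per digit position with ten classes each, and the logits are made-up values standing in for the output of a convolutional network. The model in [91] may be organized differently.

```python
import math


def softmax(logits):
    """Normalized exponential function: turn raw scores into probabilities."""
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_digits(head_logits):
    """Pick the most probable class from each position's softmax head."""
    digits = []
    for logits in head_logits:
        probs = softmax(logits)
        digits.append(max(range(len(probs)), key=probs.__getitem__))
    return digits


# Hypothetical logits for a three-digit number: one row of 10 scores per position.
head_logits = [
    [0.1, 2.5, 0.0, 0.3, 0.2, 0.1, 0.0, 0.4, 0.1, 0.2],
    [0.0, 0.1, 0.2, 0.1, 0.3, 0.0, 0.2, 3.1, 0.1, 0.0],
    [0.2, 0.0, 0.1, 0.3, 2.8, 0.1, 0.0, 0.2, 0.1, 0.4],
]

digits = decode_digits(head_logits)
print(digits)
```

Each head is normalized independently, so every digit position gets its own probability distribution over the ten classes.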
Object recognition
Convolutional neural networks can perform object recognition through three types of methods: sliding window, selective search, and YOLO (You Only Look Once) [2]. Sliding windows were the earliest approach and were used for problems such as gesture recognition [19], but because of their high computational cost they have largely been superseded by the latter two [2]. Selective search corresponds to region-based CNNs: the algorithm first uses generic steps to determine whether a window may contain a target object, and then passes that window to a more complex recognizer [93]. The YOLO algorithm defines object recognition as a regression problem over the probability that each target occurs in each box of a grid that divides the image, and uses the same convolutional neural network for all boxes to output the probability, center coordinates, and box size of each target [94]. Object recognition based on convolutional neural networks has been applied to autonomous driving [95] and real-time traffic monitoring systems [96].
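The computational cost of the sliding-window approach can be made concrete with a short sketch. The image size, window size, and stride below are illustrative assumptions rather than values from the cited work; the point is only to count how many crops a classifier would have to process.

```python
def sliding_windows(image_h, image_w, win_h, win_w, stride):
    """Yield (top, left, bottom, right) for every window position in the image."""
    for top in range(0, image_h - win_h + 1, stride):
        for left in range(0, image_w - win_w + 1, stride):
            yield (top, left, top + win_h, left + win_w)


# Hypothetical setting: a 224x224 image scanned with a 64x64 window, stride 8.
windows = list(sliding_windows(image_h=224, image_w=224, win_h=64, win_w=64, stride=8))

print(len(windows))    # number of crops the classifier must evaluate at this one scale
print(windows[0], windows[-1])
```

Even at a single window size the classifier runs hundreds of times per image, and the count multiplies when several window sizes or finer strides are used, which is why methods that propose fewer candidate regions or predict all boxes in one pass are preferred.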
In addition, convolutional neural networks are applied to problems such as image semantic segmentation [29] [97], scene labeling [98-99], and image saliency detection [100]. Their performance has been shown to exceed that of many classification systems that rely on feature engineering.
Action recognition
In image-based action recognition, image features extracted by convolutional neural networks are used for action classification [101-102]. For action recognition in video, convolutional neural networks can keep their two-dimensional structure and learn from stacked features of consecutive time segments [103], can be built as 3D convolutional neural networks that also convolve along the time axis [104], or can extract features frame by frame and feed them to a recurrent neural network [105]. All three approaches can give good results on specific problems.
Pose estimation
Pose estimation outputs a person's pose in an image in the form of coordinates. The earliest convolutional neural network used for pose estimation is DeepPose. Its structure is similar to that of AlexNet: it takes the complete image as input, is trained in a supervised manner, and outputs the coordinate points [106]. There are also studies on applying convolutional neural networks to local pose estimation [107]. For video data, there have been studies on frame-by-frame pose estimation using convolutional neural networks with sliding windows [108].
Figure: Neural style transfer: content (bottom left), style (top left), output (right) [109]
Neural style transfer
Neural style transfer is a special application of convolutional neural networks. Given two images, one supplying the content and one supplying the style, it creates a third image whose content and style are as close as possible to those of the respective given images [110].
Neural style transfer is not, in essence, a machine learning problem but an application of the representation learning capability of convolutional neural networks. Specifically, it extracts high-level representations from a pre-trained convolutional neural network, defines a content loss and a style loss in terms of those representations, and optimizes the third image (usually initialized as white noise) pixel by pixel against a linear combination of the content and style losses to produce the output [110].
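The loss described above can be sketched in a few lines. This is a minimal illustration under several assumptions that the text does not spell out: feature maps are given as lists of channels with flattened spatial activations, the style representation is the commonly used Gram matrix of channel correlations, and the weights `alpha` and `beta` are hypothetical. The sketch only evaluates the loss; in practice the features come from a pre-trained network and the image is updated by gradient descent.

```python
def gram_matrix(features):
    """Channel-by-channel correlations; `features` is C channels x N activations."""
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
        for fi in features
    ]


def flatten(rows):
    return [v for row in rows for v in row]


def mse(xs, ys):
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def content_loss(gen, content):
    """Compare activations directly, so spatial layout matters."""
    return mse(flatten(gen), flatten(content))


def style_loss(gen, style):
    """Compare Gram matrices, so only feature co-occurrence matters."""
    return mse(flatten(gram_matrix(gen)), flatten(gram_matrix(style)))


def total_loss(gen, content, style, alpha=1.0, beta=1000.0):
    """Linear combination of content and style losses (weights are assumptions)."""
    return alpha * content_loss(gen, content) + beta * style_loss(gen, style)


# Toy features: 2 channels, 3 spatial positions each.
content = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
permuted = [[3.0, 1.0, 2.0], [6.0, 4.0, 5.0]]   # same activations, rearranged in space

print(content_loss(permuted, content))   # nonzero: the layout differs
print(style_loss(permuted, content))     # zero: channel correlations are unchanged
```

The toy example shows why the two terms capture different things: rearranging activations in space changes the content loss but leaves the Gram-based style loss untouched.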
In addition to artistic creation, neural style transfer is also used for photo post-processing [111] and super-resolution image generation [112].

Convolutional neural networks in natural language processing

In general, because the window or convolution kernel has a limited size, convolutional neural networks cannot learn the long-distance dependencies and structured grammatical features of natural language data well. They are therefore applied less often in natural language processing (NLP) than recurrent neural networks, and in many problems they are designed on top of a recurrent neural network framework. Nevertheless, some convolutional neural network algorithms have been successful on multiple NLP topics [2].
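The limit imposed by kernel size can be quantified with the standard receptive-field recurrence for stacked one-dimensional convolutions. The sketch below is an illustration only; the layer configurations are hypothetical and not taken from any cited model.

```python
def receptive_field(layers):
    """Number of input tokens that can influence one output position.

    `layers` lists (kernel_size, stride) for each conv layer, from input to output.
    """
    rf, jump = 1, 1           # jump = distance in input tokens between adjacent outputs
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Four stacked layers with kernel size 3 and stride 1 see only 9 tokens.
print(receptive_field([(3, 1)] * 4))

# A stride-2 layer in the middle widens the view of the layers after it.
print(receptive_field([(3, 1), (3, 2), (3, 1)]))
```

With stride 1 the receptive field grows only linearly with depth, so a dependency between words that are far apart requires many layers, larger kernels, or strided and dilated variants.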
In speech processing, convolutional neural networks have been shown to outperform hidden Markov models (HMM), Gaussian mixture models (GMM), and some other deep algorithms [113-114]. Some studies use a hybrid of a convolutional neural network and an HMM for speech processing; the model uses small convolution kernels and replaces the pooling layer with a fully connected layer to improve its learning ability [115]. Convolutional neural networks can also be used for speech synthesis and language modeling. For example, WaveNet uses convolutional neural networks to model the conditional probability of the output speech and samples from it to synthesize speech [63]. A convolutional neural network combined with a long short-term memory (LSTM) model can complete input sentences well [116]. Other related work includes genCNN and ByteNet [31] [117].
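To predict each audio sample conditioned only on earlier samples, a convolution must not look ahead in time. WaveNet is known for using dilated causal convolutions for this purpose, a detail the text above does not mention; the following is a minimal sketch of that operation with hypothetical weights, not the published architecture.

```python
def causal_dilated_conv(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation], with x[t < 0] taken as 0.

    Only the current and past samples contribute, so y[t] never depends on the future.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = causal_dilated_conv(x, weights=[1.0, 1.0], dilation=2)
print(y)

# Changing the last sample leaves every earlier output unchanged.
x_future_changed = [1.0, 2.0, 3.0, 4.0, 99.0]
y2 = causal_dilated_conv(x_future_changed, weights=[1.0, 1.0], dilation=2)
print(y2[:4] == y[:4])
```

Stacking such layers with increasing dilation lets the model condition on a long history of samples while keeping the kernels small.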

Convolutional neural networks in other fields

Physics
Figure: Using a CNN to extract jet image features [118]
Convolutional neural networks have received attention in physics research that involves big data problems. In high-energy physics, they are used for the analysis and feature learning of jet images from particle colliders. Related research includes quark/gluon classification [119], W boson identification [118], and the study of neutrino interactions [120]. Convolutional neural networks have also been applied in astrophysics: some studies use them to analyze galaxy morphology in telescope images [121] and to extract galaxy model parameters [122]. Using transfer learning, pre-trained convolutional neural networks can detect glitches in LIGO (Laser Interferometer Gravitational-wave Observatory) data, which helps with data preprocessing [123].
Remote sensing science
Convolutional neural networks have been applied in remote sensing science, especially satellite remote sensing, and are considered to have advantages in computational efficiency and classification accuracy when analyzing the geometric, texture, and spatial distribution characteristics of remotely sensed images [124]. Depending on the source and purpose of the remote sensing images, convolutional neural networks are used for land use / land cover change research [67] [125] and for remote sensing retrieval of physical quantities such as sea-ice concentration [126]. In addition, convolutional neural networks are used for object recognition [127] and semantic segmentation [128] of remote sensing images. The latter two are direct computer vision problems and are not repeated here.
Atmospheric science
Figure: Research on downscaling based on stacked SRCNN [129]
In atmospheric science, convolutional neural networks are used for statistical downscaling (SD) of numerical model output and for extreme weather detection in gridded climate data. For statistical downscaling, a multi-level downscaling system built by stacking SRCNNs can take low-resolution raw meteorological data and a high-resolution digital elevation model (DEM) as inputs and output high-resolution meteorological data, with accuracy exceeding that of the traditional Bias Corrected Spatial Disaggregation (BCSD) method [129-130]. For extreme weather detection, convolutional neural networks modeled on the AlexNet structure have been shown, in both supervised and semi-supervised learning, to identify tropical cyclones, atmospheric rivers, and weather fronts in climate model output and reanalysis data with high accuracy [131-132].
Programming modules that include convolutional neural networks
Modern mainstream machine learning libraries and interfaces, including TensorFlow, Keras, Theano, and Microsoft CNTK, can run convolutional neural network algorithms. In addition, some commercial numerical computing software, such as MATLAB, also provides tools for building convolutional neural networks [133].
