What Is Decision Tree Learning?

Decision tree learning is used in statistics, data mining, and machine learning as a predictive model: a decision tree maps observations about a sample to a prediction of its class. Such trees are also called classification trees or regression trees. In these tree structures, leaf nodes carry class labels and internal nodes represent tests on attributes.
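The structure just described can be sketched in a few lines of Python. The node fields, class names, and labels below are illustrative assumptions, not a standard API:

```python
# Minimal sketch of a decision tree: internal nodes test an attribute,
# leaves carry a class label. All names here are illustrative.

class Leaf:
    def __init__(self, label):
        self.label = label           # class label stored at the leaf

class Node:
    def __init__(self, attribute, threshold, left, right):
        self.attribute = attribute   # index of the attribute to test
        self.threshold = threshold   # split point for that attribute
        self.left = left             # subtree taken when value <= threshold
        self.right = right           # subtree taken when value > threshold

def predict(tree, sample):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, Node):
        tree = tree.left if sample[tree.attribute] <= tree.threshold else tree.right
    return tree.label

# Example: a one-split tree on a single attribute (e.g. petal length).
tree = Node(attribute=0, threshold=2.5, left=Leaf("setosa"), right=Leaf("versicolor"))
print(predict(tree, [1.4]))  # setosa
print(predict(tree, [4.7]))  # versicolor
```

Following a sample down the tree amounts to evaluating one attribute test per internal node until a leaf's label is reached.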

In data mining, there are two main types of decision trees:
  • The output of a classification tree is the class label of the sample.
  • The output of a regression tree is a real number (such as the price of a house or the length of a patient's hospital stay).
The umbrella term classification and regression tree (CART), first proposed by Breiman et al., covers both kinds of tree and is commonly used when discussing decision tree construction.
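The difference between the two tree types can be shown with the same tree shape used for both tasks; the only change is what the leaves hold. The tuple encoding, thresholds, and values below are illustrative assumptions:

```python
# A classification leaf holds a class label; a regression leaf holds a
# real number (e.g. a predicted house price). Data here is made up.

def predict(tree, sample):
    """tree is either a leaf value or (attribute_index, threshold, left, right)."""
    while isinstance(tree, tuple):
        attr, thresh, left, right = tree
        tree = left if sample[attr] <= thresh else right
    return tree

# Classification tree: leaves are class labels.
clf_tree = (0, 100.0, "cheap", "expensive")
# Regression tree: same shape, but leaves are real numbers.
reg_tree = (0, 100.0, 85_000.0, 310_000.0)

print(predict(clf_tree, [80.0]))   # cheap
print(predict(reg_tree, [80.0]))   # 85000.0
print(predict(reg_tree, [150.0]))  # 310000.0
```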
Compared with other data mining algorithms, decision trees have many advantages:
  • Easy to understand and interpret. People can readily grasp what a decision tree means.
  • Requires little data preparation. Other techniques often require data normalization.
  • Handles both numerical and categorical data. Many other techniques can process only one data type; for example, association rules work only with categorical data, while neural networks work only with numerical data.
  • Uses a white-box model. The output is easily explained by the structure of the model, whereas a neural network is a black-box model whose output is difficult to interpret.
  • Model performance can be validated on a test set, which gives a measure of the model's reliability.
  • Robust to noise. Decision trees tolerate noisy data well.
  • Scales well to large datasets.
Decision trees also have several disadvantages:
  • Training an optimal decision tree is an NP-complete problem. In practice, decision tree training therefore uses heuristic methods such as greedy algorithms that only reach local optima; such algorithms cannot guarantee an optimal tree.
  • An overly complex decision tree predicts poorly on data outside the training set. This is called overfitting. Pruning mechanisms can mitigate this problem.
  • Some problems are not well suited to decision trees, such as the XOR problem, for which the tree grows extremely large. Solving such problems may require reformulating the problem domain or using other, more time-consuming learning algorithms (such as statistical relational learning or inductive logic programming).
  • For data with categorical attributes, information gain is biased toward attributes with many levels [3].
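The greedy, locally optimal search mentioned in the list above can be sketched as choosing a single split by Gini impurity; real learners such as CART apply this step recursively. The data and function names here are illustrative:

```python
# Greedy split selection: try every candidate threshold and keep the one
# with the lowest weighted child impurity. This is locally optimal only.

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the greedily chosen split."""
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

values = [1.0, 1.5, 3.0, 3.5]
labels = ["a", "a", "b", "b"]
threshold, impurity = best_split(values, labels)
print(threshold, impurity)  # 1.5 0.0
```

Because each split is chosen in isolation, a sequence of locally best splits can still produce a globally suboptimal tree, which is exactly why the list above notes that the optimal tree is out of reach for such heuristics.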

    Decision graphs

    In a decision tree, all paths from the root node to a leaf node are joined by conjunction (AND). In a decision graph, minimum message length (MML) can be used to join two or more paths together.

    Searching with evolutionary algorithms

    Evolutionary algorithms can be used to avoid getting stuck in local optima [4].
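As a toy illustration of that idea, the sketch below evolves a population of candidate split thresholds by mutation and selection instead of committing to one greedy choice. The fitness function, mutation scale, and population scheme are all illustrative assumptions:

```python
# Toy evolutionary search over one split threshold: keep the fitter half
# of the population each generation and refill it with mutated copies.

import random

random.seed(0)

values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = ["a", "a", "a", "b", "b", "b"]

def fitness(threshold):
    """Accuracy of the stump: predict 'a' if value <= threshold, else 'b'."""
    preds = ["a" if v <= threshold else "b" for v in values]
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

population = [random.uniform(0.0, 7.0) for _ in range(10)]
for _ in range(30):
    population.sort(key=fitness, reverse=True)
    survivors = population[:5]                                  # selection
    population = survivors + [t + random.gauss(0, 0.5) for t in survivors]  # mutation

best = max(population, key=fitness)
print(fitness(best))
```

Because mutation keeps proposing new thresholds around the current survivors, the search can jump out of regions where a purely greedy pass would stop.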

