
Data Mining: Decision Trees

Author: Cache_wood · 2022-04-19

@[toc]


Building Decision Trees

  • Use a top-down approach, starting from the root node with the set of all features
  • At each parent node, pick a feature to split the examples.
    • Feature selection criteria
      • Maximize variance reduction for a continuous target
      • Maximize information gain (the reduction in entropy) for a categorical target
      • Maximize the reduction in Gini impurity (Gini = 1 − Σᵢ pᵢ²) for a categorical target
    • All examples are used for feature selection at each node
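The selection criteria above can be sketched in plain Python (a minimal illustration, not code from the original notes):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity = 1 - sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

# A split that perfectly separates the two classes has maximal information gain:
parent = ["yes", "yes", "no", "no"]
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0
print(gini(parent))  # 0.5
```

At each node, the feature (and threshold) whose split maximizes one of these criteria is chosen.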

Limitations of Decision Trees

  • Over-complex trees can overfit the data

    • Limit the number of levels of splitting
    • Prune branches
  • Sensitive to data

    • Changing a few examples can cause picking different features that lead to a different tree
    • Random forest
  • Not easy to parallelize the training computation
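The two mitigations for overfitting above can be sketched with scikit-learn (assuming it is installed): `max_depth` caps the number of levels of splitting, and `ccp_alpha` enables cost-complexity pruning of branches. The dataset and parameter values here are illustrative choices, not from the original notes.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Cap the tree at 3 levels of splitting.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Alternatively, grow the tree fully, then prune branches whose removal
# costs less than ccp_alpha in training accuracy per pruned subtree.
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)

print(shallow.get_depth(), pruned.get_depth())
```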

Random Forest

  • Train multiple decision trees to improve robustness

    • Trees are trained independently in parallel

    • Majority voting for classification, average for regression

  • Where does the randomness come from?

    • Bagging: randomly sample training examples with replacement
      • E.g. [1,2,3,4,5] → [1,2,2,3,4]
    • Randomly select a subset of features for each tree
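Both sources of randomness can be sketched with the standard library (an illustration only; the subset size of 2 is an arbitrary choice, though √#features is a common default):

```python
import random

random.seed(0)  # fixed seed so the example is reproducible

examples = [1, 2, 3, 4, 5]
features = ["f0", "f1", "f2", "f3"]

# Bagging: draw a bootstrap sample with replacement, the same size as the
# original training set, so some examples repeat and others are left out.
bootstrap = [random.choice(examples) for _ in examples]

# Feature subsampling: each tree considers only a random subset of features.
subset = random.sample(features, k=2)

print(bootstrap, subset)
```

Each tree in the forest is trained on its own bootstrap sample and feature subset, which decorrelates the trees before their predictions are combined.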

Summary

  • Decision tree: an explainable model for classification/regression
  • Easy to train and tune, widely used in industry
  • Sensitive to data
    • Ensembles can help (more on bagging and boosting later)
