Uses regularized regression trees. These are like decision trees where each split is

  • based on a single feature
  • each leaf gives a real-valued prediction

Fitting a regression tree

  • Train: set each weight at leaf by minimizing squared error
    • We use using greedy recursive splitting for growing the tree
  • Prediction: At each leaf, the prediction is the mean of all that fall under that leaf node

Ensemble and Boosting

We create a series of trees that are trained on the residual of the previous tree. That is, the first tree is trained on the actual dataset. The second tree is trained on the residuals of the prediction of the first tree, and so on.

  • tree[0] = fit(X,y)
  • y_hat = tree[0].predict(X)
  • tree[1] = fit(X,y - y_hat)
  • `y_hat = y_hat + tree[1].predict(X)
  • tree[2] = fit(X,y - yhat)
  • `y_hat = y_hat + tree[2].predict(X)


As long as not all , each tree decreases training error. However, it may overfit if trees are too deep or you have too many trees

We can apply L0-regularization (stop splitting if ) and L2-regularization to discourage this.