Uses regularized regression trees. These are like decision trees where
- each split is based on a single feature
- each leaf gives a real-valued prediction
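For instance, a depth-1 tree (a "stump") can be written as a single if/else; the feature index and leaf values below are made up for illustration:

```python
def stump_predict(x):
    # One split on a single feature; each leaf returns a real number.
    # (Feature index and leaf values are hypothetical.)
    if x[2] <= 0.5:
        return 1.3
    return -0.7
```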
Fitting a regression tree
- Train: set the weight at each leaf by minimizing squared error
- We grow the tree by greedy recursive splitting (see the sketch after this list)
- Prediction: at each leaf, the prediction is the mean of all training targets that fall under that leaf node
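A minimal sketch of one greedy split under squared error, where each side of a candidate split is predicted by its mean (`best_split` is a hypothetical helper; a full tree would recurse on both sides until a depth or size limit):

```python
import numpy as np

def best_split(X, y):
    """Return (error, feature, threshold) for the single split that
    minimizes total squared error, predicting each side by its mean."""
    best_err, best_j, best_t = np.inf, None, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue  # skip splits that leave one side empty
            err = ((y[left] - y[left].mean()) ** 2).sum() \
                + ((y[~left] - y[~left].mean()) ** 2).sum()
            if err < best_err:
                best_err, best_j, best_t = err, j, t
    return best_err, best_j, best_t
```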
Ensemble and Boosting
We create a series of trees, each trained on the residuals of the ensemble built so far. That is, the first tree is trained on the actual targets, the second tree is trained on the residuals of the first tree's predictions, and so on.
tree[0] = fit(X, y)
y_hat = tree[0].predict(X)
tree[1] = fit(X, y - y_hat)
y_hat = y_hat + tree[1].predict(X)
tree[2] = fit(X, y - y_hat)
y_hat = y_hat + tree[2].predict(X)
…
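A runnable version of this loop, assuming scikit-learn's `DecisionTreeRegressor` as the base learner (the synthetic data and hyperparameters are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

trees, y_hat = [], np.zeros_like(y)
for _ in range(50):
    # Each new tree is fit to the residuals of the current prediction
    tree = DecisionTreeRegressor(max_depth=2).fit(X, y - y_hat)
    y_hat = y_hat + tree.predict(X)
    trees.append(tree)

# The ensemble prediction is the sum over all trees
y_pred = sum(t.predict(X) for t in trees)
```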
Regularization
As long as not all residuals are zero, each tree decreases the training error. However, the ensemble may overfit if the trees are too deep or there are too many trees.
We can apply L0-regularization on the number of leaves (stop splitting if the decrease in error is smaller than the per-leaf penalty) and L2-regularization on the leaf weights to discourage this.
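These two knobs appear in, e.g., the xgboost library (assumed here as one concrete implementation; the hyperparameter values are illustrative): `gamma` is the L0-style per-leaf penalty that blocks low-gain splits, and `reg_lambda` is the L2 penalty on leaf weights.

```python
import xgboost as xgb

model = xgb.XGBRegressor(
    n_estimators=100,  # number of trees (too many can overfit)
    max_depth=3,       # tree depth (too deep can overfit)
    gamma=1.0,         # L0-style: a split must reduce loss by at least this
    reg_lambda=1.0,    # L2 penalty on leaf weights
)
model.fit(X, y)
```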