
Decision Tree

Last updated Sep 12, 2022

A simple program consisting of if-else decisions (decision stumps) based on the features.
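For intuition, a small decision tree really is just nested if-else statements. Here is a toy sketch with made-up features (ear shape and weight for a cat/dog classifier; the features and thresholds are purely illustrative, not from any trained model):

```python
def predict(ear_shape, weight):
    # Each branch is a decision stump: one feature compared to one threshold.
    if weight > 10:          # heavy animals -> dog
        return "dog"
    else:
        if ear_shape == "pointy":
            return "cat"
        return "dog"

print(predict("pointy", 5))   # -> cat
print(predict("floppy", 12))  # -> dog
```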

```
Input: vector y of length n with labels in {1, 2, ..., k}
counts = zeros(k)
for i in 1:n
  counts[y[i]] += 1
entropy = 0
for c in 1:k
  prob = counts[c] / n
  if prob > 0          # skip empty classes: 0 * log(0) is taken as 0
    entropy -= prob * log(prob)
return entropy
```
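The pseudocode above can be written as a runnable Python function. This is a direct translation (natural log, labels in 1..k); the guard against empty classes is the only addition:

```python
import math

def entropy(y, k):
    """Shannon entropy (in nats) of a label vector y with labels in {1, ..., k}."""
    n = len(y)
    counts = [0] * k
    for label in y:
        counts[label - 1] += 1
    ent = 0.0
    for c in counts:
        if c > 0:  # empty classes contribute 0, since p*log(p) -> 0 as p -> 0
            p = c / n
            ent -= p * math.log(p)
    return ent

print(entropy([1, 2, 1, 2], 2))  # two equally likely classes -> log(2) ≈ 0.693
print(entropy([1, 1, 1], 2))     # a pure node has zero entropy -> 0.0
```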

# Tradeoffs

Advantages

- Easy to implement
- Interpretable: a prediction is just a short chain of if-else rules
- Fast to learn and very fast to classify

# Pseudocode for choosing decision stumps

```
// time complexity: O(ndk), or O(nd) if k = 1 (binary classification)
decision_stump(feature matrix X, label vector y):
  compute baseline error: number of times y_i differs from the most common label overall
  for each feature j:
    for each threshold t:
      set y_yes to the most common label among objects i satisfying the rule (x_ij > t)
      set y_no to the most common label among objects i satisfying the rule (x_ij <= t)
      set y_pred[i] to be our prediction for each object i based on the rule:
        y_pred[i] = y_yes if (x_ij > t), y_no otherwise
      compute error e: the number of objects where y_pred[i] != y_i
      store the rule (j, t, y_yes, y_no) if e is the lowest error so far
  return the stump with the lowest error
```
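The pseudocode might look like the following in Python. This sketch uses NumPy, takes the observed feature values of each column as candidate thresholds, and does an exhaustive scan (O(n²d) as written; the stated O(ndk) requires sorting each feature and updating counts incrementally):

```python
import numpy as np

def mode(labels):
    """Most common label; ties broken by smallest label (np.unique sorts)."""
    vals, counts = np.unique(labels, return_counts=True)
    return vals[np.argmax(counts)]

def decision_stump(X, y):
    """Fit the best single-feature threshold rule (j, t, y_yes, y_no).

    Predict y_yes when X[i, j] > t, otherwise y_no.
    """
    n, d = X.shape
    # Baseline: a "stump" with no split predicts the most common label everywhere.
    best = (None, None, mode(y), None)
    best_err = np.sum(y != mode(y))
    for j in range(d):
        for t in np.unique(X[:, j]):
            above = X[:, j] > t
            if above.all() or (~above).all():
                continue  # degenerate split: all objects on one side
            y_yes, y_no = mode(y[above]), mode(y[~above])
            y_pred = np.where(above, y_yes, y_no)
            err = np.sum(y_pred != y)
            if err < best_err:
                best_err = err
                best = (j, t, y_yes, y_no)
    return best

# Toy usage: one feature, labels flip at value 2, so the best threshold is t = 1.
X = np.array([[0], [1], [2], [3]])
y = np.array([0, 0, 1, 1])
print(decision_stump(X, y))  # -> (0, 1, 1, 0)
```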