Calculating Entropy in a decision tree


I am finding it difficult to calculate entropy and information gain for ID3 in the scenario where there are multiple possible classes and the parent has a lower entropy than the child. Let me use this as an example:

AttrY  AttrX  Result
4      a      0
3      b      0
4      c      0
4      d      0
3      e      0
4      a      1
3      b      1
3      c      1
4      d      1
4      e      1

Now here, I believe the first step is to calculate the entropy of the original dataset, in this case of Result, which is the sum of the negative p*log(p) terms for Result being 1 and 0, i.e. -(5/10)*log2(5/10) - (5/10)*log2(5/10), with the log taken to base 2. This comes out to an entropy of 1.
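As a sanity check, that first step is easy to reproduce in a few lines of Python (a minimal sketch; the entropy helper below is just illustrative, not from any ID3 library):

    from math import log2

    def entropy(labels):
        # Shannon entropy (base 2) of a list of class labels
        n = len(labels)
        return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                    for c in set(labels))

    result = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(entropy(result))  # 1.0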

Now for attribute Y, the info gain can be calculated as 1 minus the entropy of each possible outcome weighted by the probability of that outcome occurring, i.e. 1 - (5/10)*[-(2/5)*log2(2/5) - (3/5)*log2(3/5)] - (5/10)*[-(2/5)*log2(2/5) - (3/5)*log2(3/5)] = 0.029.

My question is: for attribute X, would the info gain be 1 minus the entropy of each outcome times the probability of each outcome, i.e. 1 - (5/10)*[-(1/5)*log2(1/5) - (1/5)*log2(1/5) - (1/5)*log2(1/5) - (1/5)*log2(1/5) - (1/5)*log2(1/5)] - (5/10)*[-(1/5)*log2(1/5) - (1/5)*log2(1/5) - (1/5)*log2(1/5) - (1/5)*log2(1/5) - (1/5)*log2(1/5)] = -1.32?

But doesn't Info Gain always have to be positive? Where am I going wrong here?

CodePudding user response:

The right formula for the gain (implemented in the sketch after the definitions below) is:

Entropy(S) - sum_i(|Si|/|S| * Entropy(Si))

with

  • Entropy(S) = -p1 * log(p1) - p0 * log(p0)
  • Si being the subset of S where the attribute takes its i-th value
  • p0 (or p1) being the proportion of elements in S with Result = 0 (or 1).
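In code, the formula reads something like this (a sketch under the definitions above; the names entropy and information_gain are mine, not from a library):

    from math import log2

    def entropy(labels):
        # Entropy(S) = -p1 * log2(p1) - p0 * log2(p0), written so it
        # also works for more than two result values
        n = len(labels)
        return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                    for c in set(labels))

    def information_gain(attr_values, labels):
        # Entropy(S) - sum_i(|Si|/|S| * Entropy(Si)), where Si holds the
        # labels of the rows whose attribute equals the i-th value
        n = len(labels)
        gain = entropy(labels)
        for v in set(attr_values):
            si = [y for x, y in zip(attr_values, labels) if x == v]
            gain -= len(si) / n * entropy(si)
        return gain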

So in the case of AttrX, each value appears in 2 of the 10 rows, hence |Si|/|S| = 2/10 = 1/5.

Once we know the value (a, b, c, ...), the probability of Result being 1 or 0 is 1/2.

So the entropy of each subset is -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1, and the gain is 1 - 5 * (1/5) * 1 = 0.
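You can check that arithmetic directly (and, with the information_gain sketch above, verify the whole computation on the table from the question):

    from math import log2

    # Each AttrX value covers one row with Result 0 and one with Result 1,
    # so every subset Si has entropy 1:
    subset_entropy = -(1/2) * log2(1/2) - (1/2) * log2(1/2)
    print(subset_entropy)                  # 1.0
    print(1 - 5 * (1/5) * subset_entropy)  # 0.0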

In fact you can see this result intuitively: whatever the value is, the result has a 50% chance of being 1 or 0, so the information gain from knowing AttrX is 0.
