Suppose we have the following dataframe and would like to compute, for each value of B, the probability of each value of C (that is, the row-wise relative frequencies of B against C).
import pandas as pd

data = pd.DataFrame({'id_': [1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010],
                     'A': [1608, 1608, 2089, 213, 1005, 1887, 2089, 4544, 6866, 2020, 2020],
                     'B': [1772, 1772, 1608, 1608, 1790, 1790, 1791, 1791, 1772, 1799, 1799],
                     'C': [1772, 1608, 1005, 1791, 4544, 2020, 1791, 1772, 1799, 2020, 213],
                     })
I have run the crosstab to compute the frequency of B and C:
df = pd.crosstab(data['B'], data['C'])
print(df)
C      213  1005  1608  1772  1791  1799  2020  4544
B
1608     0     1     0     0     1     0     0     0
1772     0     0     1     1     0     1     0     0
1790     0     0     0     0     0     0     1     1
1791     0     0     0     1     1     0     0     0
1799     1     0     0     0     0     0     1     0
Now I would like to divide each element by its row total, so that every row holds probabilities that sum to 1. The output should look like this:
       213  1005  1608  1772  1791  1799  2020  4544
1608     0   0.5     0     0   0.5     0     0     0
1772     0     0  0.33  0.33     0  0.33     0     0
1790     0     0     0     0     0     0   0.5   0.5
1791     0     0     0   0.5   0.5     0     0     0
1799   0.5     0     0     0     0     0   0.5     0
I have tried the following:
prob = [i/sum(i) for i in range(df)]
and I got this error:
TypeError: 'DataFrame' object cannot be interpreted as an integer
I read about the error in the question "why-does-dataframe-object-cannot-be-interpreted-as-an-integer" and tried following the advice there, but it didn't work. I also read another solution, "Compute percentage for each row in pandas", which applies
df.iloc[:, 1:].apply(lambda x: x / x.sum())
but the probabilities I got are not accurate.
If there is another way to get the probabilities without crosstab, that would also be helpful.
CodePudding user response:
Pass normalize='index' to pd.crosstab so that each row is divided by its own total. To get percentages:
pd.crosstab(data.B,data.C, normalize='index').round(4)*100
which gives:
C       213   1005   1608   1772   1791   1799   2020   4544
B
1608    0.0   50.0   0.00   0.00   50.0   0.00    0.0    0.0
1772    0.0    0.0  33.33  33.33    0.0  33.33    0.0    0.0
1790    0.0    0.0   0.00   0.00    0.0   0.00   50.0   50.0
1791    0.0    0.0   0.00  50.00   50.0   0.00    0.0    0.0
1799   50.0    0.0   0.00   0.00    0.0   0.00   50.0    0.0
or, to keep the values as probabilities rounded to two decimals:
print(pd.crosstab(data.B,data.C, normalize='index').round(2))
which is:
C      213  1005  1608  1772  1791  1799  2020  4544
B
1608   0.0   0.5  0.00  0.00   0.5  0.00   0.0   0.0
1772   0.0   0.0  0.33  0.33   0.0  0.33   0.0   0.0
1790   0.0   0.0  0.00  0.00   0.0  0.00   0.5   0.5
1791   0.0   0.0  0.00  0.50   0.5  0.00   0.0   0.0
1799   0.5   0.0  0.00  0.00   0.0  0.00   0.5   0.0
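As for why the attempts in the question went wrong: range() expects an integer, not a DataFrame, and DataFrame.apply works column by column by default, so x / x.sum() divides by column totals rather than row totals (and iloc[:, 1:] additionally drops the first column). The sketch below shows two other ways to get the same row-wise probabilities: dividing an existing crosstab by its row sums, and a groupby-based version that skips crosstab entirely. The names counts, row_prob and alt_prob are only illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    'B': [1772, 1772, 1608, 1608, 1790, 1790, 1791, 1791, 1772, 1799, 1799],
    'C': [1772, 1608, 1005, 1791, 4544, 2020, 1791, 1772, 1799, 2020, 213],
})

# Option 1: start from the plain frequency table and divide each row
# by its own total (axis=0 aligns the row sums with the row index).
counts = pd.crosstab(data['B'], data['C'])
row_prob = counts.div(counts.sum(axis=1), axis=0)

# Option 2: no crosstab at all. value_counts(normalize=True) gives the
# relative frequency of each C within every B group; unstack turns the
# C level into columns and fills missing combinations with 0.
alt_prob = (
    data.groupby('B')['C']
        .value_counts(normalize=True)
        .unstack(fill_value=0)
)

print(row_prob.round(2))
print(alt_prob.round(2))
```

Both results should agree with pd.crosstab(data.B, data.C, normalize='index'); the crosstab call is usually the shortest way to write it, while the groupby version is handy when the counts are needed as an intermediate step anyway.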