How to make a percentage of the type column in a histogram?

The pandas DataFrame has the following format:

import pandas as pd

df = pd.DataFrame([['A', 4], ['B', 12], ['B', 50], ['B', 19], ['A', 39], ['B', 12], ['A', 22], ['B', 33], ['B', 14], ['B', 43], ['A', 50], ['B', 34], ['A', 22], ['B', 60],
                   ['A', 14], ['B', 31], ['B', 40], ['B', 38], ['A', 21], ['B', 41], ['A', 23], ['B', 45], ['B', 25], ['B', 32], ['A', 10], ['B', 31], ['A', 21], ['B', 51]])
df.columns = ['Type', 'Distance']

df.head(2):

  Type  Distance
0    A         4
1    B        12

Expectation:

1) create a histogram with bins=10,
2) y is (amount of A in this bin) / (amount of A + amount of B in this bin)

For example, in bin 10-20:

df1 = df[df['Distance'].between(10,20)]
df1.head(10)

There are 6 rows in this distance bin 10-20:

   Type  Distance
1     B        12
3     B        19
5     B        12
8     B        14
14    A        14
24    A        10

Hence,

amount of A in bin (10,20): 2
amount of B in bin (10,20): 4
amount of A in bin (10,20) / (amount of A in bin (10,20) + amount of B in bin (10,20)) = 2 / (2 + 4) = 33.3%
So for bin 10-20, the y value should be 33.3%.
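
As a quick check of the arithmetic above, here is a minimal sketch that rebuilds only the six rows listed for the 10-20 bin and computes the share of A. The variable names (`counts`, `share_a`) are illustrative and not part of the original question.

```python
import pandas as pd

# The six rows listed above for the 10-20 bin (original index preserved).
df1 = pd.DataFrame(
    {'Type': ['B', 'B', 'B', 'B', 'A', 'A'],
     'Distance': [12, 19, 12, 14, 14, 10]},
    index=[1, 3, 5, 8, 14, 24],
)

# Count rows per type, then divide the A count by the total.
counts = df1['Type'].value_counts()
share_a = counts['A'] / (counts['A'] + counts['B'])

print(f"{share_a:.1%}")  # 33.3%
```

The same ratio can also be obtained as `(df1['Type'] == 'A').mean()`, since the mean of a boolean mask is the fraction of True values.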

The same calculation applies to every other bin: the percentage of A out of (A + B).

I also need the bins on the x-axis to stay in steps of 10 (integer edges, not floats or anything else).

Any suggestion is appreciated.

Answer:

IIUC, you can reshape your data with pivot_table, compute the bins in pandas with cut, and then plot:

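# count rows per (Distance, Type); one column per Type
df2 = df.pivot_table(index='Distance', columns='Type', aggfunc='size', fill_value=0)
# integer bin edges 0, 10, 20, ... up to the maximum distance
bins = range(0, int(df2.index.max()) + 1, 10)
df3 = df2.groupby(pd.cut(df2.index, bins=bins)).sum()
# fraction of A in each bin
df3['A'].div(df3.sum(axis=1)).plot.bar(width=1)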

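output (the float edges below come from 10 equal-width bins, i.e. pd.cut(df2.index, bins=10); the range-based bins in the code give integer edges instead):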

>>> df3
Type          A  B
(3.944, 9.6]  1  0
(9.6, 15.2]   2  3
(15.2, 20.8]  0  1
(20.8, 26.4]  5  1
(26.4, 32.0]  0  3
(32.0, 37.6]  0  2
(37.6, 43.2]  1  4
(43.2, 48.8]  0  1
(48.8, 54.4]  1  2
(54.4, 60.0]  0  1
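
The table above has float bin edges, and pd.cut's default right-closed intervals would put Distance 10 into the bin below 10-20. As a hedged alternative, here is a sketch that assumes the bins should be left-closed with integer edges ([0, 10), [10, 20), ...), which reproduces the 2 A / 4 B split from the question. The names `width`, `upper`, `edges`, `labels`, `binned` and `pct_a` are illustrative, and the choice of left-closed bins is an assumption about what the asker wants rather than something stated in the answer.

```python
import pandas as pd

rows = [['A', 4], ['B', 12], ['B', 50], ['B', 19], ['A', 39], ['B', 12], ['A', 22],
        ['B', 33], ['B', 14], ['B', 43], ['A', 50], ['B', 34], ['A', 22], ['B', 60],
        ['A', 14], ['B', 31], ['B', 40], ['B', 38], ['A', 21], ['B', 41], ['A', 23],
        ['B', 45], ['B', 25], ['B', 32], ['A', 10], ['B', 31], ['A', 21], ['B', 51]]
df = pd.DataFrame(rows, columns=['Type', 'Distance'])

width = 10
# Extend the last edge past the maximum so that Distance == 60 lands in [60, 70).
upper = (int(df['Distance'].max()) // width + 1) * width
edges = list(range(0, upper + 1, width))
# Plain integer labels such as "10-20" instead of interval objects.
labels = [f"{lo}-{lo + width}" for lo in edges[:-1]]

# right=False makes each bin left-closed: "10-20" holds the values 10..19.
binned = pd.cut(df['Distance'], bins=edges, right=False, labels=labels)

# Rows: bins, columns: Type, values: number of rows.
counts = pd.crosstab(binned, df['Type'])

# Percentage of A among all rows in each bin.
pct_a = (counts['A'] / counts.sum(axis=1) * 100).round(1)
print(pct_a)
```

With matplotlib installed, `pct_a.plot.bar(width=1)` draws the bars in the same way as the answer's last line. If right-closed bins are preferred instead, dropping `right=False` and adjusting the labels should be enough, but the 10-20 bin will then no longer match the count in the question.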