The pandas DataFrame has the following format:
import pandas as pd
df = pd.DataFrame([['A', 4], ['B', 12], ['B', 50], ['B', 19], ['A', 39], ['B', 12], ['A', 22], ['B', 33], ['B', 14], ['B', 43], ['A', 50], ['B', 34], ['A', 22], ['B', 60],
['A', 14], ['B', 31], ['B', 40], ['B', 38], ['A', 21], ['B', 41], ['A', 23], ['B', 45], ['B', 25], ['B', 32], ['A', 10], ['B', 31], ['A', 21], ['B', 51]])
df.columns = ['Type', 'Distance']
df.head(2):
  Type  Distance
0    A         4
1    B        12
Expectation:
1) create a histogram with bins=10,
2) y is (amount of A in this bin) / (amount of A + amount of B in this bin).
For example, in bin 10-20:
df1 = df[df['Distance'].between(10,20)]
df1.head(10)
There are 6 rows in this distance bin 10-20:
   Type  Distance
1     B        12
3     B        19
5     B        12
8     B        14
14    A        14
24    A        10
Hence,
amount of A in bin 10-20: 2
amount of B in bin 10-20: 4
amount of A in bin 10-20 / (amount of A in bin 10-20 + amount of B in bin 10-20) = 2 / (2 + 4) = 33.3%
So for bin 10-20, the y value should be 33.3%.
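The count for bin 10-20 can be reproduced directly in pandas. This is a minimal sketch that rebuilds the question's data, keeps the rows with 10 <= Distance <= 20 (Series.between is inclusive on both ends by default), and counts each type; the names counts and share_a are illustrative only.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame(
    [['A', 4], ['B', 12], ['B', 50], ['B', 19], ['A', 39], ['B', 12], ['A', 22],
     ['B', 33], ['B', 14], ['B', 43], ['A', 50], ['B', 34], ['A', 22], ['B', 60],
     ['A', 14], ['B', 31], ['B', 40], ['B', 38], ['A', 21], ['B', 41], ['A', 23],
     ['B', 45], ['B', 25], ['B', 32], ['A', 10], ['B', 31], ['A', 21], ['B', 51]],
    columns=['Type', 'Distance'],
)

# Rows with 10 <= Distance <= 20 (between() includes both ends by default)
df1 = df[df['Distance'].between(10, 20)]

# Count each type in the bin; reindex so a type that is absent counts as 0
counts = df1['Type'].value_counts().reindex(['A', 'B'], fill_value=0)

# Share of A in the bin: A / (A + B)
share_a = counts['A'] / counts.sum()

print(int(counts['A']), int(counts['B']), f"{share_a:.1%}")  # 2 4 33.3%
```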
The same calculation applies to every other bin: y is the percentage of A out of (A + B).
I also need the x-axis bins to be 10 wide with integer edges, not float edges or anything else.
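Taken together, these requirements (A / (A + B) per bin, bins 10 wide with integer edges) can be sketched with pd.cut and pd.crosstab. This swaps in pd.crosstab with normalize='index', which divides each row by its row total; it is an alternative to the answer below, not the answer's method. It assumes half-open bins such as [10, 20) (right=False) so that a distance of exactly 10 is counted in the 10-20 bin, as in the worked example, and it rounds the top edge up to the next multiple of 10 so the largest distance is included. The names top, edges, binned, and share are hypothetical.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame(
    [['A', 4], ['B', 12], ['B', 50], ['B', 19], ['A', 39], ['B', 12], ['A', 22],
     ['B', 33], ['B', 14], ['B', 43], ['A', 50], ['B', 34], ['A', 22], ['B', 60],
     ['A', 14], ['B', 31], ['B', 40], ['B', 38], ['A', 21], ['B', 41], ['A', 23],
     ['B', 45], ['B', 25], ['B', 32], ['A', 10], ['B', 31], ['A', 21], ['B', 51]],
    columns=['Type', 'Distance'],
)

# Integer edges 0, 10, ..., up to the first multiple of 10 above the largest distance
top = (int(df['Distance'].max()) // 10 + 1) * 10
edges = list(range(0, top + 1, 10))

# right=False gives half-open bins such as [10, 20), so a distance of exactly 10
# lands in the 10-20 bin, matching the worked example
binned = pd.cut(df['Distance'], bins=edges, right=False)

# normalize='index' divides each row by its total: A / (A + B) for every bin
share = pd.crosstab(binned, df['Type'], normalize='index')

print(share['A'].round(3))
```

For this data the [10, 20) row gives 0.333, the 33.3% from the example. share['A'].plot.bar(width=1) would draw it the same way as in the answer (plotting needs matplotlib).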
Any suggestion is appreciated.
Answer:
IIUC, you should use pivot_table to reshape your data, compute the bins in pandas using cut, and then plot:
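import pandas as pd
# one row per distance, one column per type, holding the number of rows
df2 = df.pivot_table(index='Distance', columns='Type', aggfunc='size', fill_value=0)
# integer bin edges 0, 10, 20, ...; the + 10 keeps the largest distance inside the last bin
bins = range(0, int(df2.index.max()) + 10, 10)
df3 = df2.groupby(pd.cut(df2.index, bins=bins)).sum()
# share of A in each bin: A / (A + B)
df3['A'].div(df3.sum(axis=1)).plot.bar(width=1)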
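output (note: this df3 corresponds to pd.cut(df2.index, bins=10), i.e. ten equal-width bins from the minimum to the maximum distance, which is where the float edges come from; the integer bins range above gives edges 0, 10, ..., 60 instead):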
>>> df3
Type          A  B
(3.944, 9.6]  1  0
(9.6, 15.2]   2  3
(15.2, 20.8]  0  1
(20.8, 26.4]  5  1
(26.4, 32.0]  0  3
(32.0, 37.6]  0  2
(37.6, 43.2]  1  4
(43.2, 48.8]  0  1
(48.8, 54.4]  1  2
(54.4, 60.0]  0  1
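For comparison, here is a sketch that rebuilds df3 with integer edges 0, 10, ..., 60 instead of bins=10, using the same pivot_table and pd.cut steps as the answer. The edge formula and observed=False (which keeps every bin even if it is empty) are my own choices. One thing this shows: pd.cut defaults to right=True, so the bins are (0, 10], (10, 20], ..., a distance of exactly 10 falls in the first bin, and the 10-20 bin gets A = 1, B = 4 (20%) rather than the 33.3% from the question. Passing right=False to pd.cut, with the edges extended so that 60 still fits, gives half-open bins such as [10, 20) instead.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame(
    [['A', 4], ['B', 12], ['B', 50], ['B', 19], ['A', 39], ['B', 12], ['A', 22],
     ['B', 33], ['B', 14], ['B', 43], ['A', 50], ['B', 34], ['A', 22], ['B', 60],
     ['A', 14], ['B', 31], ['B', 40], ['B', 38], ['A', 21], ['B', 41], ['A', 23],
     ['B', 45], ['B', 25], ['B', 32], ['A', 10], ['B', 31], ['A', 21], ['B', 51]],
    columns=['Type', 'Distance'],
)

# Same reshaping as in the answer: rows = distances, columns = types, values = counts
df2 = df.pivot_table(index='Distance', columns='Type', aggfunc='size', fill_value=0)

# Integer edges 0, 10, ..., 60; pd.cut's default right=True gives (0, 10], (10, 20], ...
bins = list(range(0, int(df2.index.max()) + 10, 10))
df3 = df2.groupby(pd.cut(df2.index, bins=bins), observed=False).sum()

# Share of A in each bin: A / (A + B)
share_a = df3['A'].div(df3.sum(axis=1))

print(df3)
print(share_a.round(3))
```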