I have a dataframe which looks like below,
name,value,id
meanerror,0.55,aa
meanamount,120,aa
meanerror,0.45,bb
meanamount,150,bb
meanerror,0.88,cc
meanamount,110,cc
meanerror,0.1,dd
meanamount,50,dd
I would like to create a matrix from this dataframe like below.
, meanamount, total_y
meanerror,0-100,100-200
0.0-0.5, 1, 1, 2
0.5-1, 0, 2, 2
total_x, 1, 3
what I actually need is, in the matrix, each cell should contain count of ids which has value(from value column) in the range on both x and y axis of the matrix. i.e for example the first cell should contain count of ids with meanamount in range 0-100 and meanerror in range 0.0-5.
I have tried pandas pivot table and crosstab but unsure how to achieve this. Can anyone help?
CodePudding user response:
Create a pivot table:
pt = df.pivot(index='id', columns='name', values='value') # name meanamount meanerror # id # aa 120.0 0.55 # bb 150.0 0.45 # cc 110.0 0.88 # dd 50.0 0.10
Cut the amounts and errors into bins:
pt['meanamount'] = pd.cut(pt['meanamount'], bins=range(0, 300, 100)) pt['meanerror'] = pd.cut(pt['meanerror'], bins=np.arange(0, 1.5, 0.5)) # name meanamount meanerror # id # aa (100, 200] (0.5, 1.0] # bb (100, 200] (0.0, 0.5] # cc (100, 200] (0.5, 1.0] # dd (0, 100] (0.0, 0.5]
Create a crosstab of error x amount:
pd.crosstab(pt['meanerror'], pt['meanamount'], margins=True) # meanamount (0, 100] (100, 200] All # meanerror # (0.0, 0.5] 1 1 2 # (0.5, 1.0] 0 2 2 # All 1 3 4