Home > Net >  Create count matrix based on two columns and respective range values pandas
Create count matrix based on two columns and respective range values pandas

Time:03-02

I have a dataframe which looks like below,

name,value,id
meanerror,0.55,aa
meanamount,120,aa
meanerror,0.45,bb
meanamount,150,bb
meanerror,0.88,cc
meanamount,110,cc
meanerror,0.1,dd
meanamount,50,dd

I would like to create a matrix from this dataframe like below.

,         meanamount,    total_y
meanerror,0-100,100-200
0.0-0.5,    1,    1,      2   
0.5-1,      0,    2,      2
total_x,    1,    3

what I actually need is, in the matrix, each cell should contain count of ids which has value(from value column) in the range on both x and y axis of the matrix. i.e for example the first cell should contain count of ids with meanamount in range 0-100 and meanerror in range 0.0-5.

I have tried pandas pivot table and crosstab but unsure how to achieve this. Can anyone help?

CodePudding user response:

  1. Create a pivot table:

    pt = df.pivot(index='id', columns='name', values='value')
    
    # name  meanamount  meanerror
    # id                         
    # aa         120.0       0.55
    # bb         150.0       0.45
    # cc         110.0       0.88
    # dd          50.0       0.10
    
  2. Cut the amounts and errors into bins:

    pt['meanamount'] = pd.cut(pt['meanamount'], bins=range(0, 300, 100))
    pt['meanerror'] = pd.cut(pt['meanerror'], bins=np.arange(0, 1.5, 0.5))
    
    # name  meanamount   meanerror
    # id                          
    # aa    (100, 200]  (0.5, 1.0]
    # bb    (100, 200]  (0.0, 0.5]
    # cc    (100, 200]  (0.5, 1.0]
    # dd      (0, 100]  (0.0, 0.5]
    
  3. Create a crosstab of error x amount:

    pd.crosstab(pt['meanerror'], pt['meanamount'], margins=True)
    
    # meanamount  (0, 100]  (100, 200]  All
    # meanerror                            
    # (0.0, 0.5]         1           1    2
    # (0.5, 1.0]         0           2    2
    # All                1           3    4
    
  • Related