Match the indexes of two groupby results

Time:03-07

I need to calculate several percentages between groups, and I'm trying to find the best way to build an object that lets me do so.

Say I have this frame:

import pandas as pd

df = pd.DataFrame({"cluster": ["A", "A", "B", "B", "A", "B", "C", "C", "C"], "category": ["x", "y", "x", "x", "x", "y", "y", "z", "x"], "result": [0, 1, 1, 0, 0, 1, 1, 1, 0]})

To make these percentages easy to calculate, I need two group sizes: one over the full frame and another over a filtered subset:

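# Number of rows in each (cluster, category) group
r1 = df.groupby(["cluster", "category"]).size()
print(r1)

# Number of rows with result == 1 in each group; groups with no such rows are absent
r2 = df[df["result"] == 1].groupby(["cluster", "category"]).size()
print(r2)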
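To see exactly which groups differ between the two Series, a quick check is to take the difference of their indexes. This is a minimal sketch using the frame above; the name `missing` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"cluster": ["A", "A", "B", "B", "A", "B", "C", "C", "C"], "category": ["x", "y", "x", "x", "x", "y", "y", "z", "x"], "result": [0, 1, 1, 0, 0, 1, 1, 1, 0]})

r1 = df.groupby(["cluster", "category"]).size()
r2 = df[df["result"] == 1].groupby(["cluster", "category"]).size()

# Groups present in r1 but absent from r2: they have no rows with result == 1,
# so the filtered groupby never creates them.
missing = r1.index.difference(r2.index)
print(list(missing))  # [('A', 'x'), ('C', 'x')]
```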

However, r2's index does not match r1's: groups with no result == 1 rows, such as (A, x) and (C, x), are missing from r2. That will eventually cause problems when I plot both results on the same axes, so I'm trying to give r2 the same index as r1. This is the best way I've found:

# Adding and subtracting r1 aligns r2 to r1's index; missing groups become NaN, then 0
r3 = (r2 + r1 - r1).fillna(0)
print(r3)
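A more direct way to express the same alignment is `Series.reindex` with a fill value. This is a sketch, not the asker's method; unlike the add/subtract trick, it keeps the integer dtype because no NaN is ever introduced.

```python
import pandas as pd

df = pd.DataFrame({"cluster": ["A", "A", "B", "B", "A", "B", "C", "C", "C"], "category": ["x", "y", "x", "x", "x", "y", "y", "z", "x"], "result": [0, 1, 1, 0, 0, 1, 1, 1, 0]})

r1 = df.groupby(["cluster", "category"]).size()
r2 = df[df["result"] == 1].groupby(["cluster", "category"]).size()

# Conform r2 to r1's index; groups absent from r2 get a count of 0.
r3 = r2.reindex(r1.index, fill_value=0)
print(r3)
```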

Is there a better way to do this? Ideally, I'd like all the information in a single object with two named columns.

Thank you very much!

Answer:

If I understand you correctly, you can use pd.concat, which gives you a single DataFrame with two columns:

out = pd.concat([r1, r2], axis=1).fillna(0)
print(out)

Prints:

                  0    1
cluster category        
A       x         2  0.0
        y         1  1.0
B       x         2  1.0
        y         1  1.0
C       x         1  0.0
        y         1  1.0
        z         1  1.0
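The columns above are labelled 0 and 1 because r1 and r2 are unnamed Series. A sketch of getting the two named columns the question asks for is to pass `keys=` to pd.concat, then derive a percentage column. The column names `total`, `positive` and `pct` are hypothetical, and the percentage is assumed to mean "share of each group's rows with result == 1". The second approach further assumes `result` only ever holds 0 or 1, so summing it counts the 1s and a single groupby with named aggregation produces both columns at once.

```python
import pandas as pd

df = pd.DataFrame({"cluster": ["A", "A", "B", "B", "A", "B", "C", "C", "C"], "category": ["x", "y", "x", "x", "x", "y", "y", "z", "x"], "result": [0, 1, 1, 0, 0, 1, 1, 1, 0]})

r1 = df.groupby(["cluster", "category"]).size()
r2 = df[df["result"] == 1].groupby(["cluster", "category"]).size()

# keys= names the columns produced by concatenating the Series along axis=1.
out = pd.concat([r1, r2], axis=1, keys=["total", "positive"]).fillna(0)
# The NaNs forced "positive" to float; cast it back to int after filling.
out["positive"] = out["positive"].astype(int)
# Percentage of each group's rows that have result == 1.
out["pct"] = out["positive"] / out["total"] * 100
print(out)

# Alternative (assumes result is 0/1): one groupby with named aggregation.
alt = df.groupby(["cluster", "category"])["result"].agg(total="size", positive="sum")
print(alt)
```

Both objects share r1's index, so they can be plotted on the same axes without further alignment.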