I need to calculate several percentages between groups, and I'm trying to build an object that makes this easy.
Say I have this frame:
import pandas as pd
df = pd.DataFrame({"cluster": ["A", "A", "B", "B", "A", "B", "C", "C", "C"], "category": ["x", "y", "x", "x", "x", "y", "y", "z", "x"], "result": [0, 1, 1, 0, 0, 1, 1, 1, 0]})
To calculate the percentages easily, I need two sets of group sizes: one from the full frame and one from a filtered frame:
r1 = df.groupby(["cluster", "category"]).size()
print(r1)
r2 = df[df['result']==1].groupby(["cluster", "category"]).size()
print(r2)
However, r2's index does not match r1's, because groups with no rows where result == 1 are missing from r2. That will cause problems later when I plot both results on the same axes, so I want r2 to have the same index as r1. This is the best way I found:
r3 = (r2 + r1 - r1).fillna(0)
print(r3)
Do you have a better way of doing this? Perhaps having all the info in a single object (with two named columns) would be awesome.
Thank you very much!
Answer:
If I understand you correctly, you can use pd.concat (that way you will have a single DataFrame with two columns):
out = pd.concat([r1, r2], axis=1).fillna(0)
print(out)
Prints:
                  0    1
cluster category
A       x         2  0.0
        y         1  1.0
B       x         2  1.0
        y         1  1.0
C       x         1  0.0
        y         1  1.0
        z         1  1.0
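If you also want named columns, one possible alternative (a sketch, not the only way) is to skip the two separate groupby calls and use named aggregation on the result column. The column names total, positives and pct below are made up for illustration, and the "sum" trick assumes result only ever contains 0 or 1:

```python
import pandas as pd

df = pd.DataFrame({
    "cluster": ["A", "A", "B", "B", "A", "B", "C", "C", "C"],
    "category": ["x", "y", "x", "x", "x", "y", "y", "z", "x"],
    "result": [0, 1, 1, 0, 0, 1, 1, 1, 0],
})

# One pass over the groups:
#   "size" counts every row in the group (same as r1),
#   "sum" counts the rows where result == 1 (same as r2, but with 0 instead of a missing row).
# Assumption: result is strictly 0/1; "total" and "positives" are illustrative names.
out = df.groupby(["cluster", "category"])["result"].agg(total="size", positives="sum")

# Percentage of rows with result == 1 in each group ("pct" is also an illustrative name).
out["pct"] = 100 * out["positives"] / out["total"]
print(out)
```

Because both columns come from the same groupby, they share one index, so no fillna is needed and the counts stay integers. If you would rather keep r1 and r2 as they are, pd.concat([r1, r2], axis=1, keys=["total", "positives"]).fillna(0) should give the same two named columns.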