I have a dataframe
that I am trying to use pandas.groupby
on to get the cumulative sum. The values that I am grouping by show up in two different columns, and I am having trouble getting the groupby to work correctly. My starting dataframe
is:
df = pd.DataFrame({'col_A': ['red', 'red', 'blue', 'red'], 'col_B': ['blue', 'red', 'blue', 'red'], 'col_A_qty': [1, 1, 1, 1], 'col_B_qty': [1, 1, 1, 1]})
col_A col_B col_A_qty col_B_qty
red blue 1 1
red red 1 1
blue blue 1 1
red red 1 1
The result I am trying to get is:
col_A col_B col_A_qty col_B_qty red_cumsum blue_cumsum
red blue 1 1 1 1
red red 1 1 3 1
blue blue 1 1 3 3
red red 1 1 5 3
I've tried:
df.groupby(['col_A', 'col_B'])['col_A_qty'].cumsum()
but this groups on the combination of col_A
and col_B
. How can I use pandas.groupby
to calculate the cumulative sum of red and blue, regardless of if it's in col_A
or col_B
?
CodePudding user response:
Try two pivot
out = pd.pivot(df,columns='col_A',values='col_A_qty').fillna(0).cumsum().add(pd.pivot(df,columns='col_B',values='col_B_qty').fillna(0).cumsum(),fill_value=0)
Out[404]:
col_A blue red
0 1.0 1.0
1 1.0 3.0
2 3.0 3.0
3 3.0 5.0
df = df.join(out)
CodePudding user response:
A simple method is to define each cumsum
column by two Series.cumsum
, as follows:
df['red_cumsum'] = df['col_A'].eq('red').cumsum() df['col_B'].eq('red').cumsum()
df['blue_cumsum'] = df['col_A'].eq('blue').cumsum() df['col_B'].eq('blue').cumsum()
In each column col_A
and col_B
, check for values equal 'red'
/ 'blue'
(results are boolean series). Then, we use Series.cumsum
on these resultant boolean series to get the cumulative counts. You don't really need to use pandas.groupby
in this use case.
If you have multiple items in col_A
and col_B
, you can also iterate through the unique item list, as follows:
for item in pd.unique(df['col_A'].tolist() df['col_B'].tolist()):
df[f'{item}_cumsum'] = df['col_A'].eq(item).cumsum() df['col_B'].eq(item).cumsum()
Result:
print(df)
col_A col_B col_A_qty col_B_qty red_cumsum blue_cumsum
0 red blue 1 1 1 1
1 red red 1 1 3 1
2 blue blue 1 1 3 3
3 red red 1 1 5 3