I have the following dataframe (sample):
import pandas as pd
data = [['A', False, 2], ['A', True, 8], ['A', False, 25], ['A', False, 30], ['B', False, 4], ['B', False, 8], ['B', True, 2], ['B', True, 1], ['B', False, 3]]
df = pd.DataFrame(data = data, columns = ['group', 'indicator', 'val'])
group indicator val
0 A False 2
1 A True 8
2 A False 25
3 A False 30
4 B False 4
5 B False 8
6 B True 2
7 B True 1
8 B False 3
I would like to add a column called idx
, which has values 0
for the rows with True
and a negative adding up values before (backward filling) the 0
and a positive adding up values after (forward filling) the 0
per group
. I started with using the following which does the job of adding a 0
for certain rows, but I don't know how to do the cumulative filling part:
# add zero index to rows with True
df.loc[df['indicator'] == True, 'idx'] = 0
# cumulative filling per group
df.groupby('group')['idx'].fillna(1).cumsum()
The desired output should look like this:
data = [['A', False, 2, -1], ['A', True, 8, 0], ['A', False, 25, 1], ['A', False, 30, 2], ['B', False, 4, -2], ['B', False, 8, -1], ['B', True, 2, 0], ['B', True, 1, 0], ['B', False, 3, 1]]
df_desired = pd.DataFrame(data = data, columns = ['group', 'indicator', 'val', 'idx'])
group indicator val idx
0 A False 2 -1
1 A True 8 0
2 A False 25 1
3 A False 30 2
4 B False 4 -2
5 B False 8 -1
6 B True 2 0
7 B True 1 0
8 B False 3 1
So I was wondering if anyone knows how to do the positive negative cumulative filling part per group using pandas
?
CodePudding user response:
I made an extra column named sign
and used .cumsum()
and a reversed version of .cumsum()
to create the column idx
:
# create column sign
df.loc[df['indicator'] == True, 'sign'] = -1
df['sign'] = df.groupby(['group'])['sign'].bfill()
df.loc[df['indicator'] == True, 'sign'] = 1
df['sign'] = df.groupby(['group'])['sign'].ffill()
df.loc[df['indicator'] == True, 'sign'] = 0
# create column idx
df.loc[df.sign == 1, 'idx'] = df[df.sign == 1].groupby(['group']).sign.cumsum()
df.loc[df.sign == 0, 'idx'] = 0
df.loc[df.sign == -1, 'idx'] = df[df.sign == -1][::-1].groupby(['group']).sign.cumsum()[::-1]
# remove column sign and make column idx integer value
df['idx'] = df['idx'].astype(int)
df = df.drop(columns='sign')
Output:
group indicator val idx
0 A False 2 -1
1 A True 8 0
2 A False 25 1
3 A False 30 2
4 B False 4 -2
5 B False 8 -1
6 B True 2 0
7 B True 1 0
8 B False 3 1