Home > OS >  Create column with positive and negative adding up index based on certain row in Pandas
Create column with positive and negative adding up index based on certain row in Pandas

Time:09-26

I have the following dataframe (sample):

import pandas as pd

data = [['A', False, 2], ['A', True, 8], ['A', False, 25], ['A', False, 30], ['B', False, 4], ['B', False, 8], ['B', True, 2], ['B', True, 1], ['B', False, 3]]
df = pd.DataFrame(data = data, columns = ['group', 'indicator', 'val'])

  group  indicator  val
0     A      False    2
1     A       True    8
2     A      False   25
3     A      False   30
4     B      False    4
5     B      False    8
6     B       True    2
7     B       True    1
8     B      False    3

I would like to add a column called idx, which has values 0 for the rows with True and a negative adding up values before (backward filling) the 0 and a positive adding up values after (forward filling) the 0 per group. I started with using the following which does the job of adding a 0 for certain rows, but I don't know how to do the cumulative filling part:

# add zero index to rows with True
df.loc[df['indicator'] == True, 'idx'] = 0
# cumulative filling per group
df.groupby('group')['idx'].fillna(1).cumsum()

The desired output should look like this:

data = [['A', False, 2, -1], ['A', True, 8, 0], ['A', False, 25, 1], ['A', False, 30, 2], ['B', False, 4, -2], ['B', False, 8, -1], ['B', True, 2, 0], ['B', True, 1, 0], ['B', False, 3, 1]]
df_desired = pd.DataFrame(data = data, columns = ['group', 'indicator', 'val', 'idx'])

  group  indicator  val  idx
0     A      False    2   -1
1     A       True    8    0
2     A      False   25    1
3     A      False   30    2
4     B      False    4   -2
5     B      False    8   -1
6     B       True    2    0
7     B       True    1    0
8     B      False    3    1

So I was wondering if anyone knows how to do the positive negative cumulative filling part per group using pandas?

CodePudding user response:

I made an extra column named sign and used .cumsum() and a reversed version of .cumsum() to create the column idx:

# create column sign
df.loc[df['indicator'] == True, 'sign'] = -1
df['sign'] = df.groupby(['group'])['sign'].bfill()

df.loc[df['indicator'] == True, 'sign'] = 1
df['sign'] = df.groupby(['group'])['sign'].ffill()

df.loc[df['indicator'] == True, 'sign'] = 0

# create column idx
df.loc[df.sign == 1, 'idx'] = df[df.sign == 1].groupby(['group']).sign.cumsum()
df.loc[df.sign == 0, 'idx'] = 0
df.loc[df.sign == -1, 'idx'] = df[df.sign == -1][::-1].groupby(['group']).sign.cumsum()[::-1]
# remove column sign and make column idx integer value
df['idx'] = df['idx'].astype(int)
df = df.drop(columns='sign')

Output:

  group  indicator  val  idx
0     A      False    2   -1
1     A       True    8    0
2     A      False   25    1
3     A      False   30    2
4     B      False    4   -2
5     B      False    8   -1
6     B       True    2    0
7     B       True    1    0
8     B      False    3    1
  • Related