Home > Software engineering >  Pandas how to group data between a cumulative value reaching 0
Pandas how to group data between a cumulative value reaching 0

Time:12-08

I'm new to programming; and stepping into it with python; and heard it was best to learn as you do. So I've been playing around with a dataset and done most of my code; but learned that pandas has so many handy features that I haven't been taking advantage of;

I am hoping to group data between when a cumulative value hits 0; so data including the 0; and the data before it will be grouped; and the next bunch of data until cumulative value hits 0 will be grouped as well; its easier to see

this dataset;


d = {'col1': [1, 2, 3, 4, 5, 6,7,8], 'col2': [10, 1, -1, -10, 5, 1,-6,5] }
d = pd.DataFrame(d)

gives something like this; and I would like when col2 cum value reach 0; to group/rename col 1 entries by the first value that took it away from 0.

   col1  col2
0     1    10
1     2     1
2     3    -1
3     4   -10
4     5     5
5     6     1
6     7    -6
7     8     5

So ideally something like this


   col1  col2  cumvalue
0     1    10        10
1     1     1        11
2     1    -1        10
3     1   -10         0
4     2     5         5
5     2     1         6
6     2    -6         0
7     3     5         5

I have tried df.groupby() but I just cant get the right syntax to get it! Thankyou!

CodePudding user response:

Try:

d["cumvalue"] = d["col2"].cumsum()
d["col1"] = d["cumvalue"].eq(0).cumsum().shift().fillna(0).astype(int)   1

print(d)

Prints:

   col1  col2  cumvalue
0     1    10        10
1     1     1        11
2     1    -1        10
3     1   -10         0
4     2     5         5
5     2     1         6
6     2    -6         0
7     3     5         5

CodePudding user response:

You can do :

df['cumvalue'] = df['col2'].cumsum()
df['col1'] = df.groupby(df['cumvalue']==0).cumcount()   1
df['col1'] = df['col1'].mask(df['cumvalue'] !=0).bfill().ffill().astype(int)

print(df):

   col1  col2  cumvalue
0     1    10        10
1     1     1        11
2     1    -1        10
3     1   -10         0
4     2     5         5
5     2     1         6
6     2    -6         0
7     2     5         5

CodePudding user response:

This could be achieved by looping through each row in the dataframe:

import pandas as pd

d = {'col1': [1, 2, 3, 4, 5, 6,7,8], 'col2': [10, 1, -1, -10, 5, 1,-6,5] }
d = pd.DataFrame(d)

group_index = 1
cumvalue = 0

cumvalue_list = []
groupindex_list = []
for index, row in d.iterrows():
    groupindex_list.append(group_index)
    cumvalue = cumvalue   row['col2']
    cumvalue_list.append(cumvalue)
    
    if cumvalue == 0:
        group_index = group_index   1
    
d['col1'] = groupindex_list
d['cumvalue'] = cumvalue_list

   col1  col2  cumvalue
0     1    10        10
1     1     1        11
2     1    -1        10
3     1   -10         0
4     2     5         5
5     2     1         6
6     2    -6         0
7     3     5         5
  • Related