I'm new to programming; and stepping into it with python; and heard it was best to learn as you do. So I've been playing around with a dataset and done most of my code; but learned that pandas has so many handy features that I haven't been taking advantage of;
I am hoping to group data between when a cumulative value hits 0; so data including the 0; and the data before it will be grouped; and the next bunch of data until cumulative value hits 0 will be grouped as well; its easier to see
this dataset;
d = {'col1': [1, 2, 3, 4, 5, 6,7,8], 'col2': [10, 1, -1, -10, 5, 1,-6,5] }
d = pd.DataFrame(d)
gives something like this; and I would like when col2 cum value reach 0; to group/rename col 1 entries by the first value that took it away from 0.
col1 col2
0 1 10
1 2 1
2 3 -1
3 4 -10
4 5 5
5 6 1
6 7 -6
7 8 5
So ideally something like this
col1 col2 cumvalue
0 1 10 10
1 1 1 11
2 1 -1 10
3 1 -10 0
4 2 5 5
5 2 1 6
6 2 -6 0
7 3 5 5
I have tried df.groupby() but I just cant get the right syntax to get it! Thankyou!
CodePudding user response:
Try:
d["cumvalue"] = d["col2"].cumsum()
d["col1"] = d["cumvalue"].eq(0).cumsum().shift().fillna(0).astype(int) 1
print(d)
Prints:
col1 col2 cumvalue
0 1 10 10
1 1 1 11
2 1 -1 10
3 1 -10 0
4 2 5 5
5 2 1 6
6 2 -6 0
7 3 5 5
CodePudding user response:
You can do :
df['cumvalue'] = df['col2'].cumsum()
df['col1'] = df.groupby(df['cumvalue']==0).cumcount() 1
df['col1'] = df['col1'].mask(df['cumvalue'] !=0).bfill().ffill().astype(int)
print(df):
col1 col2 cumvalue
0 1 10 10
1 1 1 11
2 1 -1 10
3 1 -10 0
4 2 5 5
5 2 1 6
6 2 -6 0
7 2 5 5
CodePudding user response:
This could be achieved by looping through each row in the dataframe:
import pandas as pd
d = {'col1': [1, 2, 3, 4, 5, 6,7,8], 'col2': [10, 1, -1, -10, 5, 1,-6,5] }
d = pd.DataFrame(d)
group_index = 1
cumvalue = 0
cumvalue_list = []
groupindex_list = []
for index, row in d.iterrows():
groupindex_list.append(group_index)
cumvalue = cumvalue row['col2']
cumvalue_list.append(cumvalue)
if cumvalue == 0:
group_index = group_index 1
d['col1'] = groupindex_list
d['cumvalue'] = cumvalue_list
col1 col2 cumvalue
0 1 10 10
1 1 1 11
2 1 -1 10
3 1 -10 0
4 2 5 5
5 2 1 6
6 2 -6 0
7 3 5 5