Given the following DataFrame:
df=pd.DataFrame({'Time_diff':[1400,1200,1000,1800,1200,1200,1200,1800,1200]})
There is a column "Time_diff". I am trying to add a group number in a new column "group_num". The group number should increase by one whenever a row meets the condition Time_diff >= 1800, as shown below:
Time_diff | group_num |
---|---|
1400 | 1 |
1200 | 1 |
1000 | 1 |
1800 | 2 |
1200 | 2 |
1200 | 2 |
1200 | 2 |
1800 | 3 |
1200 | 3 |
I wrote a loop, but it doesn't work. What should I do?
a=1
for i in range(1,len(df)):
    if df[i]['time_diff'] < 1800:
        df[i]['group_num']=a
    else:
        a +=1
        df[i]['group_num']=a
CodePudding user response:
Check the condition Time_diff >= 1800 with .ge(), then use .cumsum() to increment the count each time the condition is met again further down the series:
df['group_num'] = df['Time_diff'].ge(1800).cumsum() + 1
Result:
print(df)
Time_diff group_num
0 1400 1
1 1200 1
2 1000 1
3 1800 2
4 1200 2
5 1200 2
6 1200 2
7 1800 3
8 1200 3
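To see why this works, the one-liner can be split into its intermediate steps. The following is only an illustrative sketch: the variable names `starts` and `counter` are not part of the answer and are introduced here for explanation.

```python
import pandas as pd

df = pd.DataFrame({'Time_diff': [1400, 1200, 1000, 1800, 1200, 1200, 1200, 1800, 1200]})

# Boolean Series: True on every row where a new group should start
# (Time_diff >= 1800), False elsewhere.
starts = df['Time_diff'].ge(1800)

# cumsum() treats True as 1 and False as 0, so each row holds the
# number of group starts seen so far (0 before the first 1800).
counter = starts.cumsum()

# Add 1 so that the numbering begins at 1 instead of 0.
df['group_num'] = counter + 1

print(df)
```

Because every step operates on the whole column at once, no explicit Python loop is needed.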
CodePudding user response:
I have modified your code to produce the expected result using a for loop:
import pandas as pd
df=pd.DataFrame({'Time_diff':[1400,1200,1000,1800,1200,1200,1200,1800,1200]})
a=1              # current group number
group_num = []   # one group number per row
for i, row in df.iterrows():
    if row['Time_diff'] < 1800:
        group_num.append(a)
    else:
        # Time_diff >= 1800 starts a new group
        a +=1
        group_num.append(a)
df['group_num']=group_num
print(df.to_string(index=False))
Output:
Time_diff group_num
1400 1
1200 1
1000 1
1800 2
1200 2
1200 2
1200 2
1800 3
1200 3
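If a loop is preferred, the same logic can also be written by iterating over the column values directly instead of calling iterrows(). This is a minimal sketch rather than part of the answer above: the helper name `assign_groups` and its `threshold` parameter are hypothetical and chosen only for illustration.

```python
import pandas as pd

def assign_groups(values, threshold=1800):
    """Hypothetical helper: return one group number per value.

    The group number starts at 1 and increases by one on every
    value that is greater than or equal to `threshold`.
    """
    group = 1
    result = []
    for value in values:
        if value >= threshold:
            group += 1  # this row opens a new group
        result.append(group)
    return result

df = pd.DataFrame({'Time_diff': [1400, 1200, 1000, 1800, 1200, 1200, 1200, 1800, 1200]})
df['group_num'] = assign_groups(df['Time_diff'])
print(df.to_string(index=False))
```

Iterating over the Series yields plain values, which avoids building a row object for each iteration as iterrows() does.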