I have a dataframe in the following format:
time | parameter | TimeDelta |
---|---|---|
1 | 123 | - |
2 | 456 | 1 |
4 | 122 | 2 |
7 | 344 | 3 |
8 | 344 | 1 |
How can I build an additional column with a label that increases by one each time TimeDelta is greater than a threshold (e.g. 1.5), and keep that label on the following rows until TimeDelta exceeds the threshold again? Expected output:
time | parameter | TimeDelta | Label |
---|---|---|---|
1 | 123 | - | 1 |
2 | 456 | 1 | 1 |
4 | 122 | 2 | 2 |
7 | 344 | 3 | 3 |
8 | 344 | 1 | 3 |
I do not want to loop over every row, because that is extremely slow. Is it possible to use cumsum() to carry the label forward to all following rows, up to the next value above the threshold?
Answer:
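You can reuse part of the solution from the previous answer: convert TimeDelta to numbers, compare it with the threshold, take the cumulative sum of the resulting booleans, add 1, and assign the result to a new column: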
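```python
import pandas as pd
# '-' cannot be parsed, so errors='coerce' turns it into NaN (and NaN > 1.5 is False)
df['Label'] = pd.to_numeric(df['TimeDelta'], errors='coerce').gt(1.5).cumsum().add(1)
print(df)
```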
```
   time  parameter  TimeDelta  Label
0     1        123          -      1
1     2        456          1      1
2     4        122          2      2
3     7        344          3      3
4     8        344          1      3
```
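Below is a self-contained sketch of the same idea, with the intermediate steps spelled out so it is clear why cumsum() works here. The DataFrame is a hypothetical reconstruction of the question's sample (with `-` stored as a string), and the names `THRESHOLD`, `delta`, `starts` and `Label_from_time` are illustrative, not part of the original answer.

```python
import pandas as pd

# Hypothetical reconstruction of the question's data; '-' means "no previous row"
df = pd.DataFrame({
    "time": [1, 2, 4, 7, 8],
    "parameter": [123, 456, 122, 344, 344],
    "TimeDelta": ["-", "1", "2", "3", "1"],
})

THRESHOLD = 1.5  # assumed threshold, taken from the question's example

# Step 1: parse TimeDelta as numbers; '-' cannot be parsed and becomes NaN
delta = pd.to_numeric(df["TimeDelta"], errors="coerce")

# Step 2: True where a new segment starts (NaN > 1.5 evaluates to False)
starts = delta.gt(THRESHOLD)

# Step 3: the running count of True values stays constant between two starts,
# so every row inherits the number of the segment it belongs to.
# Adding 1 makes the first segment 1 instead of 0.
df["Label"] = starts.cumsum().add(1)

# Variant (assumption): in the sample, TimeDelta equals the difference between
# consecutive 'time' values, so the labels can also be computed from 'time'
# directly; the first diff() value is NaN, which again compares as False.
df["Label_from_time"] = df["time"].diff().gt(THRESHOLD).cumsum().add(1)

print(df)
```

The diff() variant only makes sense if TimeDelta is always the gap between consecutive time values, as it is in the sample; if TimeDelta comes from somewhere else, keep the to_numeric line.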