I have two values that are found in a for loop, like so:
```python
for i in range(df_zones.shape[0]):
    filter_max = df_labels[df_labels['Labels'] == i].sort_values(by='level').iloc[-1]
    filter_min = df_labels[df_labels['Labels'] == i].sort_values(by='level').iloc[0]
```
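In the loop above, `.iloc[-1]` and `.iloc[0]` return whole rows of `df_labels` rather than single numbers, and each pass of the loop overwrites the bounds from the previous pass. Assuming the intended bounds are the smallest and largest `level` for each label, a minimal sketch that collects scalar bounds for every label at once with `groupby` could look like this (the `df_labels` values are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for df_labels: one 'level' per row, tagged with a label
df_labels = pd.DataFrame({
    "Labels": [0, 0, 0, 1, 1],
    "level": [3.0, 5.5, 4.2, 1.0, 2.0],
})

# One row per label, with scalar 'min' and 'max' of 'level'
# (the same values as sorting by 'level' and taking ['level'] from the first/last row)
bounds = df_labels.groupby("Labels")["level"].agg(["min", "max"])

for label, (filter_min, filter_max) in bounds.iterrows():
    print(label, filter_min, filter_max)
```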
I have another DataFrame with 4 columns of measurements and a time-series index, like so:
DateTime | meas1 | meas2 | meas3 | meas4 |
---|---|---|---|---|
2022-1-1 | 1.1 | 1.2 | 1.3 | 1.1 |
There are thousands of rows of data.
What I am trying to do is add another column, 'isZone', that is 1 if any of the values in the row lie between filter_min and filter_max, and 0 otherwise:
DateTime | meas1 | meas2 | meas3 | meas4 | isZone |
---|---|---|---|---|---|
2022-1-1 | 1.1 | 1.5 | 1.5 | 1.7 | 0 |
2022-1-2 | 2.2 | 1.4 | 1.5 | 1.7 | 0 |
2022-1-3 | 3.1 | 1.2 | 1.3 | 1.1 | 1 |
2022-1-4 | 4.1 | 1.2 | 1.3 | 1.1 | 1 |
2022-1-5 | 5.1 | 1.2 | 1.3 | 1.1 | 1 |
I have read about the pandas `between` function, but I can't figure out how to make it work here. Is there a quicker way to do this in NumPy? Any guidance would be appreciated.
Answer:
You can solve this with `apply` and pandas' `between`:
```python
df_zones['Flag'] = df_zones.apply(lambda row: 1 if row.between(filter_min, filter_max).any() else 0, axis=1)
```
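To see the row-wise `apply` approach on concrete data, here is a small sketch that rebuilds the sample rows from the question with `DateTime` as the index. The bounds `3.0` and `5.5` are hypothetical, picked only so the result can be compared with the expected `isZone` column:

```python
import pandas as pd

# The question's sample rows, indexed by date
df_zones = pd.DataFrame(
    {
        "meas1": [1.1, 2.2, 3.1, 4.1, 5.1],
        "meas2": [1.5, 1.4, 1.2, 1.2, 1.2],
        "meas3": [1.5, 1.5, 1.3, 1.3, 1.3],
        "meas4": [1.7, 1.7, 1.1, 1.1, 1.1],
    },
    index=pd.date_range("2022-01-01", periods=5, freq="D"),
)

# Hypothetical bounds, chosen only for this demo
filter_min, filter_max = 3.0, 5.5

# Row-wise: 1 if any value in the row lies in [filter_min, filter_max]
# (Series.between is inclusive on both ends by default)
df_zones["Flag"] = df_zones.apply(
    lambda row: 1 if row.between(filter_min, filter_max).any() else 0, axis=1
)
print(df_zones["Flag"].tolist())  # → [0, 0, 1, 1, 1]
```

With these bounds the flags match the `isZone` column in the question's expected output.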
Alternatively, how about transposing with `.T` and using a list comprehension, like this?
```python
transposed = df_zones.T  # transpose once instead of on every iteration
df_zones['Flag'] = [1 if transposed[x].between(filter_min, filter_max).any() else 0 for x in transposed]
```
Or, without transposing:
```python
df_zones['Flag'] = [1 if df_zones.loc[x, :].between(filter_min, filter_max).any() else 0 for x in df_zones.index]
```
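Since the question asks about a quicker NumPy route, here is a vectorized sketch that compares the whole measurement array against both bounds in one step instead of calling `between` once per row. It assumes the measurement columns are named `meas1` to `meas4` as in the question's table; the bounds are again hypothetical:

```python
import numpy as np
import pandas as pd

df_zones = pd.DataFrame(
    {
        "meas1": [1.1, 2.2, 3.1, 4.1, 5.1],
        "meas2": [1.5, 1.4, 1.2, 1.2, 1.2],
        "meas3": [1.5, 1.5, 1.3, 1.3, 1.3],
        "meas4": [1.7, 1.7, 1.1, 1.1, 1.1],
    },
    index=pd.date_range("2022-01-01", periods=5, freq="D"),
)

filter_min, filter_max = 3.0, 5.5  # hypothetical bounds

# Compare only the measurement columns (names assumed from the question's table)
meas_cols = ["meas1", "meas2", "meas3", "meas4"]
values = df_zones[meas_cols].to_numpy()

# Boolean mask over the whole 2-D array; inclusive on both ends, like Series.between
in_range = (values >= filter_min) & (values <= filter_max)

# A row is in the zone if any of its values is in range
df_zones["isZone"] = np.any(in_range, axis=1).astype(int)
```

Because the comparison happens over the entire array rather than in one Python-level call per row, this should scale better to thousands of rows than `apply` or a list comprehension. Selecting `meas_cols` explicitly also keeps any previously added `Flag` column out of the comparison. A pandas-only equivalent is `(df_zones[meas_cols].ge(filter_min) & df_zones[meas_cols].le(filter_max)).any(axis=1)`.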