I have a DataFrame holding a time series and want to create two new columns. Where Color equals Red, I look at the next 3 time intervals in Val (the current row included) and assign the lowest value to a new column (Low_Val).
This works fine, but I also want to record the timestamp of that lowest value as Low_Time.
import numpy as np
import pandas as pd
d = {
    'Date': [
        '2022-07-25 01:46:00+01:00', '2022-07-25 01:47:00+01:00',
        '2022-07-25 01:48:00+01:00', '2022-07-25 01:49:00+01:00',
        '2022-07-25 01:50:00+01:00', '2022-07-25 01:51:00+01:00',
        '2022-07-25 01:52:00+01:00', '2022-07-25 01:53:00+01:00'],
    'Val': [5, 1, 4, -2, 8, 4, 3, 1],
    'Color': ['Red', np.nan, np.nan, np.nan, 'Red', np.nan, 'Red', np.nan],
}
df = pd.DataFrame(data = d)
mask = (df['Color'] == 'Red')
df['Low_Val'] = np.where(mask, (df['Val'][::-1].rolling(3, min_periods = 1).min()[::-1]), np.nan)
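# Attempt: this only matches rows whose own Val equals Low_Val, so it misses the later row holding the minimum
df['Low_Time'] = df[df['Low_Val'] == df['Val']]['Date']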
Intended output:
                        Date  Val Color  Low_Val                   Low_Time
0  2022-07-25 01:46:00+01:00    5   Red      1.0  2022-07-25 01:47:00+01:00
1  2022-07-25 01:47:00+01:00    1   NaN      NaN                        NaN
2  2022-07-25 01:48:00+01:00    4   NaN      NaN                        NaN
3  2022-07-25 01:49:00+01:00   -2   NaN      NaN                        NaN
4  2022-07-25 01:50:00+01:00    8   Red      3.0  2022-07-25 01:52:00+01:00
5  2022-07-25 01:51:00+01:00    4   NaN      NaN                        NaN
6  2022-07-25 01:52:00+01:00    3   Red      1.0  2022-07-25 01:53:00+01:00
7  2022-07-25 01:53:00+01:00    1   NaN      NaN                        NaN
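For reference, the double reversal in the Low_Val line is one way to get a forward-looking window. A minimal sketch of what I believe is an equivalent, assuming pandas 1.1 or later where `FixedForwardWindowIndexer` is available (the names `val`, `reversed_min` and `forward_min` are only for illustration):

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

val = pd.Series([5, 1, 4, -2, 8, 4, 3, 1], name="Val")

# Double reversal, as in the Low_Val line of the question
reversed_min = val[::-1].rolling(3, min_periods=1).min()[::-1]

# Forward-looking window of 3 rows that starts at the current row
indexer = FixedForwardWindowIndexer(window_size=3)
forward_min = val.rolling(window=indexer, min_periods=1).min()

print(forward_min.tolist())
# [1.0, -2.0, -2.0, -2.0, 3.0, 1.0, 1.0, 1.0]
```

Both give the same minimum per row; masking on Color afterwards leaves only the Red rows.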
CodePudding user response:
Convert Date to datetime, dropping the UTC offset:
df["Date"] = pd.to_datetime(df["Date"].str.split('+').str[0])
Find the index of each row where Color is Red:
index_list = df[df.Color == "Red"].index
Derive the low value and low time:
for index in index_list:
    # Window covers the Red row and the two rows after it
    start = index
    end = index + 3
    sliced_df = df[start:end]
    low_val = sliced_df["Val"].min()
    # If the minimum occurs more than once in the window, the first occurrence is used
    low_time = sliced_df.loc[sliced_df.Val == low_val, "Date"].tolist()[0]
    df.loc[index, "Low_Val"] = low_val
    df.loc[index, "Low_Time"] = low_time
Result:
                  Date  Val Color  Low_Val             Low_Time
0  2022-07-25 01:46:00    5   Red      1.0  2022-07-25 01:47:00
1  2022-07-25 01:47:00    1   NaN      NaN                  NaT
2  2022-07-25 01:48:00    4   NaN      NaN                  NaT
3  2022-07-25 01:49:00   -2   NaN      NaN                  NaT
4  2022-07-25 01:50:00    8   Red      3.0  2022-07-25 01:52:00
5  2022-07-25 01:51:00    4   NaN      NaN                  NaT
6  2022-07-25 01:52:00    3   Red      1.0  2022-07-25 01:53:00
7  2022-07-25 01:53:00    1   NaN      NaN                  NaT
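On the duplicates caveat in the loop: one option is `idxmin`, which returns the label of the first row holding the minimum, so ties resolve to the earliest timestamp. A rough sketch, assuming a default integer index and no NaN in Val; `low_in_window` and `demo` are hypothetical names used only for this example:

```python
import numpy as np
import pandas as pd

def low_in_window(df, start, window=3):
    """Return (lowest Val, its Date) over the rows at positions start .. start+window-1."""
    chunk = df.iloc[start:start + window]   # iloc truncates safely at the end of the frame
    first_min = chunk["Val"].idxmin()       # label of the first row holding the minimum
    return chunk.loc[first_min, "Val"], chunk.loc[first_min, "Date"]

# Small frame with a tie: Val == 1 at both 01:47 and 01:48
demo = pd.DataFrame({
    "Date": pd.date_range("2022-07-25 01:46", periods=4, freq="min"),
    "Val": [5, 1, 1, 7],
    "Color": ["Red", np.nan, np.nan, "Red"],
})

red_positions = np.flatnonzero(demo["Color"].eq("Red"))
results = {int(pos): low_in_window(demo, int(pos)) for pos in red_positions}
print(results[0])   # the tie resolves to the 01:47 row
```

The second Red row sits at the end of the frame, so its window shrinks to that single row.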
CodePudding user response:
IIUC, this should work for you:
import numpy as np
import pandas as pd

d = {
    'Date': [
        '2022-07-25 01:46:00+01:00', '2022-07-25 01:47:00+01:00',
        '2022-07-25 01:48:00+01:00', '2022-07-25 01:49:00+01:00',
        '2022-07-25 01:50:00+01:00', '2022-07-25 01:51:00+01:00',
        '2022-07-25 01:52:00+01:00', '2022-07-25 01:53:00+01:00'],
    'Val': [5, 1, 4, -2, 8, 4, 3, 1],
    'Color': ['Red', np.nan, np.nan, np.nan, 'Red', np.nan, 'Red', np.nan],
}
df = pd.DataFrame(data=d)
mask = (df['Color'] == 'Red')
# Forward-looking minimum: reverse, take a trailing rolling min, reverse back
df['Low_Val'] = np.where(mask, df['Val'][::-1].rolling(3, min_periods=1).min()[::-1], np.nan)
df.Date = pd.to_datetime(df.Date)
min_dates = []
# D1..D3 hold the timestamps of the next three rows
shifted = df.assign(D1=df.Date.shift(-1), D2=df.Date.shift(-2), D3=df.Date.shift(-3))
for _, row in shifted.iterrows():
    if row.Color == 'Red':
        # Earliest of the next three timestamps
        min_dates.append(min([row.D1, row.D2, row.D3]))
    else:
        min_dates.append(pd.NaT)
df['min_dates'] = min_dates
print(df)
Output:
                        Date  Val Color  Low_Val                  min_dates
0  2022-07-25 01:46:00+01:00    5   Red      1.0  2022-07-25 01:47:00+01:00
1  2022-07-25 01:47:00+01:00    1   NaN      NaN                        NaT
2  2022-07-25 01:48:00+01:00    4   NaN      NaN                        NaT
3  2022-07-25 01:49:00+01:00   -2   NaN      NaN                        NaT
4  2022-07-25 01:50:00+01:00    8   Red      3.0  2022-07-25 01:51:00+01:00
5  2022-07-25 01:51:00+01:00    4   NaN      NaN                        NaT
6  2022-07-25 01:52:00+01:00    3   Red      1.0  2022-07-25 01:53:00+01:00
7  2022-07-25 01:53:00+01:00    1   NaN      NaN                        NaT
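Note that row 4 above shows 01:51, the earliest following timestamp, while the intended Low_Time in the question is 01:52, the timestamp of the lowest Val. If the latter is what is needed, here is a loop-free sketch that seems to reproduce the intended output. It assumes NumPy 1.20 or later (for `sliding_window_view`), no NaN in Val, and a window that includes the current row; `window`, `padded`, `offset` and `pos` are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime([f"2022-07-25 01:{m}:00+01:00" for m in range(46, 54)]),
    "Val": [5, 1, 4, -2, 8, 4, 3, 1],
    "Color": ["Red", np.nan, np.nan, np.nan, "Red", np.nan, "Red", np.nan],
})

window = 3
vals = df["Val"].to_numpy(dtype=float)

# Pad the tail with +inf so windows near the end shrink instead of disappearing
padded = np.concatenate([vals, np.full(window - 1, np.inf)])
windows = np.lib.stride_tricks.sliding_window_view(padded, window)  # shape (n, window)

# Offset of the first minimum inside each window, then its absolute row position
offset = windows.argmin(axis=1)
pos = np.arange(len(df)) + offset

mask = (df["Color"] == "Red").to_numpy()
df["Low_Val"] = np.where(mask, vals[pos], np.nan)
# Pick the Date at each minimum position, realign to df's index, blank out non-Red rows
df["Low_Time"] = df["Date"].iloc[pos].set_axis(df.index).where(mask)
print(df)
```

Because `argmin` returns the first minimum, ties go to the earliest timestamp in the window.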