I have this pandas dataframe called df,
so,
df
time entry
0 2022-07-28 13:35:00 True
1 2022-07-29 14:15:00 True
Name: time, dtype: datetime64[ns]
The "entry" inside df IS ALWAYS True
sample code to generate it:
import pandas as pd
tbl = {"time" :["2022-07-28 13:35:00", "2022-07-29 14:15:00"],
"entry" : [True, True]}
df = pd.DataFrame(tbl)
df.sort_values(by = "time", inplace=True)
I have another dataframe that starts from the df time, but it has more dates, we will call this, df2:
df2
time entry target_long stop_long
0 2022-07-28 13:35:00 True NaN NaN
1 2022-07-28 13:35:15 True NaN NaN
2 2022-07-28 13:35:30 NaN NaN True
3 2022-07-28 13:35:45 True NaN NaN
. .
. .
n 2022-07-29 14:15:00 True NaN NaN
n 1 2022-07-29 14:15:15 True NaN NaN
n 2 2022-07-29 14:15:30 True NaN NaN
n 3 2022-07-29 14:15:45 NaN True NaN
n 4 2022-07-29 14:16:00 True NaN NaN
n 5 2022-07-29 14:16:15 NaN True NaN
sample code to generate it:
tbl2 = {"time" :["2022-07-28 13:35:00", "2022-07-28 13:35:15", "2022-07-28 13:35:30",
"2022-07-28 13:35:45", "2022-07-29 14:15:00","2022-07-29 14:15:15",
"2022-07-29 14:15:30", "2022-07-29 14:15:45", "2022-07-29 14:16:00", "2022-07-29 14:16:15"],
"entry" : [True, True, "NaN", True, True, True, True, "NaN", True, "NaN"],
"target_long" : ["NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", True, "NaN", True],
"stop_long" : ["NaN", "NaN", True, "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN"]}
df2 = pd.DataFrame(tbl2)
df2.sort_values(by = "date", inplace=True)
I need that when the "entry" in df2 is NaN AND if (("stop_long" is True) OR (target_long is True)), drop all other lines of df2, BUT if the time of df2 is inside df, so DON'T drop, but start to do the same thing of before.
The result will be a dataframe that will look like this:
df3
time entry target_long stop_long
0 2022-07-28 13:35:00 True NaN NaN
1 2022-07-28 13:35:30 NaN NaN True
2 2022-07-29 14:15:00 True NaN NaN
3 2022-07-29 14:15:45 NaN True NaN
Any ideas?
EDIT: I tried both the solutions of the answers, but there was a case not considered, i updated the sample code
CodePudding user response:
df.time = pd.to_datetime(df.time)
df2.time = pd.to_datetime(df2.time)
df = df.set_index('time')
df2 = df2.set_index('time')
df = df.replace('NaN', False).astype(bool)
df2 = df2.replace('NaN', False).astype(bool)
df3 = (df2.groupby(df2.index.date)
.apply(lambda x: x[~x.entry & (x.target_long | x.stop_long) | x.index.isin(df.index)]
[lambda y: y[(y.index <= y.target_long.idxmax()) | (y.index <= y.stop_long.idxmax())]])
.droplevel(-2)
.dropna(how='all')
.reset_index())
print(df3)
Output:
time entry target_long stop_long
0 2022-07-28 13:35:00 True NaN NaN
1 2022-07-28 13:35:30 NaN NaN True
2 2022-07-29 14:15:00 True NaN NaN
3 2022-07-29 14:15:45 NaN True NaN
CodePudding user response:
Please:
df3 = df2[df2["time"].isin(df["time"]) | ((df2['entry'] == "NaN") & ((df2['stop_long'] == True) | (df2['target_long'] == True)))]