Home > database >  How to drop specific pandas dataframe rows based on complex conditions
How to drop specific pandas dataframe rows based on complex conditions

Time:08-01

I have this pandas dataframe called df,

so,

df

    time                       entry

0   2022-07-28 13:35:00         True
1   2022-07-29 14:15:00         True
Name: time, dtype: datetime64[ns] 

The "entry" inside df IS ALWAYS True

sample code to generate it:

import pandas as pd

tbl = {"time" :["2022-07-28 13:35:00", "2022-07-29 14:15:00"],
      "entry" : [True, True]}

df = pd.DataFrame(tbl)


df.sort_values(by = "time", inplace=True)

I have another dataframe that starts from the df time, but it has more dates, we will call this, df2:

df2 

    time                      entry      target_long      stop_long

0   2022-07-28 13:35:00       True          NaN             NaN
1   2022-07-28 13:35:15       True          NaN             NaN
2   2022-07-28 13:35:30       NaN           NaN             True
3   2022-07-28 13:35:45       True          NaN             NaN
.          . 
.          .
n    2022-07-29 14:15:00      True          NaN             NaN
n 1  2022-07-29 14:15:15      True          NaN             NaN
n 2  2022-07-29 14:15:30      True          NaN             NaN
n 3  2022-07-29 14:15:45      NaN           True            NaN
n 4  2022-07-29 14:16:00      True          NaN             NaN
n 5  2022-07-29 14:16:15      NaN           True            NaN

sample code to generate it:

tbl2 = {"time" :["2022-07-28 13:35:00", "2022-07-28 13:35:15", "2022-07-28 13:35:30",
                "2022-07-28 13:35:45", "2022-07-29 14:15:00","2022-07-29 14:15:15",
                "2022-07-29 14:15:30", "2022-07-29 14:15:45", "2022-07-29 14:16:00", "2022-07-29 14:16:15"],
        "entry" : [True, True, "NaN", True, True, True, True, "NaN", True, "NaN"],
       "target_long" : ["NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", True, "NaN", True],
        "stop_long" : ["NaN", "NaN", True, "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN"]}

df2 = pd.DataFrame(tbl2)
df2.sort_values(by = "date", inplace=True)

I need that when the "entry" in df2 is NaN AND if (("stop_long" is True) OR (target_long is True)), drop all other lines of df2, BUT if the time of df2 is inside df, so DON'T drop, but start to do the same thing of before.

The result will be a dataframe that will look like this:

df3

    time                      entry      target_long      stop_long

0   2022-07-28 13:35:00       True          NaN             NaN
1   2022-07-28 13:35:30       NaN           NaN             True

2   2022-07-29 14:15:00       True          NaN              NaN
3   2022-07-29 14:15:45       NaN           True             NaN

Any ideas?

EDIT: I tried both the solutions of the answers, but there was a case not considered, i updated the sample code

CodePudding user response:

df.time = pd.to_datetime(df.time)
df2.time = pd.to_datetime(df2.time)

df = df.set_index('time')
df2 = df2.set_index('time')

df = df.replace('NaN', False).astype(bool)
df2 = df2.replace('NaN', False).astype(bool)

df3 = (df2.groupby(df2.index.date)
          .apply(lambda x: x[~x.entry & (x.target_long | x.stop_long) | x.index.isin(df.index)]
                [lambda y: y[(y.index <= y.target_long.idxmax()) | (y.index <= y.stop_long.idxmax())]])
          .droplevel(-2)
          .dropna(how='all')
          .reset_index())
print(df3)

Output:

                 time entry target_long stop_long
0 2022-07-28 13:35:00  True         NaN       NaN
1 2022-07-28 13:35:30   NaN         NaN      True
2 2022-07-29 14:15:00  True         NaN       NaN
3 2022-07-29 14:15:45   NaN        True       NaN

CodePudding user response:

Please:

df3 = df2[df2["time"].isin(df["time"]) | ((df2['entry'] == "NaN") & ((df2['stop_long'] == True) | (df2['target_long'] == True)))]
  • Related