Home > Blockchain >  Drop all rows of a dataframe based on another dataframe
Drop all rows of a dataframe based on another dataframe

Time:11-15

I have two dataframes, df_1 and df_2. Both of them have datetimeindex, starting in 2022-01-01 and goes until 2022-08-14. The first one, df_1, has hourly measurements, and the second one, df_2, has daily measurements.

df_1 = pd.DataFrame(np.random.rand(5424, 1),
                    columns=["Random"],
                    index=pd.date_range(start="20220101000000", end="20220814230000", freq='H'))
df_2 = pd.DataFrame(np.random.randint(0, 3, size=226),
                    columns=["Random"],
                    index=pd.date_range(start="20220101", end="20220814", freq='D'))

How could i drop all rows from df_1 in wich the same day of df_2 has a measurement different from zero? For example, if the first two days of January have 6 and 7 as measurements, i would need to drop all 48 hours of those days in df_1.

CodePudding user response:

Use merge_asof with filter rows by another DataFrame:

np.random.seed(123)
df_1 = pd.DataFrame(np.random.rand(5424, 1),
                    columns=["Random"],
                    index=pd.date_range(start="20220101000000", 
                                        end="20220814230000", freq='H'))
df_2 = pd.DataFrame(np.random.randint(0, 3, size=226),
                    columns=["Random"],
                    index=pd.date_range(start="20220101", end="20220814", freq='D'))



df = pd.merge_asof(df_1, df_2, left_index=True, right_index=True)
df = df[df['Random_y'].ne(0)]
print (df)
                     Random_x  Random_y
2022-01-01 00:00:00  0.696469         1
2022-01-01 01:00:00  0.286139         1
2022-01-01 02:00:00  0.226851         1
2022-01-01 03:00:00  0.551315         1
2022-01-01 04:00:00  0.719469         1
                      ...       ...
2022-08-14 19:00:00  0.461419         2
2022-08-14 20:00:00  0.181182         2
2022-08-14 21:00:00  0.676527         2
2022-08-14 22:00:00  0.213839         2
2022-08-14 23:00:00  0.049919         2

[3696 rows x 2 columns]
  • Related