I have two dataframes, df_1
and df_2
. Both of them have datetimeindex
, starting in 2022-01-01
and goes until 2022-08-14
. The first one, df_1
, has hourly measurements, and the second one, df_2
, has daily measurements.
df_1 = pd.DataFrame(np.random.rand(5424, 1),
columns=["Random"],
index=pd.date_range(start="20220101000000", end="20220814230000", freq='H'))
df_2 = pd.DataFrame(np.random.randint(0, 3, size=226),
columns=["Random"],
index=pd.date_range(start="20220101", end="20220814", freq='D'))
How could i drop all rows from df_1
in wich the same day of df_2
has a measurement different from zero? For example, if the first two days of January have 6 and 7 as measurements, i would need to drop all 48 hours of those days in df_1
.
CodePudding user response:
Use merge_asof
with filter rows by another DataFrame:
np.random.seed(123)
df_1 = pd.DataFrame(np.random.rand(5424, 1),
columns=["Random"],
index=pd.date_range(start="20220101000000",
end="20220814230000", freq='H'))
df_2 = pd.DataFrame(np.random.randint(0, 3, size=226),
columns=["Random"],
index=pd.date_range(start="20220101", end="20220814", freq='D'))
df = pd.merge_asof(df_1, df_2, left_index=True, right_index=True)
df = df[df['Random_y'].ne(0)]
print (df)
Random_x Random_y
2022-01-01 00:00:00 0.696469 1
2022-01-01 01:00:00 0.286139 1
2022-01-01 02:00:00 0.226851 1
2022-01-01 03:00:00 0.551315 1
2022-01-01 04:00:00 0.719469 1
... ...
2022-08-14 19:00:00 0.461419 2
2022-08-14 20:00:00 0.181182 2
2022-08-14 21:00:00 0.676527 2
2022-08-14 22:00:00 0.213839 2
2022-08-14 23:00:00 0.049919 2
[3696 rows x 2 columns]