I want to select duplicate rows between 2 dataframes

Time:05-11

I want to filter the rows of df1 by its date column (dtype datetime64[ns]), keeping only the rows whose date also appears in df2 (same column name and dtype). I searched for a solution, but everything I tried raised errors such as:

"Can only compare identically-labeled Series objects" or "'Timestamp' object is not iterable", among others.

sample df1

id date value
1 2018-10-09 120
2 2018-10-09 60
3 2018-10-10 59
4 2018-11-25 120
5 2018-08-25 120

sample df2

date
2018-10-09
2018-10-10

sample result that I want

id date value
1 2018-10-09 120
2 2018-10-09 60
3 2018-10-10 59

In fact, I want this program to run once every 7 days and look back from the day it runs, so it should drop every row whose date is not within the past 7 days.

import pandas as pd
from datetime import date, timedelta

# Create a new, empty dataframe -> df2
data = {'date': []}
df2 = pd.DataFrame(data)

# Fill df2 with the last 7 days (7 days ago ... 1 day ago)
days_use = 7
for x in range(days_use, 0, -1):
    use_day = date.today() - timedelta(days=x)
    df2.loc[x] = use_day

# Convert the column to datetime64[ns]
df2['date'] = pd.to_datetime(df2['date'])

Answer:

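Use Series.isin. It tests each value of df1["date"] for membership in df2["date"] instead of comparing the two Series element by element, so their different lengths and indexes do not matter: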

>>> df1[df1["date"].isin(df2["date"])]
   id        date  value
0   1  2018-10-09    120
1   2  2018-10-09     60
2   3  2018-10-10     59
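For reference, here is a minimal, self-contained sketch of the isin filter that rebuilds the sample frames from the question. The names mask, result and excluded are only illustrative and do not come from the original post; the negated mask shows how to get the complementary rows if those are needed instead.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "date": pd.to_datetime(
        ["2018-10-09", "2018-10-09", "2018-10-10", "2018-11-25", "2018-08-25"]
    ),
    "value": [120, 60, 59, 120, 120],
})
df2 = pd.DataFrame({"date": pd.to_datetime(["2018-10-09", "2018-10-10"])})

# Boolean mask: True where df1's date occurs anywhere in df2's date column.
mask = df1["date"].isin(df2["date"])

result = df1[mask]     # rows whose date is in df2
excluded = df1[~mask]  # rows whose date is not in df2

print(result)
```

Because isin only checks membership, the two columns should share a dtype (here both are datetime64[ns]); comparing strings against timestamps would likely match nothing.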

If you want to create df2 with the dates for the past week, you can simply use pd.date_range:

df2 = pd.DataFrame({"date": pd.date_range(pd.Timestamp.today().date()-pd.DateOffset(7),periods=7)})

>>> df2
        date
0 2022-05-03
1 2022-05-04
2 2022-05-05
3 2022-05-06
4 2022-05-07
5 2022-05-08
6 2022-05-09
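If the only goal is to keep rows from the past 7 days, a range comparison on the date column is an alternative that skips building df2 altogether. The sketch below is not the method from the answer above: filter_last_days and its parameters are hypothetical names, and the "today" value is passed in explicitly (here a fixed date, an assumption made so the example is reproducible). In a scheduled job it would presumably be pd.Timestamp.today().

```python
import pandas as pd


def filter_last_days(df, today, days=7, column="date"):
    """Keep rows whose date is in the `days` days before `today` (today excluded).

    Hypothetical helper for illustration; it covers the same window as
    pd.date_range(today - 7 days, periods=7).
    """
    today = pd.Timestamp(today).normalize()  # drop the time-of-day part
    start = today - pd.Timedelta(days=days)
    mask = (df[column] >= start) & (df[column] < today)
    return df[mask]


# Sample data copied from the question.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "date": pd.to_datetime(
        ["2018-10-09", "2018-10-09", "2018-10-10", "2018-11-25", "2018-08-25"]
    ),
    "value": [120, 60, 59, 120, 120],
})

# Assumed run date, chosen so the window is 2018-10-09 through 2018-10-15.
recent = filter_last_days(df1, today="2018-10-16")
print(recent)
```

The isin approach remains the better fit when the dates to keep are an arbitrary list rather than one contiguous window.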