I have a dataframe (df1) with a date column with the dd/mm/yyyy date format.
I have second dataframe (df2) with the same structure, however, has some shared data. I want to add the data from df2 to df1 for the data is after the most recent date in df1.
My approach was to find the maxdate in df1 and then look for dates in df2 subset and append to df1
maxdate = df.loc[pd.to_datetime(df['DATE'],dayfirst=True).idxmax(), 'DATE']
# in this instance it is 11/09/2022 (dd/mm/yyyy)
df3 = df2.loc[pd.to_datetime(df2['DATE']) > maxdate] #this is to by my subset to append to df1
some of the df1 below
DATE TIME X Y Z
3692 23/08/2022 16:55:00 734154.2872 9551189.353 2.845237e 03
3693 23/08/2022 16:55:00 734199.2516 9551070.666 2.842993e 03
3694 23/08/2022 05:02:00 734669.6130 9551361.865 2.845012e 03
3695 24/08/2022 17:25:00 734215.9910 9551068.295 2.842111e 03
3696 24/08/2022 17:25:00 734684.8444 9551383.618 2.846049e 03
3697 27/08/2022 17:20:00 734214.1851 9551061.242 2.841501e 03
3698 28/08/2022 17:00:00 734669.6130 9551361.865 2.845012e 03
3699 30/08/2022 05:25:00 734176.3412 9551168.550 2.844325e 03
3700 01/09/2022 17:18:00 734686.1061 9551385.420 2.846083e 03
3701 01/09/2022 17:18:00 734667.0922 9551358.264 2.844812e 03
3702 01/09/2022 17:18:00 734164.7047 9551178.039 2.844962e 03
3703 02/09/2022 17:16:00 734151.9079 9551185.951 2.845472e 03
3704 03/09/2022 17:15:00 734141.2542 9551197.062 2.844747e 03
3705 04/09/2022 17:08:00 734687.3678 9551387.222 2.846116e 03
3706 04/09/2022 17:08:00 734665.8319 9551356.464 2.844713e 03
3707 05/09/2022 05:08:00 734704.3326 9551376.581 2.842331e 03
3708 07/09/2022 16:58:00 734687.3678 9551387.222 2.846116e 03
3709 08/09/2022 16:55:00 734663.3109 9551352.864 2.844512e 03
3710 10/09/2022 17:03:00 734689.8913 9551390.826 2.846184e 03
3711 11/09/2022 17:13:00 734691.1530 9551392.628 9.551393e 06
some of df2 below
DATE TIME X Y Z
134 23/08/22 16:55:00 734154.2872 9551189.3534 2845.237
135 23/08/22 16:55:00 734199.2516 9551070.6664 2842.9929
136 23/08/22 5:02:00 734669.613 9551361.8645 2845.0122
138 24/08/22 17:25:00 734215.991 9551068.2954 2842.1106
139 24/08/22 17:25:00 734684.8444 9551383.618 2846.0492
147 27/08/22 17:20:00 734214.1851 9551061.2423 2841.501
149 28/08/22 17:00:00 734669.613 9551361.8645 2845.0122
151 29/08/22 17:30:00 - - -
153 30/08/22 5:25:00 734176.3412 9551168.5498 2844.325
180 11/09/22 17:13:00 734691.153 9551392.6276 9551392.6276
However df3 is sub setting the dataframe that includes dates before the "maxdate"
I feel it is related to the date format that I have.
any help appreciated.
CodePudding user response:
You need to convert the values to pandas DateTime, else the comparison will be based on string values and not the dates, also its not clear if 11 is the day or 09 is the day in sample max date 11/09/2022
, if 11
is the day, you also need to pass dayfirst=True
to pd.to_datetime
:
>>> maxdate=pd.to_datetime('11/09/2022')
# Timestamp('2022-11-09 00:00:00')
>>> df2 = df.loc[pd.to_datetime(df['DATE'], dayfirst=True) > maxdate]
Here is the execution for the sample data you have added to the question:
# Getting the max date from first dataframe
max_date=pd.to_datetime(df1['DATE'],dayfirst=True).max()
max_date
Timestamp('2022-09-11 00:00:00')
# Filtering second dataframe based on maximum date
df2[pd.to_datetime(df2['DATE'], dayfirst=True)>max_date]
Empty DataFrame
Columns: [DATE, TIME, X, Y, Z]
Index: []
# Result is empty dataframe for the sample data cause no record matches condition
# Records for maximum date:
df2[pd.to_datetime(df2['DATE'], dayfirst=True)==max_date]
DATE TIME X Y Z
180 11/09/22 17:13:00 734691.153 9551392.6276 9551392.6276
# Records for dates older than the maximum date:
df2[pd.to_datetime(df2['DATE'], dayfirst=True)<max_date]
DATE TIME X Y Z
134 23/08/22 16:55:00 734154.2872 9551189.3534 2845.237
135 23/08/22 16:55:00 734199.2516 9551070.6664 2842.9929
136 23/08/22 5:02:00 734669.613 9551361.8645 2845.0122
138 24/08/22 17:25:00 734215.991 9551068.2954 2842.1106
139 24/08/22 17:25:00 734684.8444 9551383.618 2846.0492
147 27/08/22 17:20:00 734214.1851 9551061.2423 2841.501
149 28/08/22 17:00:00 734669.613 9551361.8645 2845.0122
151 29/08/22 17:30:00 - - -
153 30/08/22 5:25:00 734176.3412 9551168.5498 2844.325