I want to combine two dataset according to similar timestamps. Both datasets have different stamps format as follow:
DF1
d1 = {'col1': ["a1-20220517-183028.spc", "a2-20220518-183028.spc","a3-20220519-183028.spc","a4-20220525-183028.spc"],
'col2': [a, b ,c,d]}
df_1 = pd.DataFrame(data=d1)
DF2
d2 = {'col1.2': ["18-05-2022 12:23","17-05-2022 0:16:00" ,"25-05-2022 20:23","19-05-2022 14:54","02-05-2022 12:14"],
'col2.2': [12, 2 ,3,10, 1]}
df_2 = pd.DataFrame(data=d2)
I want to combine df_1 and df_2 according to the timestamp of col1 in DF1 AND col1.2 in DF2, so the resulting output is given as follow:
d_output = {'col1': ["17-05-2022", "18-05-2022", "19-05-2022", "25-05-2022"],
'col2': [a, b ,c , d],
'col3': [2, 12 ,10, 3 ]}
df_output=pd.DataFrame(data=d)
I have tried extracting the timestamp of one and formatting it into the same format but I am really struggling to format it correctly and merge both datasets with a similar timestamp.
Anyway I could solve this?
Thanks a lot!
CodePudding user response:
This is a merge
with a bit of pre-processing using str.extract
and pd.to_datetime
:
out = (
df_1.assign(col1=pd.to_datetime(df_1['col1'].str.extract('-(\d{8})', expand=False))
.dt.strftime('%d-%m-%Y'))
.merge(
df_2.rename(columns={'col1.2': 'col1', 'col2.2': 'col3'})
.assign(col1=lambda d: pd.to_datetime(d['col1']).dt.strftime('%d-%m-%Y')),
how='left', on='col1'
)
)
Output:
col1 col2 col3
0 17-05-2022 a 2
1 18-05-2022 b 12
2 19-05-2022 c 10
3 25-05-2022 d 3
CodePudding user response:
You can use the pd.to_datetime
function to convert the timestamps in both df_1
and df_2
to a common format, then use the pd.merge
function to join the two dataframes on the common timestamp column.
For example like this:
df_1['timestamp'] = pd.to_datetime(df_1['col1'].str.extract(r'(\d{4}\d{2}\d{2}-\d{6})')[0], format='%Y%m%d-%H%M%S')
df_2['timestamp'] = pd.to_datetime(df_2['col1.2'], format='%d-%m-%Y %H:%M:%S')
df_output = pd.merge(df_1, df_2, on='timestamp')