I have two data frames:
df1 =
val1 val2
date
2020-01-01 42.353719 34.823934
2020-01-02 42.255172 34.683523
2020-01-03 46.049281 37.036076
2020-01-04 41.026087 36.340164
2020-01-05 32.300000 18.835484
df2 =
val3 val4
date
2020-01-01 01:00:00 01:00 4021.28 570.06
2020-01-01 02:00:00 01:00 4149.82 842.25
2020-01-01 03:00:00 01:00 4307.65 1638.57
2020-01-02 04:00:00 01:00 4428.95 2109.98
2020-01-02 05:00:00 01:00 4542.47 2365.25
2020-01-02 05:00:00 01:00 4462.47 2325.25
...
I would then like the val1
and val2
from df1
to be inserted for all rows with the same date (they don't have the same date format of course) in df2
, i.e.:
df_final =
val1 val2 val3 val4
date
2020-01-01 01:00:00 01:00 42.353719 34.823934 4021.28 570.06
2020-01-01 02:00:00 01:00 42.353719 34.823934 4149.82 842.25
2020-01-01 03:00:00 01:00 42.353719 34.823934 4307.65 1638.57
2020-01-02 04:00:00 01:00 42.255172 34.683523 4428.95 2109.98
2020-01-02 05:00:00 01:00 42.255172 34.683523 4542.47 2365.25
2020-01-02 05:00:00 01:00 42.255172 34.683523 4462.47 2325.25
How can this be done?
CodePudding user response:
I don't know if it is straightforward but here is a solution:
import pandas as pd
import datetime
df1.index = pd.to_datetime(df1.index)
df1["Date_merge"] = df1.index.date
df2.index = pd.to_datetime(df2.index)
df2["Date_merge"] = df2.index.date
df_final = (
df1.merge(df2, on="Date_merge")
.set_index(df2.index)
.rename_axis("date")
.drop("Date_merge", axis=1)
)
print(df_final)
val1 val2 val3 val4
date
2020-01-01 01:00:00 01:00 42.353719 34.823934 4021.28 570.06
2020-01-01 02:00:00 01:00 42.353719 34.823934 4149.82 842.25
2020-01-01 03:00:00 01:00 42.353719 34.823934 4307.65 1638.57
2020-01-02 04:00:00 01:00 42.255172 34.683523 4428.95 2109.98
2020-01-02 05:00:00 01:00 42.255172 34.683523 4462.47 2325.25
If someone wants to reconstruct the data, here df1
& df2
I made by the given tables:
df1 = pd.DataFrame(
{
"val1": {
"2020-01-01": 42.353719,
"2020-01-02": 42.255172,
"2020-01-03": 46.049281,
"2020-01-04": 41.026087,
"2020-01-05": 32.3,
},
"val2": {
"2020-01-01": 34.823934,
"2020-01-02": 34.683523,
"2020-01-03": 37.036076,
"2020-01-04": 36.340164,
"2020-01-05": 18.835484,
},
}
)
df2 = pd.DataFrame(
{
"val3": {
"2020-01-01 01:00:00 01:00": 4021.28,
"2020-01-01 02:00:00 01:00": 4149.82,
"2020-01-01 03:00:00 01:00": 4307.65,
"2020-01-02 04:00:00 01:00": 4428.95,
"2020-01-02 05:00:00 01:00": 4462.47,
},
"val4": {
"2020-01-01 01:00:00 01:00": 570.06,
"2020-01-01 02:00:00 01:00": 842.25,
"2020-01-01 03:00:00 01:00": 1638.57,
"2020-01-02 04:00:00 01:00": 2109.98,
"2020-01-02 05:00:00 01:00": 2325.25,
},
}
)
CodePudding user response:
Here is my suggestion. You create a new helper column called 'datem' which is the date without the time. Then use this column for the merge process.
See the script.
import pandas as pd
import io
df1="""date val1 val2
2020-01-01 42.353719 34.823934
2020-01-02 42.255172 34.683523
2020-01-03 46.049281 37.036076
2020-01-04 41.026087 36.340164
2020-01-05 32.300000 18.835484
"""
df2="""date val3 val4
2020-01-01 01:00:00 01:00 4021.28 570.06
2020-01-01 02:00:00 01:00 4149.82 842.25
2020-01-01 03:00:00 01:00 4307.65 1638.57
2020-01-02 04:00:00 01:00 4428.95 2109.98
2020-01-02 05:00:00 01:00 4542.47 2365.25
2020-01-02 05:00:00 01:00 4462.47 2325.25
"""
df1 = pd.read_csv(io.StringIO(df1), sep=" ",engine='python')
df1['date'] = pd.to_datetime(df1['date'], format='%Y-%m-%d')
df2 = pd.read_csv(io.StringIO(df2), sep=" ",engine='python')
df2['date'] = df2['date'].astype('datetime64[ns]')
df2['datem'] = df2['date'].dt.floor('d') ##helper column for the merge.
df1.merge(df2,left_on=['date'],right_on=['datem'])
Result is here:
date_x val1 val2 date_y val3 val4 datem
0 2020-01-01 42.353719 34.823934 2020-01-01 00:00:00 4021.28 570.06 2020-01-01
1 2020-01-01 42.353719 34.823934 2020-01-01 01:00:00 4149.82 842.25 2020-01-01
2 2020-01-01 42.353719 34.823934 2020-01-01 02:00:00 4307.65 1638.57 2020-01-01
3 2020-01-02 42.255172 34.683523 2020-01-02 03:00:00 4428.95 2109.98 2020-01-02
4 2020-01-02 42.255172 34.683523 2020-01-02 04:00:00 4542.47 2365.25 2020-01-02
5 2020-01-02 42.255172 34.683523 2020-01-02 04:00:00 4462.47 2325.25 2020-01-02
Let me know what you think.