Home > Blockchain >  Concat dataframes with different date format
Concat dataframes with different date format

Time:04-26

I have two data frames:

df1 = 

             val1       val2
date
2020-01-01  42.353719   34.823934
2020-01-02  42.255172   34.683523
2020-01-03  46.049281   37.036076
2020-01-04  41.026087   36.340164
2020-01-05  32.300000   18.835484


df2 = 

                                val3       val4
date
2020-01-01 01:00:00 01:00       4021.28    570.06
2020-01-01 02:00:00 01:00       4149.82    842.25
2020-01-01 03:00:00 01:00       4307.65    1638.57
2020-01-02 04:00:00 01:00       4428.95    2109.98
2020-01-02 05:00:00 01:00       4542.47    2365.25
2020-01-02 05:00:00 01:00       4462.47    2325.25
...

I would then like the val1 and val2 from df1 to be inserted for all rows with the same date (they don't have the same date format of course) in df2, i.e.:

df_final = 

                                val1        val2        val3       val4
date
2020-01-01 01:00:00 01:00       42.353719   34.823934   4021.28    570.06
2020-01-01 02:00:00 01:00       42.353719   34.823934   4149.82    842.25
2020-01-01 03:00:00 01:00       42.353719   34.823934   4307.65    1638.57
2020-01-02 04:00:00 01:00       42.255172   34.683523   4428.95    2109.98
2020-01-02 05:00:00 01:00       42.255172   34.683523   4542.47    2365.25
2020-01-02 05:00:00 01:00       42.255172   34.683523   4462.47    2325.25

How can this be done?

CodePudding user response:

I don't know if it is straightforward but here is a solution:

import pandas as pd
import datetime

df1.index = pd.to_datetime(df1.index)
df1["Date_merge"] = df1.index.date

df2.index = pd.to_datetime(df2.index)
df2["Date_merge"] = df2.index.date

df_final = (
    df1.merge(df2, on="Date_merge")
    .set_index(df2.index)
    .rename_axis("date")
    .drop("Date_merge", axis=1)
)
print(df_final)

                                 val1        val2      val3       val4
date                
2020-01-01 01:00:00 01:00   42.353719   34.823934   4021.28     570.06
2020-01-01 02:00:00 01:00   42.353719   34.823934   4149.82     842.25
2020-01-01 03:00:00 01:00   42.353719   34.823934   4307.65     1638.57
2020-01-02 04:00:00 01:00   42.255172   34.683523   4428.95     2109.98
2020-01-02 05:00:00 01:00   42.255172   34.683523   4462.47     2325.25

If someone wants to reconstruct the data, here df1 & df2 I made by the given tables:

df1 = pd.DataFrame(
    {
        "val1": {
            "2020-01-01": 42.353719,
            "2020-01-02": 42.255172,
            "2020-01-03": 46.049281,
            "2020-01-04": 41.026087,
            "2020-01-05": 32.3,
        },
        "val2": {
            "2020-01-01": 34.823934,
            "2020-01-02": 34.683523,
            "2020-01-03": 37.036076,
            "2020-01-04": 36.340164,
            "2020-01-05": 18.835484,
        },
    }
)
df2 = pd.DataFrame(
    {
        "val3": {
            "2020-01-01 01:00:00 01:00": 4021.28,
            "2020-01-01 02:00:00 01:00": 4149.82,
            "2020-01-01 03:00:00 01:00": 4307.65,
            "2020-01-02 04:00:00 01:00": 4428.95,
            "2020-01-02 05:00:00 01:00": 4462.47,
        },
        "val4": {
            "2020-01-01 01:00:00 01:00": 570.06,
            "2020-01-01 02:00:00 01:00": 842.25,
            "2020-01-01 03:00:00 01:00": 1638.57,
            "2020-01-02 04:00:00 01:00": 2109.98,
            "2020-01-02 05:00:00 01:00": 2325.25,
        },
    }
)

CodePudding user response:

Here is my suggestion. You create a new helper column called 'datem' which is the date without the time. Then use this column for the merge process.

See the script.

import pandas as pd
import io   

df1="""date   val1   val2
2020-01-01   42.353719   34.823934
2020-01-02   42.255172   34.683523
2020-01-03   46.049281   37.036076
2020-01-04   41.026087   36.340164
2020-01-05   32.300000   18.835484
"""
df2="""date   val3   val4
2020-01-01 01:00:00 01:00   4021.28   570.06
2020-01-01 02:00:00 01:00   4149.82   842.25
2020-01-01 03:00:00 01:00   4307.65   1638.57
2020-01-02 04:00:00 01:00   4428.95   2109.98
2020-01-02 05:00:00 01:00   4542.47   2365.25
2020-01-02 05:00:00 01:00   4462.47   2325.25
"""

df1 = pd.read_csv(io.StringIO(df1), sep="   ",engine='python')
df1['date'] = pd.to_datetime(df1['date'], format='%Y-%m-%d')


df2 = pd.read_csv(io.StringIO(df2), sep="   ",engine='python')
df2['date'] = df2['date'].astype('datetime64[ns]')
df2['datem'] = df2['date'].dt.floor('d') ##helper column for the merge.
df1.merge(df2,left_on=['date'],right_on=['datem'])

Result is here:

date_x  val1    val2    date_y  val3    val4    datem
0   2020-01-01  42.353719   34.823934   2020-01-01 00:00:00 4021.28 570.06  2020-01-01
1   2020-01-01  42.353719   34.823934   2020-01-01 01:00:00 4149.82 842.25  2020-01-01
2   2020-01-01  42.353719   34.823934   2020-01-01 02:00:00 4307.65 1638.57 2020-01-01
3   2020-01-02  42.255172   34.683523   2020-01-02 03:00:00 4428.95 2109.98 2020-01-02
4   2020-01-02  42.255172   34.683523   2020-01-02 04:00:00 4542.47 2365.25 2020-01-02
5   2020-01-02  42.255172   34.683523   2020-01-02 04:00:00 4462.47 2325.25 2020-01-02

Let me know what you think.

  • Related