Home > Software design >  How to merge dataframe with time range?
How to merge dataframe with time range?

Time:12-15

I am trying to map time category from the other time range dataframe. Here is the Time Range Dataframe. The time between "Start" and "End" is categorized into each "Weeks".

       Start        End  Weeks
0 2019-08-30 2019-10-03      1
1 2019-10-04 2019-10-31      2
2 2019-11-01 2019-11-28      3

Here is my time dataframe.

        Date
0 2019-09-06
1 2019-10-13
2 2019-10-12
3 2019-11-04

Here is the output I want. You can see that each date is categorized into the corresponding "Weeks" based on Time Range Dataframe.

        Date Weeks
0 2019-09-06     1
1 2019-10-13     2
2 2019-10-12     2
3 2019-11-04     3

My code is as follows.

import pandas as pd
# Date Range dataframe
Date_Range = {'Start':[datetime.datetime(2019, 8, 30, 0, 0),datetime.datetime(2019, 10, 4, 0, 0),datetime.datetime(2019, 11, 1, 0, 0)],
              'End':[datetime.datetime(2019, 10, 3, 0, 0),datetime.datetime(2019, 10, 31, 0, 0),datetime.datetime(2019, 11, 28, 0, 0)],
              'Weeks':[1,2,3],}
Date_Range_DF = pd.DataFrame(Date_Range)
print(Date_Range_DF)

# My dataframe
Date = {'Date':[datetime.datetime(2019, 9, 6, 0, 0),datetime.datetime(2019, 10, 13, 0, 0),datetime.datetime(2019, 10, 12, 0, 0),datetime.datetime(2019, 11, 4, 0, 0)]}
Date_DF = pd.DataFrame(Date)
print(Date_DF)


# Merge Dataframe
import numpy as np
import pandas as pd
df1 = Date_Range
df2 = Date_Range_DF
df2.start, df2.End = pd.to_datetime(df2.Start), pd.to_datetime(df2.End)
filtered = pd.merge_asof(df1, df2, left_on='Date', right_on='Start')

However, it keeps showing the erroe TypeError: Can only merge Series or DataFrame objects, a <class 'dict'> was passed

Please help me to find the bugs. Thank.

CodePudding user response:

I have looked over your code and it seems to me that you haven't defined "datetime", which is inside your Date_Range on line 3. That may be why your getting the error you are seeing. You may want to define this and it should be good. Unless I misunderstood what you wanted...

CodePudding user response:

To merge two pandas dataframes, you can use the merge function. This function will allow you to specify which columns from each dataframe to merge on, as well as how to handle any rows that don't have a match in the other dataframe.

Here is an example of how to use the merge function to merge two dataframes based on time range:

import pandas as pd

load the first dataframe

df1 = pd.read_csv('df1.csv')

load the second dataframe

df2 = pd.read_csv('df2.csv')

merge the two dataframes on the 'time' column

merged_df = df1.merge(df2, on='time')

print the merged dataframe

print(merged_df)

In this example, we assume that both dataframes have a column called time that contains the time range for each row. The merge function will merge the two dataframes by matching rows with the same time range.

You can also specify additional arguments to the merge function to customize how the dataframes are merged. For example, you can specify the how argument to control how the rows are combined (e.g. using an inner join, outer join, etc.), or the left_on and right_on arguments to specify different column names to merge on in each dataframe.

For more information on the merge function, you can refer to the official pandas documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

CodePudding user response:

Step1

make boundary list (bins)

bins = [Date_Range_DF.loc[0, 'Start'] - pd.Timedelta('1day')]   Date_Range_DF['End'].tolist()

bins

[Timestamp('2019-08-29 00:00:00'),
 Timestamp('2019-10-03 00:00:00'),
 Timestamp('2019-10-31 00:00:00'),
 Timestamp('2019-11-28 00:00:00')]

Step2

divide category by pd.cut(use boundary from bins)

s = pd.cut(pd.to_datetime(Date_DF['Date']), bins=bins, labels=df1['Weeks'].tolist())

s

0    1
1    2
2    2
3    3
Name: Date, dtype: category
Categories (3, int64): [1 < 2 < 3]

Step3

make s to Weeks column

Date_DF['Weeks'] = s

Date_DF

    Date        Weeks
0   2019-09-06  1
1   2019-10-13  2
2   2019-10-12  2
3   2019-11-04  3

if you don want that dtype of Weekis category, use following code for example make int

Date_DF['Weeks'] = s.astype('int')
  • Related