Home > Blockchain >  find common data between two dataframes on a specific range of date
find common data between two dataframes on a specific range of date

Time:09-21

I have two dataframes df1 and df2 based, respectively, on these dictionaries:

data1 = {'date':  ['5/09/22', '7/09/22', '7/09/22','10/09/22'],
            'second_column': ['first_value', 'second_value', 'third_value','fourth_value'],
             'id_number':['AA576bdk89', 'GG6jabkhd589', 'BXV6jabd589','BXzadzd589'],
            'fourth_column':['first_value', 'second_value', 'third_value','fourth_value'],}
    
data2 = {'date':  ['5/09/22', '7/09/22', '7/09/22', '7/09/22', '7/09/22', '11/09/22'],
            'second_column': ['first_value', 'second_value', 'third_value','fourth_value', 'fifth_value','sixth_value'],
             'id_number':['AA576bdk89', 'GG6jabkhd589', 'BXV6jabd589','BXV6mkjdd589','GGdbkz589', 'BXhshhsd589'],
            'fourth_column':['first_value', 'second_value', 'third_value','fourth_value', 'fifth_value','sixth_value'],}

I want to compare df2 with df1 in order to show the "id_number" of df2 that are in df1.

I also want to compare the two dataframes on the same date range.

For example the shared date range between df1 and df2 should be the from 5/09/22 to 10/09/22 (and not beyond)

How can I do this?

CodePudding user response:

You can define a helper function to make dataframes of your dictionaries and slice them on certain date range:

def format(dictionary, start, end):
    """Helper function.

    Args:
        dictionary: dictionary to format.
        start: start date (DD/MM/YY).
        end: end date (DD/MM/YY).

    Returns:
        Dataframe.

    """
    return (
        pd.DataFrame(dictionary)
        .pipe(lambda df_: df_.assign(date=pd.to_datetime(df_["date"], format="%d/%m/%y")))
        .pipe(
            lambda df_: df_.loc[
                (df_["date"] >= pd.to_datetime(start, format="%d/%m/%y"))
                & (df_["date"] <= pd.to_datetime(end, format="%d/%m/%y")),
                :,
            ]
        ).reset_index(drop=True)
    )

Then, with dictionaries you provided, here is how you can "show the "id_number" of df2 that are in df1" for the desired date range:

df1 = format(data1, "05/09/22", "10/09/22")
df2 = format(data2, "05/09/22", "10/09/22")

print(df2[df2["id_number"].isin(df1["id_number"])]["id_number"])
# Output
0      AA576bdk89
1    GG6jabkhd589
2     BXV6jabd589
Name: id_number, dtype: object
  • Related