I have two dataframes df1 and df2 based, respectively, on these dictionaries:
data1 = {'date': ['5/09/22', '7/09/22', '7/09/22','10/09/22'],
'second_column': ['first_value', 'second_value', 'third_value','fourth_value'],
'id_number':['AA576bdk89', 'GG6jabkhd589', 'BXV6jabd589','BXzadzd589'],
'fourth_column':['first_value', 'second_value', 'third_value','fourth_value'],}
data2 = {'date': ['5/09/22', '7/09/22', '7/09/22', '7/09/22', '7/09/22', '11/09/22'],
'second_column': ['first_value', 'second_value', 'third_value','fourth_value', 'fifth_value','sixth_value'],
'id_number':['AA576bdk89', 'GG6jabkhd589', 'BXV6jabd589','BXV6mkjdd589','GGdbkz589', 'BXhshhsd589'],
'fourth_column':['first_value', 'second_value', 'third_value','fourth_value', 'fifth_value','sixth_value'],}
I want to compare df2 with df1 in order to show the "id_number" of df2 that are in df1.
I also want to compare the two dataframes on the same date range.
For example the shared date range between df1 and df2 should be the from 5/09/22 to 10/09/22 (and not beyond)
How can I do this?
CodePudding user response:
You can define a helper function to make dataframes of your dictionaries and slice them on certain date range:
def format(dictionary, start, end):
"""Helper function.
Args:
dictionary: dictionary to format.
start: start date (DD/MM/YY).
end: end date (DD/MM/YY).
Returns:
Dataframe.
"""
return (
pd.DataFrame(dictionary)
.pipe(lambda df_: df_.assign(date=pd.to_datetime(df_["date"], format="%d/%m/%y")))
.pipe(
lambda df_: df_.loc[
(df_["date"] >= pd.to_datetime(start, format="%d/%m/%y"))
& (df_["date"] <= pd.to_datetime(end, format="%d/%m/%y")),
:,
]
).reset_index(drop=True)
)
Then, with dictionaries you provided, here is how you can "show the "id_number" of df2 that are in df1" for the desired date range:
df1 = format(data1, "05/09/22", "10/09/22")
df2 = format(data2, "05/09/22", "10/09/22")
print(df2[df2["id_number"].isin(df1["id_number"])]["id_number"])
# Output
0 AA576bdk89
1 GG6jabkhd589
2 BXV6jabd589
Name: id_number, dtype: object