I have two dataframes, with the same date
field, but different other fields. I need to add a column pneumonia_ARVI
from dataframe pneumonia_ARVI
to dataframe Result_data
.
They initially differ in the number of dates, in Result_data
dataframe there are significantly more dates than in pneumonia_ARVI
I need a concatenation with a date match, but if the records in the dataframe pneumonia_ARVI
than in the dataframe Result_data
, then the preference would have the dates specified in the dataset Result_data
. And the data that is missing in the dataset pneumonia_ARVI
replaced with empty values.
I have tried doing
Result_data = Result_data.set_index('date')
pneumonia_ARVI = pneumonia_ARVI.set_index('date')
End = pd.merge(Result_data, pneumonia_ARVI, left_index=True, right_index=True)
But this led to the fact that the data was adjusted to each other, and the field infected_city
do not leave all their original values by date.
How to combine this data correctly so that there are no problems with reducing the total number of dates?
CodePudding user response:
Use join
:
#convert to datetime if needed
Result_data["date"] = pd.to_datetime(Result_data["date"])
pneumonia_ARVI["date"] = pd.to_datetime(pneumonia_ARVI["date"])
#set index as you have done
Result_data = Result_data.set_index('date')
pneumonia_ARVI = pneumonia_ARVI.set_index('date')
#perform a left join
End = Result_data.join(pneumonia_ARVI, how="left")