Home > Mobile >  How to combine two dataframes by index (pandas)
How to combine two dataframes by index (pandas)

Time:11-05

I have two dataframes, with the same date field, but different other fields. I need to add a column pneumonia_ARVI from dataframe pneumonia_ARVI to dataframe Result_data.

They initially differ in the number of dates, in Result_data dataframe there are significantly more dates than in pneumonia_ARVI

enter image description here

I need a concatenation with a date match, but if the records in the dataframe pneumonia_ARVI than in the dataframe Result_data, then the preference would have the dates specified in the dataset Result_data. And the data that is missing in the dataset pneumonia_ARVI replaced with empty values.

I have tried doing

Result_data = Result_data.set_index('date')
pneumonia_ARVI = pneumonia_ARVI.set_index('date')
End = pd.merge(Result_data, pneumonia_ARVI, left_index=True, right_index=True)

But this led to the fact that the data was adjusted to each other, and the field infected_city do not leave all their original values by date.

How to combine this data correctly so that there are no problems with reducing the total number of dates?

CodePudding user response:

Use join:

#convert to datetime if needed
Result_data["date"] = pd.to_datetime(Result_data["date"])
pneumonia_ARVI["date"] = pd.to_datetime(pneumonia_ARVI["date"])

#set index as you have done
Result_data = Result_data.set_index('date')
pneumonia_ARVI = pneumonia_ARVI.set_index('date')

#perform a left join
End = Result_data.join(pneumonia_ARVI, how="left")
  • Related