How to Merge two datasets with different indexes but one common ID factor?-CodePudding

I am working with two distinct datasets: one regarding COVID-19 statistics and one with demographic characteristics of a city.

The covid19 one, namely covid.df looks as follows:

Note: Date, City ID, City, and State are all indexes

Date	City ID	City	State	Population mean	Population_2019 mean	Confirmed_rate_100k mean	Confirmed_rate_100k std	death_rate mean	death_rate std	new_confirmed	new_deaths
2020-02	120385	Los Angeles	CA	9559699	45959669	0.653	0.556	0.6	0.01	33	5
2020-02	120054	Houtson	Texas	3304040	3343560	0.543	0.043	22.34	1.6	60	9
...	...	....	...	...	...	...	...	...	...	...	...
2022-05	120385	Los Angeles	CA	9559483	45966549	0.672	0.032	2.3	0.5	22	12

The one with demographic information, demo.df includes the following

City ID	HDI	Education	Mobility	Poverty
120385	0.54	72.5	55.522	33.21
120054	0.33	66.2	76.433	12.504

I would like to include the information from demo.df on covid.df, however, given the different indexes for the two datasets, the concat() function has been giving me a hard time.

How can I merge the two such datasets such that covid.df looks like this:

Date	City ID	City	State	HDI	Education	Mobility	Poverty	Population mean	Population_2019 mean	Confirmed_rate_100k mean	Confirmed_rate_100k std	death_rate mean	death_rate std	new_confirmed	new_deaths
2020-02	120385	Los Angeles	CA	0.54	72.5	55.522	33.21	9559699	45959669	0.653	0.556	0.6	0.01	33	5
2020-02	120054	Houston	TX	0.33	66.2	76.433	12.504	3304040	3343560	0.543	0.043	22.34	1.6	60	9
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2022-05	120385	Los Angeles	CA	0.54	72.5	55.522	33.21	9559483	45966549	0.672	0.032	2.3	0.5	22	12

Thank you!

CodePudding user response：

You only need this:

covid = covid.merge(demo, how='left', on='City ID')

For example, suppose we have this input (notice the different indexes of 88, 99 and 'fish', 'fowl'):

covid.df:
       Date  City ID          City    State  Population mean  Population_2019 mean  Confirmed_rate_100k mean  Confirmed_rate_100k std  death_rate mean  death_rate std  new_confirmed  new_deaths
88  2020-02   120385  Los Angeles       CA           9559699              45959669                     0.653                    0.556             0.60            0.01             33           5
99  2020-02   120054      Houtson    Texas           3304040               3343560                     0.543                    0.043            22.34            1.60             60           9
demo.df:
      City ID   HDI  Education  Mobility  Poverty
fish   120385  0.54       72.5    55.522   33.210
fowl   120054  0.33       66.2    76.433   12.50

The output will be

      Date  City ID          City    State  Population mean  Population_2019 mean  Confirmed_rate_100k mean  ...  death_rate std  new_confirmed  new_deaths   HDI  Education  Mobility  Poverty
0  2020-02   120385  Los Angeles       CA           9559699              45959669                     0.653  ...            0.01             33           5  0.54       72.5    55.522   33.210
1  2020-02   120054      Houtson    Texas           3304040               3343560                     0.543  ...            1.60             60           9  0.33       66.2    76.433   12.504

[2 rows x 16 columns]