I am working with two distinct datasets: one regarding COVID-19 statistics and one with demographic characteristics of a city.
The covid19 one, namely covid.df
looks as follows:
Note: Date, City ID, City, and State are all indexes
Date | City ID | City | State | Population mean | Population_2019 mean | Confirmed_rate_100k mean | Confirmed_rate_100k std | death_rate mean | death_rate std | new_confirmed | new_deaths |
---|---|---|---|---|---|---|---|---|---|---|---|
2020-02 | 120385 | Los Angeles | CA | 9559699 | 45959669 | 0.653 | 0.556 | 0.6 | 0.01 | 33 | 5 |
2020-02 | 120054 | Houtson | Texas | 3304040 | 3343560 | 0.543 | 0.043 | 22.34 | 1.6 | 60 | 9 |
... | ... | .... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2022-05 | 120385 | Los Angeles | CA | 9559483 | 45966549 | 0.672 | 0.032 | 2.3 | 0.5 | 22 | 12 |
The one with demographic information, demo.df
includes the following
City ID | HDI | Education | Mobility | Poverty |
---|---|---|---|---|
120385 | 0.54 | 72.5 | 55.522 | 33.21 |
120054 | 0.33 | 66.2 | 76.433 | 12.504 |
I would like to include the information from demo.df
on covid.df
, however, given the different indexes for the two datasets, the concat()
function has been giving me a hard time.
How can I merge the two such datasets such that covid.df
looks like this:
Date | City ID | City | State | HDI | Education | Mobility | Poverty | Population mean | Population_2019 mean | Confirmed_rate_100k mean | Confirmed_rate_100k std | death_rate mean | death_rate std | new_confirmed | new_deaths |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2020-02 | 120385 | Los Angeles | CA | 0.54 | 72.5 | 55.522 | 33.21 | 9559699 | 45959669 | 0.653 | 0.556 | 0.6 | 0.01 | 33 | 5 |
2020-02 | 120054 | Houston | TX | 0.33 | 66.2 | 76.433 | 12.504 | 3304040 | 3343560 | 0.543 | 0.043 | 22.34 | 1.6 | 60 | 9 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2022-05 | 120385 | Los Angeles | CA | 0.54 | 72.5 | 55.522 | 33.21 | 9559483 | 45966549 | 0.672 | 0.032 | 2.3 | 0.5 | 22 | 12 |
Thank you!
CodePudding user response:
You only need this:
covid = covid.merge(demo, how='left', on='City ID')
For example, suppose we have this input (notice the different indexes of 88, 99
and 'fish', 'fowl'
):
covid.df:
Date City ID City State Population mean Population_2019 mean Confirmed_rate_100k mean Confirmed_rate_100k std death_rate mean death_rate std new_confirmed new_deaths
88 2020-02 120385 Los Angeles CA 9559699 45959669 0.653 0.556 0.60 0.01 33 5
99 2020-02 120054 Houtson Texas 3304040 3343560 0.543 0.043 22.34 1.60 60 9
demo.df:
City ID HDI Education Mobility Poverty
fish 120385 0.54 72.5 55.522 33.210
fowl 120054 0.33 66.2 76.433 12.50
The output will be
Date City ID City State Population mean Population_2019 mean Confirmed_rate_100k mean ... death_rate std new_confirmed new_deaths HDI Education Mobility Poverty
0 2020-02 120385 Los Angeles CA 9559699 45959669 0.653 ... 0.01 33 5 0.54 72.5 55.522 33.210
1 2020-02 120054 Houtson Texas 3304040 3343560 0.543 ... 1.60 60 9 0.33 66.2 76.433 12.504
[2 rows x 16 columns]