Home > Mobile >  How can I create a dictionary to map individual missing values with aggregate function in Pandas?
How can I create a dictionary to map individual missing values with aggregate function in Pandas?

Time:06-23

I'm not even sure how to describe this, but I am looking for a method to sum the values of others and replace the NaN with a particular value.

My data consists of Organization, Population and Square Miles. I have population and square miles data for each county in the state, however some Organizations span across multiple counties. I merged the two data sets (organization info with pop/square miles data) but am obviously left with NaNs for the organizations that span across multiple counties.

If I create a dictionary like the following:

counties = {'Job a': ['county a', 'county b', 'county c'],
            'Job b': ['county d', 'county e', 'county f'],
            'Job c': ['county g', 'county h']} etc.. 

If I have table 1 like this:

     county  sq_mile  population
0  County a     2500       15000
1  County b      750         400
2  County c     4000        3500
3  County d     4300        4500
4  County e     2000        1500
5  County f     1000        1500
6  County g     4300        3500
7  County h     4400        3200
8  County i     4000        3500

How can I take table 2 (which has just organizations) and fill missing data using the dictionary and associated column values?

  organization   sq_mile  population
0        job a      7250       18900
1        job b      7300        7500
2        job c      8700        6700
3     county i      4000        3500

CodePudding user response:

Try this:

counties_r = {i: k for k in counties.keys() for i in counties[k]}
df.groupby(df['County'].str.lower().map(counties_r).fillna(df['County']))\
  .sum().reset_index()

Output:

     County  Sq. Mile  Population
0  County i      4000        3500
1     Job a      7250       18900
2     Job b      7300        7500
3     Job c      8700        6700

Swap the keys and values in your dictionary the map that swapped dictionary to the county column making everything lowercase then use groupby with sum.

  • Related