Home > Blockchain >  Search pandas dataframe and edit values
Search pandas dataframe and edit values

Time:04-11

I want to replace missing country with the country that corresponds to the city, i.e. find another data point with the same city and copy the country, if there is no other record with the same city then remove.

Here's the dataframe:

   id         city                lat         lng       country

1036323110  Katherine            -14.4667    132.2667     Australia

1840015979  South Pasadena        27.7526    -82.7394     

1124755118  Beaconsfield          45.4333     -73.8667    Canada

1250921305  Ferney-Voltaire       46.2558      6.1081     France

1156346497  Jiangshan             28.7412      118.6225   

1231393325  Dīla                  6.4104        38.3100   Ethiopia

1840015979  South Pasadena        27.7526    -82.7394     United States

1192391794  Kigoma                21.1072       -76.1367   

1840054954  Hampstead             42.8821       -71.1709   United States

1840005111  West Islip            40.7097        -73.2971  United States

1076327352  Paulínia              -22.7611        -47.1542  Brazil

1250921305  Ferney-Voltaire       46.2558      6.1081     

1250921305  Ferney-Voltaire       46.2558      6.1081     France

1156346497  Jiangshan             28.7412      118.6225   China

1231393325  Dīla                  6.4104        38.3100   Ethiopia

1192391794  Gibara                21.1072       -76.1367   Cuba

1840054954  Dodoma             42.8821       -71.1709   

1840005111  West Islip            40.7097        -73.2971  United States

Here's my code so far:

df[df.isin(['city'])].stack()

CodePudding user response:

You can group by city, lat, lng three columns and filling missing values by first not nan value in each group.

df['country'] = df['country'].fillna(
    df.groupby(['city', 'lat', 'lng'])['country'].transform(
        lambda x: x.loc[x.first_valid_index()] if x.first_valid_index() else x
    )
)
print(df)

            id             city      lat       lng        country
0   1036323110        Katherine -14.4667  132.2667      Australia
1   1840015979   South Pasadena  27.7526  -82.7394  United States
2   1124755118     Beaconsfield  45.4333  -73.8667         Canada
3   1250921305  Ferney-Voltaire  46.2558    6.1081         France
4   1156346497        Jiangshan  28.7412  118.6225          China
5   1231393325             Dīla   6.4104   38.3100       Ethiopia
6   1840015979   South Pasadena  27.7526  -82.7394  United States
7   1192391794           Kigoma  21.1072  -76.1367            NaN
8   1840054954        Hampstead  42.8821  -71.1709  United States
9   1840005111       West Islip  40.7097  -73.2971  United States
10  1076327352         Paulínia -22.7611  -47.1542         Brazil
11  1250921305  Ferney-Voltaire  46.2558    6.1081         France
12  1250921305  Ferney-Voltaire  46.2558    6.1081         France
13  1156346497        Jiangshan  28.7412  118.6225          China
14  1231393325             Dīla   6.4104   38.3100       Ethiopia
15  1192391794           Gibara  21.1072  -76.1367           Cuba
16  1840054954           Dodoma  42.8821  -71.1709            NaN
17  1840005111       West Islip  40.7097  -73.2971  United States

CodePudding user response:

I've solved such a Problem with the geopy package. Then you can use the lat and long. Then filter the Geopy-Output for the Country. This Way you will avoid NaN's and always get an answer based on geo information.

  • Related