I have a DataFrame that has missing values in one or more columns, and I am trying to lookup the missing values based on the values in two other columns for the same record (i.e., fill city_name
and city_id
columns based on the coordinates in lat
and long
columns.)
city_name | city_id | lat | long |
---|---|---|---|
NaN | NaN | -121.77 | 37.24 |
NaN | NaN | -122.77 | 38.24 |
NaN | NaN | -123.77 | 39.24 |
new york | c1 | -121.77 | 37.24 |
paris | c2 | -122.77 | 38.24 |
london | c3 | -123.77 | 39.24 |
How can I do this?
CodePudding user response:
Try with groupby
and fillna
:
df = df.fillna(df.groupby(["lat", "long"]).transform("first"))
>>> df
city_id city_name lat long
0 c1 new york -121.77 37.24
1 c2 paris -122.77 38.24
2 c3 london -123.77 39.24
3 c1 new york -121.77 37.24
4 c2 paris -122.77 38.24
5 c3 london -123.77 39.24