How do I use df.fillna with per-category median values?

Time:06-04

I have a large dataset (~1 million rows) with about 5,000 missing coordinates. I'd like to fill them with the median value for each 'city' category. Everything works except fillna. How can I make it work?

import pandas as pd

city = ['London', 'Paris', 'Vienna', 'Milan', 'London', 'Paris', 'Vienna', 'Milan']
lat = [51.510843900000005, 48.8671391, 48.204465500000005, 45.4787357, 51.510843900000005, 48.8671391, None, None]
lng = [-0.1424476, 2.328075, 16.3686397, 9.1961308, -0.14244, 2.329, None, None]

data = pd.DataFrame(list(zip(city, lat, lng)), columns=['city', 'lat', 'lng'])

display(data['lat'].isna().sum())  # 2
display(data['lng'].isna().sum())  # 2

for city_name in set(data['city']):
    data[data['city'] == city_name ]['lat'].fillna(data[data['city'] == city_name ]['lat'].median())
    data[data['city'] == city_name ]['lng'].fillna(data[data['city'] == city_name ]['lng'].median())
    print(city_name, data[data['city'] == city_name ]['lat'].median(),data[data['city'] == city_name ]['lng'].median())

display(data['lat'].isna().sum())  # 2
display(data['lng'].isna().sum())  # 2 

CodePudding user response:

You could do:

data.groupby("city").transform(lambda x: x.fillna(x.median()))

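First group by city, then use transform to apply fillna with each group's median. Any other aggregate, such as the mean, works the same way. Note that the result contains only the lat and lng columns (the grouping column is dropped) and is a new object, so it has to be assigned back to data.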
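A minimal sketch of this approach on the sample data from the question, assuming the goal is to write the filled values back into data. Selecting the lat and lng columns explicitly and the names coords and remaining are additions made here for illustration, not part of the original answer.

```python
import pandas as pd

city = ['London', 'Paris', 'Vienna', 'Milan', 'London', 'Paris', 'Vienna', 'Milan']
lat = [51.510843900000005, 48.8671391, 48.204465500000005, 45.4787357,
       51.510843900000005, 48.8671391, None, None]
lng = [-0.1424476, 2.328075, 16.3686397, 9.1961308, -0.14244, 2.329, None, None]
data = pd.DataFrame({'city': city, 'lat': lat, 'lng': lng})

# Columns to fill (hypothetical helper name, chosen for this sketch).
coords = ['lat', 'lng']

# For each city, replace missing values with that city's median.
# median() skips NaN by default, so it is computed from the known rows only.
data[coords] = data.groupby('city')[coords].transform(lambda x: x.fillna(x.median()))

# Count the missing coordinates that are left after filling.
remaining = int(data[coords].isna().sum().sum())
print(remaining)
```

If a city has no known coordinates at all, its median is NaN and its rows stay unfilled, so it may be worth checking the remaining count on the real dataset.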

CodePudding user response:

You can call fillna on the DataFrame directly, passing the per-city medians:

data.fillna(data.groupby("city").transform("median"))

     city        lat        lng
0  London  51.510844  -0.142448
1   Paris  48.867139   2.328075
2  Vienna  48.204466  16.368640
3   Milan  45.478736   9.196131
4  London  51.510844  -0.142440
5   Paris  48.867139   2.329000
6  Vienna  48.204466  16.368640
7   Milan  45.478736   9.196131
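Note that fillna returns a new DataFrame rather than modifying data. That is also why the loop in the question had no visible effect: its results were never assigned to anything. A sketch that keeps the result, assuming the sample data from the question (the names medians and filled are chosen here for illustration):

```python
import pandas as pd

city = ['London', 'Paris', 'Vienna', 'Milan', 'London', 'Paris', 'Vienna', 'Milan']
lat = [51.510843900000005, 48.8671391, 48.204465500000005, 45.4787357,
       51.510843900000005, 48.8671391, None, None]
lng = [-0.1424476, 2.328075, 16.3686397, 9.1961308, -0.14244, 2.329, None, None]
data = pd.DataFrame({'city': city, 'lat': lat, 'lng': lng})

# Per-row medians of each city, with the same index as data.
medians = data.groupby('city')[['lat', 'lng']].transform('median')

# fillna aligns on index and column labels, so only missing lat/lng cells
# are replaced; the city column is left untouched.
filled = data.fillna(medians)

print(filled[['lat', 'lng']].isna().sum())
```

Assigning the result back (data = data.fillna(medians)) would update the original name instead of creating a second DataFrame.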