Taking care of missing data

Time:10-12

I have a DataFrame of rent prices in São Paulo, but some values of "Latitude" and "Longitude" are missing (stored as 0), so I want to replace each 0 with the mean. The catch is that I want to use the mean Latitude and Longitude of the same District only.

A slice of the DataFrame is shown below.

   Price  Condo  Size  Rooms  Toilets  Suites  Parking  Elevator  Furnished  Swimming Pool  New               District  Negotiation Type  Property Type    Latitude   Longitude
0    930    220    47      2        2       1        1         0          0              0    0  Artur Alvim/São Paulo              rent      apartment  -23.543138  -46.479486
1   1000    148    45      2        2       1        1         0          0              0    0  Artur Alvim/São Paulo              rent      apartment  -23.550239  -46.480718
2   1000    100    48      2        2       1        1         0          0              0    0  Artur Alvim/São Paulo              rent      apartment  -23.542818  -46.485665
3   1000    200    48      2        2       1        1         0          0              0    0  Artur Alvim/São Paulo              rent      apartment  -23.547171  -46.483014
4   1300    410    55      2        2       1        1         1          0              0    0  Artur Alvim/São Paulo              rent      apartment  -23.525025  -46.482436

How can I do it?

CodePudding user response:

Here is a very simple approach, sketched as a loop over the rows of the DataFrame df:

for i in df.index:
    if df.loc[i, "Latitude"] == 0 and df.loc[i, "Longitude"] == 0:
        pass  # replace both values with the district means here

I'm sorry, I've forgotten the exact pandas syntax for the replacement, but I think you get the idea.
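The loop idea can also be written without iterating row by row. The following is a minimal sketch, not the answerer's own code: the toy data is made up for illustration, and it assumes a row counts as missing only when both coordinates are 0.

```python
import pandas as pd

# Toy data (made up for illustration); 0 marks a missing coordinate.
df = pd.DataFrame({
    "District": ["A", "A", "A", "B", "B"],
    "Latitude": [-23.5, -23.7, 0.0, -23.1, 0.0],
    "Longitude": [-46.4, -46.6, 0.0, -46.1, 0.0],
})

# Rows where both coordinates hold the 0 placeholder.
missing = (df["Latitude"] == 0) & (df["Longitude"] == 0)

# Per-district means computed from the valid rows only.
district_means = (
    df.loc[~missing].groupby("District")[["Latitude", "Longitude"]].mean()
)

# Look up each missing row's district mean and write it back.
for col in ["Latitude", "Longitude"]:
    df.loc[missing, col] = df.loc[missing, "District"].map(district_means[col])
```

Computing the means from the valid rows only matters here: including the 0 placeholders would pull every district mean toward zero.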

CodePudding user response:

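First, get the mean latitude and longitude for each district. This assumes missing values are NaN; if they are stored as 0, as in the question, convert them to NaN first so they do not skew the means.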

df_meanll = df.groupby('District').agg(long_mean=('Longitude','mean'), lat_mean=('Latitude','mean')).reset_index()

Then merge these means back into the original frame:

df = df.merge(df_meanll, on='District', how='left')

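Finally, fill the missing values (assigning the result back, since an in-place fillna on an attribute-accessed column may not update df):

df["Longitude"] = df["Longitude"].fillna(df["long_mean"])
df["Latitude"] = df["Latitude"].fillna(df["lat_mean"])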
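For reference, here is one way the merge-based steps might fit together end to end. It is a sketch under assumptions: the toy data is invented, the 0 placeholders are converted to NaN up front (a step not in the answer above), and the helper columns are dropped at the end.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration); 0 marks a missing coordinate.
df = pd.DataFrame({
    "District": ["A", "A", "A", "B", "B"],
    "Latitude": [-23.5, -23.7, 0.0, -23.1, 0.0],
    "Longitude": [-46.4, -46.6, 0.0, -46.1, 0.0],
})

coords = ["Latitude", "Longitude"]

# Turn the 0 placeholders into NaN so mean() skips them and fillna() sees them.
df[coords] = df[coords].replace(0, np.nan)

# Mean coordinates per district.
df_meanll = (
    df.groupby("District")
    .agg(long_mean=("Longitude", "mean"), lat_mean=("Latitude", "mean"))
    .reset_index()
)

# Attach the district means to every row, then fill the gaps.
df = df.merge(df_meanll, on="District", how="left")
df["Longitude"] = df["Longitude"].fillna(df["long_mean"])
df["Latitude"] = df["Latitude"].fillna(df["lat_mean"])

# Drop the helper columns once the gaps are filled.
df = df.drop(columns=["long_mean", "lat_mean"])
```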

CodePudding user response:

Using the pandas built-in methods .groupby, .agg, .assign, .map and .apply:

# Maps each district to [LongitudeMean, LatitudeMean], in that order.
means_mapping = (
    df
    .groupby("District")
    .agg(LongitudeMean=("Longitude", "mean"), LatitudeMean=("Latitude", "mean"))
    .transpose()
    .to_dict("list")
)

# Index 0 is the longitude mean and index 1 is the latitude mean.
df = df.assign(
    Longitude=df["Longitude"].fillna(df["District"].map(means_mapping).apply(lambda x: x[0])),
    Latitude=df["Latitude"].fillna(df["District"].map(means_mapping).apply(lambda x: x[1])),
)
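As a shorter alternative to building a lookup dictionary, groupby with transform("mean") returns the district mean already aligned to each row. This is a sketch with invented toy data, and it assumes the missing coordinates are stored as 0, as described in the question.

```python
import pandas as pd

# Toy data (made up for illustration); 0 marks a missing coordinate.
df = pd.DataFrame({
    "District": ["A", "A", "A", "B", "B"],
    "Latitude": [-23.5, -23.7, 0.0, -23.1, 0.0],
    "Longitude": [-46.4, -46.6, 0.0, -46.1, 0.0],
})

for col in ["Latitude", "Longitude"]:
    # Hide the 0 placeholders as NaN so they do not affect the mean.
    valid = df[col].mask(df[col] == 0)
    # transform("mean") yields one district mean per row, aligned to df's index.
    df[col] = valid.fillna(valid.groupby(df["District"]).transform("mean"))
```

Because transform keeps the original index, no merge or helper columns are needed.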