I have a DataFrame of rent prices in São Paulo, but some "Latitude" and "Longitude" values are missing and are stored as 0. I want to replace those zeros with the mean, but only with the mean Latitude and Longitude of the same District.
A slice of the DataFrame is below.
| | Price | Condo | Size | Rooms | Toilets | Suites | Parking | Elevator | Furnished | Swimming Pool | New | District | Negotiation Type | Property Type | Latitude | Longitude |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 930 | 220 | 47 | 2 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | Artur Alvim/São Paulo | rent | apartment | -23.543138 | -46.479486 |
1 | 1000 | 148 | 45 | 2 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | Artur Alvim/São Paulo | rent | apartment | -23.550239 | -46.480718 |
2 | 1000 | 100 | 48 | 2 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | Artur Alvim/São Paulo | rent | apartment | -23.542818 | -46.485665 |
3 | 1000 | 200 | 48 | 2 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | Artur Alvim/São Paulo | rent | apartment | -23.547171 | -46.483014 |
4 | 1300 | 410 | 55 | 2 | 2 | 1 | 1 | 1 | 0 | 0 | 0 | Artur Alvim/São Paulo | rent | apartment | -23.525025 | -46.482436 |
How can I do it?
CodePudding user response:
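Here is a very simple answer: loop over the rows and replace the zeros one at a time. This assumes df is your DataFrame and that its index is unique.
for i in df.index:
    if df.loc[i, "Latitude"] == 0 and df.loc[i, "Longitude"] == 0:
        # Rows of the same district that have real (non-zero) coordinates
        same = df[(df["District"] == df.loc[i, "District"]) & (df["Latitude"] != 0)]
        df.loc[i, ["Latitude", "Longitude"]] = same[["Latitude", "Longitude"]].mean().to_numpy()
This is slow on a large DataFrame, but I think you can get the idea.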
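The same idea can probably be written without an explicit loop. Below is a minimal sketch, assuming the missing coordinates are stored as 0 as in the question; the toy districts and coordinates are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data; the district names and coordinates are invented for illustration.
df = pd.DataFrame({
    "District": ["A", "A", "A", "B", "B"],
    "Latitude": [-23.5, -23.7, 0.0, -23.0, 0.0],
    "Longitude": [-46.4, -46.6, 0.0, -46.0, 0.0],
})

coords = ["Latitude", "Longitude"]
# Treat 0 as missing so it does not pull the district mean towards zero.
known = df[coords].replace(0, np.nan)
# transform("mean") returns one mean per row, aligned with the original index.
district_mean = known.groupby(df["District"]).transform("mean")
# Only the missing cells are filled; known coordinates are left untouched.
df[coords] = known.fillna(district_mean)
```

Because the means are computed after the zeros are turned into NaN, the placeholder zeros do not take part in the average.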
CodePudding user response:
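First mark the zeros as missing, so that they do not count toward the mean and so that fillna can find them:
import numpy as np
df[['Latitude', 'Longitude']] = df[['Latitude', 'Longitude']].replace(0, np.nan)
Then get the mean latitude and longitude for each district:
df_meanll = df.groupby('District').agg(long_mean=('Longitude','mean'), lat_mean=('Latitude','mean')).reset_index()
Merge the means back onto the original DataFrame:
df = df.merge(df_meanll, on='District', how='left')
Fill the missing values as follows (assigning the result back instead of using inplace=True on a column):
df['Longitude'] = df['Longitude'].fillna(df['long_mean'])
df['Latitude'] = df['Latitude'].fillna(df['lat_mean'])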
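For what it's worth, here is a small end-to-end sketch of this merge approach on a made-up frame, assuming the missing coordinates are stored as 0 as in the question. It also drops the helper columns long_mean and lat_mean at the end, and includes a hypothetical district "C" with no known coordinates to show that such rows stay NaN.

```python
import numpy as np
import pandas as pd

# Toy data; district "C" has no known coordinates at all.
df = pd.DataFrame({
    "District": ["A", "A", "A", "C"],
    "Latitude": [-23.5, -23.7, 0.0, 0.0],
    "Longitude": [-46.4, -46.6, 0.0, 0.0],
})

# Zeros are placeholders for missing values, so convert them to NaN first.
df[["Latitude", "Longitude"]] = df[["Latitude", "Longitude"]].replace(0, np.nan)

# One row per district with the mean of the known coordinates.
df_meanll = (
    df.groupby("District")
    .agg(long_mean=("Longitude", "mean"), lat_mean=("Latitude", "mean"))
    .reset_index()
)

# Attach the district means to every row, then fill only the gaps.
df = df.merge(df_meanll, on="District", how="left")
df["Longitude"] = df["Longitude"].fillna(df["long_mean"])
df["Latitude"] = df["Latitude"].fillna(df["lat_mean"])

# The helper columns are no longer needed once the gaps are filled.
df = df.drop(columns=["long_mean", "lat_mean"])
```

A district without any valid coordinates has no mean to fall back on, so you would need to decide separately how to handle those rows.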
CodePudding user response:
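Using the pandas built-in methods .replace, .groupby, .agg, .assign, .map and .apply. The zeros are turned into NaN first, so that they are left out of the means and are picked up by fillna.
import numpy as np
coords = ["Longitude", "Latitude"]
df[coords] = df[coords].replace(0, np.nan)
# Maps each district to the list [LongitudeMean, LatitudeMean]
means_mapping = (
    df
    .groupby("District")
    .agg(LongitudeMean=("Longitude", "mean"), LatitudeMean=("Latitude", "mean"))
    .transpose()
    .to_dict("list")
)
# Index 0 of each list is the longitude mean, index 1 the latitude mean
df = df.assign(
    Longitude=df["Longitude"].fillna(df["District"].map(means_mapping).apply(lambda x: x[0])),
    Latitude=df["Latitude"].fillna(df["District"].map(means_mapping).apply(lambda x: x[1]))
)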
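A possibly simpler variant of the mapping idea is to skip the dict of lists and map each column from its own Series of district means, which avoids having to remember the position of each value in a list. This is only a sketch on made-up data, assuming the missing coordinates are stored as 0 as in the question.

```python
import numpy as np
import pandas as pd

# Toy data; the district names and coordinates are invented for illustration.
df = pd.DataFrame({
    "District": ["A", "A", "A", "B", "B"],
    "Latitude": [-23.5, -23.7, 0.0, -23.0, 0.0],
    "Longitude": [-46.4, -46.6, 0.0, -46.0, 0.0],
})

coords = ["Latitude", "Longitude"]
# Zeros stand for missing values, so convert them to NaN before averaging.
df[coords] = df[coords].replace(0, np.nan)

# DataFrame indexed by district, with one column of means per coordinate.
means = df.groupby("District")[coords].mean()

# Series.map accepts a Series and looks each district up in its index.
df = df.assign(
    Latitude=df["Latitude"].fillna(df["District"].map(means["Latitude"])),
    Longitude=df["Longitude"].fillna(df["District"].map(means["Longitude"])),
)
```

Selecting the mean by column name rather than by list position makes it harder to swap latitude and longitude by accident.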