I want to find every NA value in the nancap dataframe and, where there is one, replace it with the median engine_capacity from the cap dataframe (only for the same car_model). I tried the following code, but it didn't work. (Sorry if my question is not clear.)
import pandas as pd

url = 'https://raw.githubusercontent.com/YousefAlotaibi/saudi_used_cars_price_prediciton/main/data/cars_cleaned_data.csv'
df = pd.read_csv(url)
df.head()
cap = df.groupby('car_model')['engine_capacity'].median().reset_index()
nancap = df[['engine_capacity', 'car_model']]
for i, z in nancap.itertuples(index=False):
    if i.is_integer() == False:  # if NA
        for c, ca in cap.itertuples(index=False):
            if c == z:  # if car_model c in cap == car_model z in nancap
                i = ca  # assign median engine capacity, which is ca, to i
CodePudding user response:
Try this:
nancap = df[['engine_capacity', 'car_model']]
nancap = (
    nancap
    .set_index('car_model')
    .fillna(
        nancap
        .groupby('car_model')
        .agg(pd.Series.median)
        .to_dict()
    )
    .reset_index()
)
But take into account that there are lots of models whose engine_capacity values are all NaN, so their median will also be NaN. If you want to fill those residual NaN values, you can add a .fillna('No data available') after .reset_index().
Like:
nancap = df[['engine_capacity', 'car_model']]
nancap = (
    nancap
    .set_index('car_model')
    .fillna(
        nancap
        .groupby('car_model')
        .agg(pd.Series.median)
        .to_dict()
    )
    .reset_index()
    .fillna('No data available')
)
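As an alternative, the same per-model fill can probably be done without touching the index, by broadcasting each group's median back onto its rows with groupby(...).transform('median'). This is only a minimal sketch: the small dataframe and the engine_capacity_filled and still_missing names are made up for illustration and are not from the real CSV.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real CSV; the values are made up for illustration.
df = pd.DataFrame({
    "car_model": ["Camry", "Camry", "Camry", "Accord", "Accord", "Rare", "Rare"],
    "engine_capacity": [2.5, np.nan, 3.5, 2.0, np.nan, np.nan, np.nan],
})

# Median per car_model, repeated on every row that belongs to that model.
model_median = df.groupby("car_model")["engine_capacity"].transform("median")

# Fill only the missing entries; existing values are left untouched.
df["engine_capacity_filled"] = df["engine_capacity"].fillna(model_median)

# Models whose values are all NaN have a NaN median, so they stay NaN.
still_missing = df.loc[df["engine_capacity_filled"].isna(), "car_model"].unique()
print(df)
print(still_missing)
```

Here the missing Camry and Accord rows pick up their model's median, while the hypothetical "Rare" model stays NaN because it has no known value at all. Keep in mind that filling those leftovers with a string such as 'No data available' turns the column into object dtype, so leaving them as NaN may be preferable if the column is used numerically later.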