The following is a sample from my data frame:
import pandas as pd
import numpy as np
d=['SKODASUPERB','SKODASUPERB',\
'SKODASUPERB','MERCEDES-BENZE CLASS','ASTON MARTINVIRAGE'\
,'ASTON MARTINVIRAGE','ASTON MARTINVIRAGE','TOYOTAHIACE',\
'TOYOTAAVENSIS','TOYOTAAVENSIS','TOYOTAAVENSIS','TOYOTAAVENSIS',\
'FERRARI360','FERRARILAFERRARI']
x=['SUV','SUV','nan','nan','SPORTS','SPORTS','SPORTS',\
'nan','SEDAN','SEDAN','SEDAN','SEDAN','SPORT','SPORT']
df=pd.DataFrame({'make_model':d,'body':x})
df.body=df.body.replace('nan',np.nan)
df.head()
Out[24]:
make_model body
0 SKODASUPERB SUV
1 SKODASUPERB SUV
2 SKODASUPERB NaN
3 MERCEDES-BENZE CLASS NaN
4 ASTON MARTINVIRAGE SPORTS
There are some null values in the 'body' column:
df.body.isnull().sum()
Out[25]: 3
So I am trying to fill the null values in the body column using the mode of body for each make_model. For instance, 2 observations of SKODASUPERB have body 'SUV' and 1 observation has a null body. The mode of body for SKODASUPERB is therefore 'SUV', and I want 'SUV' filled in for the third observation too. For this I am using the following code:
make_model_list=df.make_model.unique().tolist()
for x in make_model_list:
    df.loc[(df['make_model']==x)&(df['body'].isnull()),'body']=\
    df.loc[(df['make_model']==x)&(df['body'].isnull()),'body']\
    .fillna(df.loc[df['make_model']==x,'body'].mode())
Unfortunately, the loop is breaking because some make_model values don't have a mode for body:
df.body.isnull().sum()
Out[30]: 3
How can I force the loop to keep running even if there is no mode body value for a particular make_model? I know that I can use the continue statement, but I am not sure how to write it.
CodePudding user response:
Assuming each make_model has a single most frequent body value:
donor = df.dropna().groupby(by=['make_model']).agg(pd.Series.mode).reset_index()
df = df.merge(donor, how='left', on=['make_model'])
df['body_x'] = df['body_x'].fillna(df['body_y'])
df.drop(columns=['body_y'], inplace=True)
df.columns = ['make_model', 'body']
df
make_model body
0 SKODASUPERB SUV
1 SKODASUPERB SUV
2 SKODASUPERB SUV
3 MERCEDES-BENZE CLASS NaN
4 ASTON MARTINVIRAGE SPORTS
5 ASTON MARTINVIRAGE SPORTS
6 ASTON MARTINVIRAGE SPORTS
7 TOYOTAHIACE NaN
8 TOYOTAAVENSIS SEDAN
9 TOYOTAAVENSIS SEDAN
10 TOYOTAAVENSIS SEDAN
11 TOYOTAAVENSIS SEDAN
12 FERRARI360 SPORT
13 FERRARILAFERRARI SPORT
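If a make_model has two equally frequent body values, `pd.Series.mode` returns both, so the single-mode assumption above no longer holds. A minimal sketch of one way to handle that case, taking the first mode (the modes come back sorted) and mapping it onto the rows. The helper name `first_mode` and the toy data are made up for illustration and are not part of the answer above:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): group "C" has a tie between SEDAN and SPORT,
# group "B" has no known body at all.
df = pd.DataFrame({
    "make_model": ["A", "A", "A", "B", "C", "C", "C"],
    "body": ["SUV", "SUV", np.nan, np.nan, "SEDAN", "SPORT", np.nan],
})

def first_mode(s):
    """Return the first mode of s, or NaN if s has no non-null values."""
    m = s.mode()  # drops NaN by default and returns the modes sorted
    return m.iloc[0] if not m.empty else np.nan

# One donor value per make_model, computed from the non-null rows only.
donor = df.dropna(subset=["body"]).groupby("make_model")["body"].agg(first_mode)

# Look up each row's donor value and use it only where body is missing.
filled = df["body"].fillna(df["make_model"].map(donor))
print(filled.tolist())
```

Groups with no known body (here "B") are absent from `donor`, so `map` yields NaN for them and those rows stay null.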
CodePudding user response:
Finally, I have worked out a solution. It was just a matter of wrapping the assignment in try/except. This solution works perfectly for the purpose of my project and has filled 95% of the missing values. I have slightly changed the data to show that this method is effective:
d=['SKODASUPERB','SKODASUPERB',\
'SKODASUPERB','MERCEDES-BENZE CLASS','ASTON MARTINVIRAGE'\
,'ASTON MARTINVIRAGE','ASTON MARTINVIRAGE','TOYOTAHIACE',\
'TOYOTAAVENSIS','TOYOTAAVENSIS','TOYOTAAVENSIS','TOYOTAAVENSIS',\
'FERRARI360','FERRARILAFERRARI']
x=['SUV','SUV','nan','nan','SPORTS','SPORTS','nan',\
'nan','SEDAN','SEDAN','nan','SEDAN','SPORT','SPORT']
df=pd.DataFrame({'make_model':d,'body':x})
df.body=df.body.replace('nan',np.nan)
df
Out[6]:
make_model body
0 SKODASUPERB SUV
1 SKODASUPERB SUV
2 SKODASUPERB NaN
3 MERCEDES-BENZE CLASS NaN
4 ASTON MARTINVIRAGE SPORTS
5 ASTON MARTINVIRAGE SPORTS
6 ASTON MARTINVIRAGE NaN
7 TOYOTAHIACE NaN
8 TOYOTAAVENSIS SEDAN
9 TOYOTAAVENSIS SEDAN
10 TOYOTAAVENSIS NaN
11 TOYOTAAVENSIS SEDAN
df.body.isnull().sum()
Out[7]: 5
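My solution:
for x in make_model_list:
    try:
        # value_counts() ignores NaN and sorts by frequency, so index[0] is the mode
        df.loc[(df['make_model']==x)&(df['body'].isnull()),'body']=\
        df.loc[(df['make_model']==x)&(df['body'].isnull()),'body'].fillna\
        (df.loc[df['make_model']==x,'body'].value_counts().index[0])
    except IndexError:
        # no non-null body for this make_model, so there is nothing to fill with
        pass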
df.body.isnull().sum()
Out[9]: 2 #null values have dropped from 5 to 2.
Those 2 null values couldn't be filled because their make_model has no non-null body value at all, so there is no mode to fill from.
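The same result can probably be reached without an explicit loop or try/except. Below is a rough sketch using `groupby(...).transform`, run on the modified data from this answer. The function name `fill_with_group_mode` is hypothetical, and groups with no known body are simply left as NaN:

```python
import numpy as np
import pandas as pd

d = ['SKODASUPERB', 'SKODASUPERB', 'SKODASUPERB', 'MERCEDES-BENZE CLASS',
     'ASTON MARTINVIRAGE', 'ASTON MARTINVIRAGE', 'ASTON MARTINVIRAGE',
     'TOYOTAHIACE', 'TOYOTAAVENSIS', 'TOYOTAAVENSIS', 'TOYOTAAVENSIS',
     'TOYOTAAVENSIS', 'FERRARI360', 'FERRARILAFERRARI']
x = ['SUV', 'SUV', 'nan', 'nan', 'SPORTS', 'SPORTS', 'nan',
     'nan', 'SEDAN', 'SEDAN', 'nan', 'SEDAN', 'SPORT', 'SPORT']
df = pd.DataFrame({'make_model': d, 'body': x})
df['body'] = df['body'].replace('nan', np.nan)

def fill_with_group_mode(frame, key, target):
    """Fill NaN in `target` with the mode of `target` within each `key` group."""
    def _fill(s):
        m = s.mode()  # empty when the whole group is NaN
        return s.fillna(m.iloc[0]) if not m.empty else s

    out = frame.copy()  # leave the caller's frame untouched
    out[target] = out.groupby(key)[target].transform(_fill)
    return out

result = fill_with_group_mode(df, 'make_model', 'body')
print(result['body'].isnull().sum())
```

Checking `m.empty` plays the role of the try/except above: a make_model with no non-null body is returned unchanged instead of raising.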