A loop to fill null values in a column using the mode value is breaking. Still no working Solution

Time:12-28

The following is a sample from my data frame:

import pandas as pd
import numpy as np

d=['SKODASUPERB','SKODASUPERB',
   'SKODASUPERB','MERCEDES-BENZE CLASS','ASTON MARTINVIRAGE',
   'ASTON MARTINVIRAGE','ASTON MARTINVIRAGE','TOYOTAHIACE',
   'TOYOTAAVENSIS','TOYOTAAVENSIS','TOYOTAAVENSIS','TOYOTAAVENSIS',
   'FERRARI360','FERRARILAFERRARI']
x=['SUV','SUV','nan','nan','SPORTS','SPORTS','SPORTS',
   'nan','SEDAN','SEDAN','SEDAN','SEDAN','SPORT','SPORT']



df=pd.DataFrame({'make_model':d,'body':x})
df.body=df.body.replace('nan',np.nan)  # np.NaN was removed in NumPy 2.0; np.nan works in all versions
df.head()

Out[24]: 
             make_model    body
0           SKODASUPERB     SUV
1           SKODASUPERB     SUV
2           SKODASUPERB     NaN
3  MERCEDES-BENZE CLASS     NaN
4    ASTON MARTINVIRAGE  SPORTS

There are some null values in the 'body' column:

df.body.isnull().sum()
Out[25]: 3

I am trying to fill the null values in the body column with the mode of the body type for each make_model. For instance, 2 observations of SKODASUPERB have body 'SUV' and 1 observation has a null body. The mode of body for SKODASUPERB is therefore 'SUV', and I want 'SUV' to be filled in for the third observation too. For this I am using the following code:

make_model_list=df.make_model.unique().tolist()

for x in make_model_list:
    df.loc[(df['make_model']==x)&(df['body'].isnull()),'body']=\
        df.loc[(df['make_model']==x)&(df['body'].isnull()),'body']\
            .fillna(df.loc[df['make_model']==x,'body'].mode())

Unfortunately, the loop is breaking, as some observations don't have a mode value, and the null count is unchanged:

df.body.isnull().sum()
Out[30]: 3

How can I force the loop to keep running even if there is no mode 'body' value for a particular make_model? I know that I can use the continue statement, but I am not sure how to write it.
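
A note on why the null count stays at 3: `Series.mode()` returns a Series (indexed 0, 1, ...) rather than a scalar, and `fillna` with a Series argument aligns on index labels instead of broadcasting one value. The sketch below shows the difference on a small made-up frame, not the data above; `sample`, `still_null`, `filled` and `empty_mode` are illustrative names.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (not the data from the question)
sample = pd.DataFrame({
    'make_model': ['A', 'A', 'A', 'B'],
    'body': ['SUV', 'SUV', np.nan, np.nan],
})

group = sample.loc[sample['make_model'] == 'A', 'body']
mode = group.mode()             # a Series with index [0], not a scalar
nulls = group[group.isnull()]   # the null row has index label 2

# fillna with a Series aligns on index labels; label 2 is not in mode.index,
# so nothing is filled and no error is raised
still_null = nulls.fillna(mode)

# Passing the scalar taken from the mode Series fills the value as intended
filled = nulls.fillna(mode.iloc[0])

# A group whose values are all null has an empty mode
empty_mode = sample.loc[sample['make_model'] == 'B', 'body'].mode()

print(still_null.isnull().sum(), filled.tolist(), len(empty_mode))
```

So the loop in the question likely does not raise at all; it silently fills nothing, and taking `.iloc[0]` from an empty mode is the step that would actually fail for an all-null group.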

CodePudding user response:

Assuming each make_model has a single most frequent body value (no ties for the mode):

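# Most frequent body per make_model, computed only from rows where body is known
donor = df.dropna().groupby(by=['make_model']).agg(pd.Series.mode).reset_index()
# The left merge keeps every row; the overlapping column becomes body_x / body_y
df = df.merge(donor, how='left', on=['make_model'])
# Assign the result back; fillna(inplace=True) on a column selection may not update df in newer pandas
df['body_x'] = df['body_x'].fillna(df['body_y'])
df = df.drop(columns=['body_y'])
df.columns = ['make_model', 'body']
df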
              make_model    body
0            SKODASUPERB     SUV
1            SKODASUPERB     SUV
2            SKODASUPERB     SUV
3   MERCEDES-BENZE CLASS     NaN
4     ASTON MARTINVIRAGE  SPORTS
5     ASTON MARTINVIRAGE  SPORTS
6     ASTON MARTINVIRAGE  SPORTS
7            TOYOTAHIACE     NaN
8          TOYOTAAVENSIS   SEDAN
9          TOYOTAAVENSIS   SEDAN
10         TOYOTAAVENSIS   SEDAN
11         TOYOTAAVENSIS   SEDAN
12            FERRARI360   SPORT
13      FERRARILAFERRARI   SPORT
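
The same fill can also be sketched without a merge, using `groupby(...).transform` to broadcast each group's mode back onto its rows. This is an alternative sketch rather than the method used in the answer above: `first_mode` is a hypothetical helper, the frame is a shortened version of the sample data, and taking the first mode when there is a tie is an assumption.

```python
import numpy as np
import pandas as pd

# Shortened version of the sample data
df = pd.DataFrame({
    'make_model': ['SKODASUPERB', 'SKODASUPERB', 'SKODASUPERB',
                   'MERCEDES-BENZE CLASS', 'TOYOTAHIACE',
                   'TOYOTAAVENSIS', 'TOYOTAAVENSIS'],
    'body': ['SUV', 'SUV', np.nan, np.nan, np.nan, 'SEDAN', 'SEDAN'],
})

def first_mode(values):
    """Return the first mode of a Series, or NaN if it has no non-null values."""
    modes = values.mode()
    return modes.iloc[0] if not modes.empty else np.nan

# transform broadcasts the per-group scalar back onto every row of the group,
# so group_mode shares its index with df
group_mode = df.groupby('make_model')['body'].transform(first_mode)

# Only the null rows are changed; groups with no known body stay null
df['body'] = df['body'].fillna(group_mode)
print(df)
```

Because `first_mode` returns NaN for an all-null group, no try/except is needed and rows such as MERCEDES-BENZE CLASS simply remain null.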

CodePudding user response:

Finally, I have worked out a solution. It was just a matter of wrapping the fill in try and except. This solution works for the purpose of my project and has filled 95% of the missing values there. I have slightly changed the sample data to show that this method is effective:

d=['SKODASUPERB','SKODASUPERB',
   'SKODASUPERB','MERCEDES-BENZE CLASS','ASTON MARTINVIRAGE',
   'ASTON MARTINVIRAGE','ASTON MARTINVIRAGE','TOYOTAHIACE',
   'TOYOTAAVENSIS','TOYOTAAVENSIS','TOYOTAAVENSIS','TOYOTAAVENSIS',
   'FERRARI360','FERRARILAFERRARI']
x=['SUV','SUV','nan','nan','SPORTS','SPORTS','nan',
   'nan','SEDAN','SEDAN','nan','SEDAN','SPORT','SPORT']
df=pd.DataFrame({'make_model':d,'body':x})
df.body=df.body.replace('nan',np.nan)  # np.NaN was removed in NumPy 2.0

df
Out[6]: 
              make_model    body
0            SKODASUPERB     SUV
1            SKODASUPERB     SUV
2            SKODASUPERB     NaN
3   MERCEDES-BENZE CLASS     NaN
4     ASTON MARTINVIRAGE  SPORTS
5     ASTON MARTINVIRAGE  SPORTS
6     ASTON MARTINVIRAGE     NaN
7            TOYOTAHIACE     NaN
8          TOYOTAAVENSIS   SEDAN
9          TOYOTAAVENSIS   SEDAN
10         TOYOTAAVENSIS     NaN
11         TOYOTAAVENSIS   SEDAN
12            FERRARI360   SPORT
13      FERRARILAFERRARI   SPORT

df.body.isnull().sum()
Out[7]: 5

My Solution

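for x in make_model_list:
    try:
        # value_counts() ignores nulls and sorts by frequency, so index[0] is the most common body
        top = df.loc[df['make_model']==x,'body'].value_counts().index[0]
    except IndexError:
        # every body value for this make_model is null, so there is nothing to fill with
        continue
    df.loc[(df['make_model']==x)&(df['body'].isnull()),'body'] = top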
df.body.isnull().sum()
Out[9]: 2 #null values have dropped from 5 to 2.

Those 2 null values couldn't be filled because MERCEDES-BENZE CLASS and TOYOTAHIACE have no known body value at all, so there is no mode to fill them with.
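
To see which make_model values are left unfilled, one option is to count the known body values per group: a group with a count of zero has no mode. A minimal sketch on a shortened version of the sample data; `known_counts` and `no_mode` are illustrative names.

```python
import numpy as np
import pandas as pd

# Shortened version of the sample data
df = pd.DataFrame({
    'make_model': ['SKODASUPERB', 'SKODASUPERB', 'MERCEDES-BENZE CLASS',
                   'TOYOTAHIACE', 'TOYOTAAVENSIS'],
    'body': ['SUV', np.nan, np.nan, np.nan, 'SEDAN'],
})

# count() ignores nulls, so it gives the number of known body values per group
known_counts = df.groupby('make_model')['body'].count()

# Groups with zero known values have no mode, so their nulls cannot be filled
no_mode = known_counts[known_counts == 0].index.tolist()
print(no_mode)
```

These groups need another source for the body type, such as a lookup table or manual entry.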
