How To Store Column Mean As a Variable

ISSUE

I am cleaning data. I have calculated a column mean over rows selected with conditions passed to the .loc indexer. Storing the result in the variable z produces a (1, 1) DataFrame, and assigning it to a missing value raises an incompatibility error.

WHAT'S BEEN TRIED

Input

a = train[['LotFrontage', 'MSSubClass', 'MSZoning', 'Street', 
           'LotShape']]

z = (train.loc[((train.MSSubClass == 190) &
               (train.MSZoning == 'RL') &
               (train.LotShape == 'IR1'))]
     .agg({'LotFrontage': ['mean']}))  # agg with a list returns a (1, 1) DataFrame

a.LotFrontage[335] = z

Output

ValueError: Incompatible indexer with DataFrame

QUESTIONS

Is it possible to store the .mean() output in z as a single number (rather than a DataFrame) to fix this issue?
If the above is not possible, is there a different method I should be using to replace the missing LotFrontage value with the calculated mean?

CodePudding user response：

You can use

z = (train.loc[((train.MSSubClass == 190) &
               (train.MSZoning == 'RL') &
               (train.LotShape == 'IR1'))]
     .agg({'LotFrontage': 'mean'})  # a plain string (not a list) makes agg return a Series
     .item())  # Return the single element of that Series as a scalar
# or
z = (train.loc[((train.MSSubClass == 190) & 
                (train.MSZoning == 'RL') & 
                (train.LotShape == 'IR1'))]
     ['LotFrontage'].mean())


# Build a mask that depends on how missing values appear in the `LotFrontage` column:
# if they are empty strings, use `.eq('')`
# if they are NaN, use `.isna()`
m = a['LotFrontage'].isna()
a.loc[m, 'LotFrontage'] = z  # fills every missing row with z
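
To see why the original attempt failed, here is a minimal sketch on a made-up stand-in for `train` (the values and the row label 2 are assumptions for illustration, not the real data). It compares what `agg` with a list returns against what `.mean()` on the selected column returns, then assigns the scalar to a copy of the subset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `train`; the values are invented for illustration.
train = pd.DataFrame({
    "MSSubClass": [190, 190, 190, 20],
    "MSZoning": ["RL", "RL", "RL", "RL"],
    "LotShape": ["IR1", "IR1", "IR1", "Reg"],
    "LotFrontage": [60.0, 80.0, np.nan, 70.0],
})

mask = (
    train["MSSubClass"].eq(190)
    & train["MSZoning"].eq("RL")
    & train["LotShape"].eq("IR1")
)

# A list of functions inside agg yields a 1x1 DataFrame, not a scalar.
as_frame = train.loc[mask].agg({"LotFrontage": ["mean"]})

# Selecting the column first and calling .mean() yields a scalar float
# (NaN values are skipped by default).
z = train.loc[mask, "LotFrontage"].mean()

# .copy() makes `a` an independent frame rather than a slice of `train`.
a = train[["LotFrontage", "MSSubClass", "MSZoning", "LotShape"]].copy()
a.loc[2, "LotFrontage"] = z

print(as_frame.shape)  # shape of the DataFrame returned by agg
print(z)               # the scalar mean
```

Taking `.copy()` when building `a` is a design choice in this sketch: it avoids pandas warning about assigning into a slice of `train`.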

CodePudding user response：

If I understand what you're trying to do correctly... this may work.

z = (train.loc[train.MSSubClass.eq(190) 
               & train.MSZoning.eq('RL') 
               & train.LotShape.eq('IR1'), 'LotFrontage']
          .mean()) # Returns a float.

a.loc[335, 'LotFrontage'] = z

# Or, for all NaNs in LotFrontage:

a.LotFrontage = a.LotFrontage.fillna(z)
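
If the goal is to fill every missing LotFrontage with the mean of its own (MSSubClass, MSZoning, LotShape) group instead of a single hand-picked group, one possible approach is `groupby(...).transform('mean')`. This is a sketch under that assumption, using an invented sample frame; it is not something the question asked for directly.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; in the question this would be the `train` frame.
train = pd.DataFrame({
    "MSSubClass": [190, 190, 190, 20, 20],
    "MSZoning": ["RL", "RL", "RL", "RL", "RL"],
    "LotShape": ["IR1", "IR1", "IR1", "Reg", "Reg"],
    "LotFrontage": [60.0, 80.0, np.nan, 50.0, np.nan],
})

keys = ["MSSubClass", "MSZoning", "LotShape"]

# transform("mean") returns a Series aligned with the original index,
# holding each row's group mean (NaNs are skipped when averaging).
group_mean = train.groupby(keys)["LotFrontage"].transform("mean")

# Work on a copy so the original frame is left untouched.
filled = train.copy()
filled["LotFrontage"] = filled["LotFrontage"].fillna(group_mean)

print(filled["LotFrontage"].tolist())
```

A group whose LotFrontage values are all missing has no mean to borrow, so its rows would stay NaN and need a separate fallback.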