ISSUE
I am performing data cleansing. I calculated a column mean for the rows selected by a set of conditions passed to the .loc indexer.
Storing the result in the variable z produces a (1, 1) DataFrame, and assigning it to a missing value raises an incompatibility error.
WHAT'S BEEN TRIED
Input
a = train[['LotFrontage', 'MSSubClass', 'MSZoning', 'Street',
           'LotShape']]
z = (train.loc[(train.MSSubClass == 190) &
               (train.MSZoning == 'RL') &
               (train.LotShape == 'IR1')]
     .agg({'LotFrontage': ['mean']}))
a.LotFrontage[335] = z
Output
ValueError: Incompatible indexer with DataFrame
QUESTIONS
- Is it possible to store the .mean() output as an integer in z to fix this issue?
- If the above is not possible, is there a different method I should be using to replace the missing LotFrontage value with the calculated mean?
CodePudding user response:
You can use
z = (train.loc[(train.MSSubClass == 190) &
               (train.MSZoning == 'RL') &
               (train.LotShape == 'IR1')]
     .agg({'LotFrontage': 'mean'})  # a function name (not a list) gives a Series
     .item())  # Return the single element of that Series
# or
z = (train.loc[(train.MSSubClass == 190) &
               (train.MSZoning == 'RL') &
               (train.LotShape == 'IR1')]
     ['LotFrontage'].mean())
# How you build the mask depends on how missing values are stored in the `LotFrontage` column:
# if they are empty strings, you can use `.eq('')`
# if they are NaN, you can use `.isna()`
m = a['LotFrontage'].isna()
a.loc[m, 'LotFrontage'] = z  # fills every missing LotFrontage, not only row 335
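To make the shapes involved easier to see, here is a minimal sketch on made-up data. The toy `train` frame and its values are assumptions for illustration only, not the asker's dataset. It compares what comes back from aggregating with a list of functions, with a single function name, and from calling `.mean()` on the selected column.

```python
import pandas as pd

# Hypothetical toy data standing in for `train`; the values are made up.
train = pd.DataFrame({
    "MSSubClass": [190, 190, 190, 20],
    "MSZoning": ["RL", "RL", "RL", "RM"],
    "LotShape": ["IR1", "IR1", "IR1", "Reg"],
    "LotFrontage": [60.0, 80.0, None, 50.0],
})

mask = (
    (train.MSSubClass == 190)
    & (train.MSZoning == "RL")
    & (train.LotShape == "IR1")
)

# A list of functions yields a DataFrame (one row per function).
as_frame = train.loc[mask].agg({"LotFrontage": ["mean"]})

# A single function name yields a Series (one entry per column).
as_series = train.loc[mask].agg({"LotFrontage": "mean"})

# Selecting the column first and calling .mean() yields a scalar.
as_scalar = train.loc[mask, "LotFrontage"].mean()

print(type(as_frame).__name__, as_frame.shape)
print(type(as_series).__name__, as_series.shape)
print(as_scalar)
```

Only the scalar form can be assigned directly to a single cell. The DataFrame form is what triggers the "Incompatible indexer" error from the question.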
CodePudding user response:
If I understand correctly what you're trying to do, this may work:
z = (train.loc[train.MSSubClass.eq(190)
               & train.MSZoning.eq('RL')
               & train.LotShape.eq('IR1'), 'LotFrontage']
     .mean())  # Returns a float.
a.loc[335, 'LotFrontage'] = z
# Or, for all NaNs in LotFrontage:
a.LotFrontage = a.LotFrontage.fillna(z)
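For a quick check outside the real dataset, the same steps can be run end to end on a small stand-in frame. This is only a sketch: the data, the index labels, and the explicit `.copy()` are assumptions added for illustration. The copy makes `a` an independent frame, which should avoid a `SettingWithCopyWarning` when assigning into it.

```python
import pandas as pd

# Hypothetical toy data; index label 335 mimics the row from the question.
train = pd.DataFrame(
    {
        "MSSubClass": [190, 190, 190, 20],
        "MSZoning": ["RL", "RL", "RL", "RM"],
        "LotShape": ["IR1", "IR1", "IR1", "Reg"],
        "LotFrontage": [60.0, 80.0, float("nan"), float("nan")],
    },
    index=[10, 20, 335, 400],
)

# Explicit copy so that writes to `a` never touch `train`.
a = train[["LotFrontage", "MSSubClass", "MSZoning", "LotShape"]].copy()

# Scalar mean of LotFrontage over the matching rows (NaN is skipped).
z = train.loc[
    train.MSSubClass.eq(190) & train.MSZoning.eq("RL") & train.LotShape.eq("IR1"),
    "LotFrontage",
].mean()

# Fill only the one missing cell...
a.loc[335, "LotFrontage"] = z

# ...or build a column in which every remaining NaN is filled with z.
filled_all = a["LotFrontage"].fillna(z)

print(a)
print(filled_all)
```

Note that `fillna(z)` also fills rows that do not match the three conditions (row 400 here), so the single-cell assignment is the closer match if only row 335 should change.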