I have a dataset, let's say:
emp_id | type | market_cap |
---|---|---|
1 | a | 7.845000e+10 |
2 | b | 6.235000e+10 |
3 | c | NaN |
I have the following class:
class DataCleaner:
    def __init__(self, dataf):
        """this is the constructor that initializes the dataframe to be cleaned"""
        self.dataf = dataf

    def remove_upper_quantile(self, col, quantile_num):
        self.dataf = self.dataf[self.dataf[col] < self.dataf[col].quantile(quantile_num)]
        return self.dataf

    def remove_nulls(self, col):
        self.dataf = self.dataf.dropna(subset=[col], inplace=True)
        return self.dataf
When I call remove_nulls on my df, like so:
clean_company=DataCleaner(df)
df=clean_company.remove_nulls('market_cap')
I get the following: AttributeError: 'NoneType' object has no attribute 'dropna'.
This also happens when I don't assign df to the result.
What am I doing wrong here?
CodePudding user response:
- The base must be in the dataframe.
- To delete a column, use: df.pop('market_cap')
CodePudding user response:
You need to remove the inplace=True keyword argument within this method call:
    def remove_nulls(self, col):
        self.dataf = self.dataf.dropna(subset=[col], inplace=True)
        return self.dataf
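Looking at the documentation for the df.dropna method, you can see that when inplace=True the method returns None rather than the dataframe. Assigning that result back to self.dataf therefore replaces the stored dataframe with None, and the next attempt to call dropna on it raises the AttributeError you are seeing.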
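A quick way to see the difference for yourself is sketched below. The small frame is made up to mirror the sample data in the question, and the variable names are illustrative only; the return value of the inplace call is printed rather than assumed, so you can check it against the documentation for your pandas version.

```python
import numpy as np
import pandas as pd

# Illustrative data mirroring the question's sample table (not the real dataset).
df = pd.DataFrame({
    "emp_id": [1, 2, 3],
    "type": ["a", "b", "c"],
    "market_cap": [7.845e10, 6.235e10, np.nan],
})

# Without inplace, dropna returns a new dataframe and leaves df untouched.
returned_copy = df.dropna(subset=["market_cap"])
rows_before_inplace = len(df)

# With inplace=True, dropna modifies df itself; the documentation cited above
# says the call then returns None instead of a dataframe.
returned_inplace = df.dropna(subset=["market_cap"], inplace=True)

print(type(returned_copy).__name__, len(returned_copy))
print("inplace call returned None:", returned_inplace is None)
print("rows in df before/after inplace call:", rows_before_inplace, len(df))
```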
Alternatively, you could remove the self.dataf= part of that line and just have self.dataf.dropna(subset=[col], inplace=True), as that drops the NaNs in place and changes the dataframe without you needing to overwrite it.
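Putting the first suggestion together, here is a minimal sketch of the class with inplace=True removed. The sample frame is invented to match the table in the question, and the comparison in remove_upper_quantile is written the way it was presumably intended, which is an assumption on my part.

```python
import numpy as np
import pandas as pd


class DataCleaner:
    def __init__(self, dataf):
        """Store the dataframe to be cleaned."""
        self.dataf = dataf

    def remove_upper_quantile(self, col, quantile_num):
        # Keep only rows strictly below the requested quantile of `col`.
        self.dataf = self.dataf[self.dataf[col] < self.dataf[col].quantile(quantile_num)]
        return self.dataf

    def remove_nulls(self, col):
        # No inplace=True here: dropna returns a new dataframe, which is
        # assigned back so self.dataf never becomes None.
        self.dataf = self.dataf.dropna(subset=[col])
        return self.dataf


# Illustrative data mirroring the question's sample table (not the real dataset).
df = pd.DataFrame({
    "emp_id": [1, 2, 3],
    "type": ["a", "b", "c"],
    "market_cap": [7.845e10, 6.235e10, np.nan],
})

clean_company = DataCleaner(df)
df = clean_company.remove_nulls("market_cap")
print(df)
```

Because remove_nulls now always returns a dataframe, it can be called repeatedly or followed by remove_upper_quantile without hitting the NoneType error.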