I'm practicing data preprocessing with the pandas dropna method.
I defined csv_data and read it into a DataFrame as follows:
import pandas as pd
from io import StringIO

csv_data = \
'''A, B, C, D
1.0, 2.0, 3.0, 4.0
5.0, 6.0,, 8.0
10.0, 11.0, 12.0,'''
df = pd.read_csv(StringIO(csv_data))
Then I tried df.dropna(subset=['C'])
to drop the rows where NaN appears in the 'C' column,
but I got the error below.
df.dropna(subset=['C'])
Traceback (most recent call last):
Input In [50] in <cell line: 1>
df.dropna(subset=['C'])
File C:\Anaconda3\lib\site-packages\pandas\util\_decorators.py:311 in wrapper
return func(*args, **kwargs)
File C:\Anaconda3\lib\site-packages\pandas\core\frame.py:6002 in dropna
raise KeyError(np.array(subset)[check].tolist())
KeyError: ['C']
Has anyone run into this error?
CodePudding user response:
It seems your column names contain leading whitespace, which needs to be stripped before calling dropna.
If you check your current column names, you will see this:
>>> df.columns
Index(['A', ' B', ' C', ' D'], dtype='object')
                  ^^^^
So one approach is to remove the spaces from column names.
df.columns = df.columns.str.strip()
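For reference, here is a minimal end-to-end sketch of that approach, reusing the csv_data from the question. The variable name cleaned is only illustrative.

```python
import pandas as pd
from io import StringIO

# Same data as in the question: note the space after each comma.
csv_data = '''A, B, C, D
1.0, 2.0, 3.0, 4.0
5.0, 6.0,, 8.0
10.0, 11.0, 12.0,'''

df = pd.read_csv(StringIO(csv_data))

# Remove leading/trailing whitespace from every column name,
# so ' C' becomes 'C'.
df.columns = df.columns.str.strip()

# The row with the missing 'C' value can now be dropped by name.
cleaned = df.dropna(subset=['C'])
print(cleaned)
```

Keep in mind that dropna returns a new DataFrame, so df itself is unchanged unless you assign the result back.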
Alternatively, you can pass the exact column name (including the leading space):
df.dropna(subset=[' C'])
                  ^^^^
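A third option, not one of the two approaches above, is to keep the spaces from getting into the column names in the first place. As a sketch, assuming the same csv_data as in the question, read_csv accepts a skipinitialspace parameter that skips spaces following the delimiter. The names df_clean and result are only illustrative.

```python
import pandas as pd
from io import StringIO

# Same data as in the question: note the space after each comma.
csv_data = '''A, B, C, D
1.0, 2.0, 3.0, 4.0
5.0, 6.0,, 8.0
10.0, 11.0, 12.0,'''

# skipinitialspace=True tells the parser to skip spaces that follow
# the delimiter, so the header is read as 'A', 'B', 'C', 'D'.
df_clean = pd.read_csv(StringIO(csv_data), skipinitialspace=True)

# No renaming is needed before dropping rows with NaN in 'C'.
result = df_clean.dropna(subset=['C'])
print(result)
```

This only handles spaces right after the delimiter. If names could also have trailing spaces, stripping the column names as shown earlier is the more general fix.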