I have a df like so:
A  B  C
f  s  x
a  b  c
n
p  l  k
i
s  j  p
Now, I want to remove all the rows that have a value in column A but are empty in the rest of the df's columns. How can I achieve this? The expected result would be a df like:
A  B  C
f  s  x
a  b  c
p  l  k
s  j  p
EDIT: this solved it for me:
columns = list(df.columns)[1:]
df = df[~df[columns].isnull().all(1)]
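For reference, a minimal end-to-end sketch of this edit (the frame is rebuilt here with NaN for the blank cells, which is an assumption about how the data is actually stored):

import numpy as np
import pandas as pd

# rebuild the question's frame, using NaN for the blanks
df = pd.DataFrame({'A': ['f', 'a', 'n', 'p', 'i', 's'],
                   'B': ['s', 'b', np.nan, 'l', np.nan, 'j'],
                   'C': ['x', 'c', np.nan, 'k', np.nan, 'p']})

columns = list(df.columns)[1:]           # every column except the first (A)
df = df[~df[columns].isnull().all(1)]    # keep rows where B..C are not all NaN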
CodePudding user response:
Use DataFrame.replace to set the blanks to NaN, then remove the rows with NaN using DataFrame.dropna:
df.replace(r'^\s*$', np.nan, regex=True).dropna()
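Assuming the blanks in the question are empty strings (and numpy is imported as np), this yields the expected frame:

out = df.replace(r'^\s*$', np.nan, regex=True).dropna()
#    A  B  C
# 0  f  s  x
# 1  a  b  c
# 3  p  l  k
# 5  s  j  p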
Or, if you want to remove rows only when column A is not NaN:
# '^(?!A$)' selects every column except A
(df.replace(r'^\s*$', np.nan, regex=True)
   .loc[lambda x: ~(x['A'].notna()
                    & x.filter(regex='^(?!A$)').isna().all(axis=1))])

# De Morgan equivalent
# (df.replace(r'^\s*$', np.nan, regex=True)
#    .loc[lambda x: x['A'].isna()
#                   | x.filter(regex='^(?!A$)').notna().any(axis=1)])
CodePudding user response:
If the empty cells in B/C are NaN, a simple dropna will work:
df.dropna()
Else, use boolean indexing:
df[df.ne('').all(1)]
output:
   A  B  C
0  f  s  x
1  a  b  c
3  p  l  k
5  s  j  p
Note: although I doubt you would want this, if you want to drop the rows only when A is not '', add a second condition:
df[df['A'].eq('') | df.ne('').all(1)]
example input:
df = pd.DataFrame({'A': ['x', '', '', 'x', '', 'x'],
                   'B': ['x', 'x', '', 'x', '', ''],
                   'C': ['x', 'x', 'x', '', '', '']})
   A  B  C
0  x  x  x
1     x  x
2        x
3  x  x
4
5  x
output:
   A  B  C
0  x  x  x
1     x  x
2        x
4