Removing pandas rows based on existence of values in certain columns

I have a df like so:

A B C
f s x
a b c
n
p l k
i
s j p

Now, I want to remove all the records that have a value in column A but are empty in the rest of the df's columns. How can I achieve such a thing? The expected result would be a df like:

A B C
f s x
a b c
p l k
s j p
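
For a concrete test, a minimal reconstruction of this frame (assuming the blanks are empty strings, though they could just as well be NaN):

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['f', 'a', 'n', 'p', 'i', 's'],
                   'B': ['s', 'b', '',  'l', '',  'j'],
                   'C': ['x', 'c', '',  'k', '',  'p']})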

@EDIT: this solved the case for me:

columns = list(df.columns)[1:]           # every column except the first (A)
df = df[~df[columns].isnull().all(1)]    # drop rows that are null in all of them
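
Note that isnull() only matches real NaN values, not empty strings, so if the blanks are strings (as in the frame built above) they need converting first:

df = df.replace(r'^\s*$', np.nan, regex=True)   # blanks -> NaN so isnull() can see them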

CodePudding user response:

Use DataFrame.replace to turn the blanks into NaN, then remove the rows that contain NaN with DataFrame.dropna:

df.replace(r'^\s*$', np.nan, regex=True).dropna()
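
On the example frame built above (blanks as empty strings), this keeps exactly the four complete rows:

print(df.replace(r'^\s*$', np.nan, regex=True).dropna())
#    A  B  C
# 0  f  s  x
# 1  a  b  c
# 3  p  l  k
# 5  s  j  p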

Or, if you want to remove a row only when column A is not NaN (and every other column is):

(df.replace(r'^\s*$', np.nan, regex=True)
   .loc[lambda x: ~(x['A'].notna()
                    & x.drop(columns='A').isna().all(axis=1))])   # all columns except A

# De Morgan equivalent:
# (df.replace(r'^\s*$', np.nan, regex=True)
#    .loc[lambda x: x['A'].isna()
#                   | x.drop(columns='A').notna().any(axis=1)])
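
A quick check of the difference, starting from a fresh copy of the example frame with a hypothetical extra row that is empty in A but filled elsewhere; a plain dropna would discard it, while the filter above keeps it:

extra = pd.DataFrame({'A': [''], 'B': ['q'], 'C': ['r']})   # hypothetical row
test = pd.concat([df, extra], ignore_index=True)

print(test.replace(r'^\s*$', np.nan, regex=True)
          .loc[lambda x: ~(x['A'].notna()
                           & x.drop(columns='A').isna().all(axis=1))])
# the 'n' and 'i' rows are dropped; the new row (NaN, q, r) remains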

CodePudding user response:

If you have empty (NaN) cells in B/C, a simple dropna will work:

df.dropna()

Else, use boolean indexing:

df[df.ne('').all(1)]

output:

   A  B  C
0  f  s  x
1  a  b  c
3  p  l  k
5  s  j  p

Note

Although I doubt you would want this, if you want to drop a row only when A is not '', add a second condition:

df[df['A'].eq('') | df.ne('').all(1)]

example input:

df = pd.DataFrame({'A': ['x',  '',  '', 'x', '', 'x'],
                   'B': ['x', 'x',  '', 'x', '', ''],
                   'C': ['x', 'x', 'x',  '', '', '']})
   A  B  C
0  x  x  x
1     x  x
2        x
3  x  x   
4         
5  x      

output:

   A  B  C
0  x  x  x
1     x  x
2        x
4         