Pandas: how to set column values to np.NaN if they don't match a list of strings

Time:11-02

I have a dataframe df that looks like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name': ['foo', 'bar', '', 'blue', '', 'buzz', np.nan, 'red', ''],
    'key': [1, 2, 3, 4, 5, 6, 7, 8, 9]
})
df

    name    key
0   foo      1
1   bar      2
2            3
3   blue     4
4            5
5   buzz     6
6   NaN      7
7   red      8
8            9

I'd like to be able to set up a list and then set any values in the name column not in that list to NaN (along with any values that are originally blank or NaN).

The desired df would look like this:

values_to_keep = ['blue', 'red']

df
    name    key
0   NaN      1
1   NaN      2
2   NaN      3
3   blue     4
4   NaN      5
5   NaN      6
6   NaN      7
7   red      8
8   NaN      9

How would I do this?

Thanks!

CodePudding user response:

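You can use df.loc with a boolean mask to select the rows to overwrite. The mask from isin(values_to_keep) is False for empty strings and NaN as well as for names not in the list, so negating it with ~ selects every row that should become NaN.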

>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame({
...     'name': ['foo', 'bar', '', 'blue', '', 'buzz', np.nan, 'red', ''],
... 'key': [1, 2, 3, 4, 5, 6, 7, 8, 9]
... })
>>> df
   name  key
0   foo    1
1   bar    2
2          3
3  blue    4
4          5
5  buzz    6
6   NaN    7
7   red    8
8          9
>>> values_to_keep = ['blue', 'red']
>>> df.loc[~df['name'].isin(values_to_keep), 'name'] = np.nan
>>> df
   name  key
0   NaN    1
1   NaN    2
2   NaN    3
3  blue    4
4   NaN    5
5   NaN    6
6   NaN    7
7   red    8
8   NaN    9
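
As an alternative to the df.loc assignment above, the same result can probably be reached with Series.where, which keeps values where a condition is True and fills the rest with NaN by default. This is only a sketch on the question's sample data, not necessarily what the answerer had in mind:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['foo', 'bar', '', 'blue', '', 'buzz', np.nan, 'red', ''],
    'key': [1, 2, 3, 4, 5, 6, 7, 8, 9],
})
values_to_keep = ['blue', 'red']

# isin() is False for '', NaN, and any name not in the list;
# where() keeps rows whose mask is True and replaces the others with NaN.
df['name'] = df['name'].where(df['name'].isin(values_to_keep))
```

Unlike the df.loc version, this builds a new Series and assigns it back to the column, which can be convenient inside a method chain.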