I have a dataframe df that looks like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'name': ['foo', 'bar', '', 'blue', '', 'buzz', np.nan, 'red', ''],
'key': [1, 2, 3, 4, 5, 6, 7, 8, 9]
})
df
name key
0 foo 1
1 bar 2
2 3
3 blue 4
4 5
5 buzz 6
6 NaN 7
7 red 8
8 9
I'd like to define a list of values to keep and then set every value in the name
column that is not in that list to NaN (including any values that are originally blank or NaN).
Given the list below, the desired df would look like this:
values_to_keep = ['blue', 'red']
df
name key
0 NaN 1
1 NaN 2
2 NaN 3
3 blue 4
4 NaN 5
5 NaN 6
6 NaN 7
7 red 8
8 NaN 9
How would I do this?
Thanks!
CodePudding user response:
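You can use df.loc
to access a group of rows and columns by label(s) or a boolean array.
Here the boolean array is ~df.name.isin(values_to_keep): isin is True only for the values you want to keep (it is False for blanks and NaN), so negating it with ~ selects every row whose name should become NaN.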
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame({
... 'name': ['foo', 'bar', '', 'blue', '', 'buzz', np.nan, 'red', ''],
... 'key': [1, 2, 3, 4, 5, 6, 7, 8, 9]
... })
>>> df
name key
0 foo 1
1 bar 2
2 3
3 blue 4
4 5
5 buzz 6
6 NaN 7
7 red 8
8 9
>>> values_to_keep = ['blue', 'red']
>>> df.loc[~df.name.isin(values_to_keep), 'name'] = np.nan
>>> df
name key
0 NaN 1
1 NaN 2
2 NaN 3
3 blue 4
4 NaN 5
5 NaN 6
6 NaN 7
7 red 8
8 NaN 9
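As an alternative to the df.loc assignment above, the same result can probably be had with Series.where, which keeps values where a condition is True and fills in NaN everywhere else by default. This is only a sketch on the same sample data as the question, not necessarily the better approach:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['foo', 'bar', '', 'blue', '', 'buzz', np.nan, 'red', ''],
    'key': [1, 2, 3, 4, 5, 6, 7, 8, 9],
})
values_to_keep = ['blue', 'red']

# isin() is False for '', NaN and anything not in the list;
# where() replaces every False position with NaN by default.
df['name'] = df['name'].where(df['name'].isin(values_to_keep))
print(df)
```

Because where returns a new Series, this form also works inside a method chain (for example with assign), whereas the df.loc version modifies df in place.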