I apologize for the potentially confusing title, but I will try to explain my situation as best I can.
Let's say I have a hypothetical DataFrame df, which has an id column and is arranged like...
time id x y
1.0 0 5 9
2.0 1 6 8
3.0 2 7 7
4.0 1 8 6
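For reproducibility, this example frame can be built like so (a minimal sketch, assuming pandas):

import pandas as pd

df = pd.DataFrame({
    'time': [1.0, 2.0, 3.0, 4.0],
    'id':   [0, 1, 2, 1],
    'x':    [5, 6, 7, 8],
    'y':    [9, 8, 7, 6],
})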
Now let's say I want only the data from the rows where df['id'] == 1, but instead of dropping the other rows I fill them with NaN, like this...
time id x y
1.0 0 NaN NaN
2.0 1 6 8
3.0 2 NaN NaN
4.0 1 8 6
Note that I specifically want to keep the time and id columns, and just change the values of x and y to NaN for any rows that don't have the id column set to 1.
My first attempt was to use DataFrame.groupby(), but that drops any rows without the specific id value entirely, which I don't want. My other instinct is to go through df row by row, checking the id column and manually changing x and y to NaN wherever id != 1, but that seems like a very cumbersome and un-Pythonic way of doing this.
Any ideas? Thanks in advance!
CodePudding user response:
You can use simple selection with a boolean mask:
# Rows where id != 1 get NaN in x and y; time and id stay untouched
df.loc[df['id'].ne(1), ['x', 'y']] = float('nan')
Output:
time id x y
0 1.0 0 NaN NaN
1 2.0 1 6.0 8.0
2 3.0 2 NaN NaN
3 4.0 1 8.0 6.0
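Note that x and y come back as floats, since NaN is a floating-point value and cannot be held in an integer column. An equivalent way to express the same thing is DataFrame.mask, which fills NaN wherever the condition is True (a small sketch using the question's column names):

# mask() keeps values where the condition is False and fills NaN where it is True
df[['x', 'y']] = df[['x', 'y']].mask(df['id'].ne(1))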
CodePudding user response:
import numpy as np

# np.nan replaces the uppercase numpy.NAN alias, which was removed in NumPy 2.0
df.loc[df['id'] != 1, ['x', 'y']] = np.nan
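Put together as a runnable sketch (same example frame as in the question):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'time': [1.0, 2.0, 3.0, 4.0],
    'id':   [0, 1, 2, 1],
    'x':    [5, 6, 7, 8],
    'y':    [9, 8, 7, 6],
})

df.loc[df['id'] != 1, ['x', 'y']] = np.nan
print(df)
#    time  id    x    y
# 0   1.0   0  NaN  NaN
# 1   2.0   1  6.0  8.0
# 2   3.0   2  NaN  NaN
# 3   4.0   1  8.0  6.0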