How can I select the row in the last position for each User_ID in a DataFrame?
import numpy as np
import pandas as pd
data = pd.DataFrame({'User_ID': ['122','122','122','233','233','233','233','366','366','366'], 'Age': [23,23,np.nan,24,24,24,24,21,21,np.nan]})
data
The expected result looks like this:
data_new=pd.DataFrame({'User_ID':['122','233','366'],'Age':[np.nan,24,np.nan]})
So I just want to take the last row for every User_ID. I'm a total beginner; any ideas?
CodePudding user response:
As you want to keep the NaN, you can use groupby.tail (groupby.last would skip the NaNs and return the last non-null value in each group):
out = data.groupby('User_ID').tail(1)
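To make the difference between the two methods concrete, here is a minimal sketch using the question's data. The variable names last_rows and last_valid are only illustrative and do not appear in the original post.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'User_ID': ['122', '122', '122', '233', '233', '233', '233', '366', '366', '366'],
    'Age': [23, 23, np.nan, 24, 24, 24, 24, 21, 21, np.nan],
})

# tail(1) keeps the physically last row of each group, NaN included,
# and preserves the original row labels (2, 6, 9).
last_rows = data.groupby('User_ID').tail(1)

# last() aggregates instead: by default it returns the last non-null
# value per column, so the NaN ages are replaced by 23 and 21.
last_valid = data.groupby('User_ID').last()

print(last_rows)
print(last_valid)
```

With this data, last_rows has NaN for users 122 and 366, while last_valid reports 23.0 and 21.0 for them, which is not what the question asks for.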
Another option is to use drop_duplicates:
out = data.drop_duplicates(subset='User_ID', keep='last')
output:
  User_ID   Age
2     122   NaN
6     233  24.0
9     366   NaN
If you want to reset the index in the process, use ignore_index=True:
out = data.drop_duplicates(subset='User_ID', keep='last', ignore_index=True)
output:
  User_ID   Age
0     122   NaN
1     233  24.0
2     366   NaN
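If you prefer the groupby.tail approach, the index can be reset afterwards with reset_index(drop=True). The sketch below, using the question's data, checks that both routes give the same frame; the names via_tail, via_dedup and same are just illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'User_ID': ['122', '122', '122', '233', '233', '233', '233', '366', '366', '366'],
    'Age': [23, 23, np.nan, 24, 24, 24, 24, 21, 21, np.nan],
})

# Last row per user via groupby.tail, then discard the old row labels.
via_tail = data.groupby('User_ID').tail(1).reset_index(drop=True)

# Same result via drop_duplicates, resetting the index in one call.
via_dedup = data.drop_duplicates(subset='User_ID', keep='last', ignore_index=True)

# DataFrame.equals treats NaNs in the same position as equal.
same = via_tail.equals(via_dedup)
print(same)
```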
CodePudding user response:
data_new = data.drop_duplicates(subset='User_ID', keep='last')