Take the last row for each user ID in a DataFrame using Python

How can I select the row at the last position for every user ID in a DataFrame? Any ideas?

import numpy as np
import pandas as pd
data = pd.DataFrame({'User_ID': ['122','122','122','233','233','233','233','366','366','366'], 'Age': [23,23,np.nan,24,24,24,24,21,21,np.nan]})

data

The expected output should look like this:

data_new = pd.DataFrame({'User_ID': ['122','233','366'], 'Age': [np.nan,24,np.nan]})

So I just want to take the last row for every User_ID. I'm a total beginner; any ideas?

CodePudding user response：

Since you want to keep the NaN, you can use groupby.tail (groupby.last would skip the NaNs and return the last non-null value of each group instead):

out = data.groupby('User_ID').tail(1)
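To make the difference concrete, here is a minimal sketch that runs both calls on the data from the question. The variable names `last_rows` and `last_non_null` are only illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'User_ID': ['122', '122', '122', '233', '233', '233', '233', '366', '366', '366'],
    'Age': [23, 23, np.nan, 24, 24, 24, 24, 21, 21, np.nan],
})

# tail(1) keeps the physically last row of each group, NaN included,
# and preserves the original index labels.
last_rows = data.groupby('User_ID').tail(1)

# last() returns the last non-null value per column within each group,
# so the trailing NaN for users 122 and 366 is skipped.
last_non_null = data.groupby('User_ID').last()

print(last_rows)
print(last_non_null)
```

With groupby.last the result is indexed by User_ID and holds the last non-missing Age for each user, which is not what the question asks for.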

Another option is drop_duplicates, keeping only the last occurrence of each User_ID:

out = data.drop_duplicates(subset='User_ID', keep='last')

output:

  User_ID   Age
2     122   NaN
6     233  24.0
9     366   NaN

If you want to reset the index in the process, use ignore_index=True:

out = data.drop_duplicates(subset='User_ID', keep='last', ignore_index=True)

output:

  User_ID   Age
0     122   NaN
1     233  24.0
2     366   NaN
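groupby.tail has no ignore_index parameter, so if you go the groupby route, one way to get the same renumbered index is to chain reset_index(drop=True). The sketch below is only an illustration (the names `via_tail`, `via_dedup` and `same` are made up here) and checks that both routes agree on the sample data:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'User_ID': ['122', '122', '122', '233', '233', '233', '233', '366', '366', '366'],
    'Age': [23, 23, np.nan, 24, 24, 24, 24, 21, 21, np.nan],
})

# Last row per user via groupby.tail, then discard the old index labels.
via_tail = data.groupby('User_ID').tail(1).reset_index(drop=True)

# Last row per user via drop_duplicates, renumbering the index directly.
via_dedup = data.drop_duplicates(subset='User_ID', keep='last', ignore_index=True)

# DataFrame.equals treats NaNs in the same position as equal.
same = via_tail.equals(via_dedup)
print(same)
```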

CodePudding user response：

data_new = data.drop_duplicates(subset='User_ID', keep='last')