Find number of rows with values and latest date

I have a dataframe like this:

pk_id date
123   2020-01-01
223   2020-01-02
123   2020-01-03
224   2020-01-04

and I want to select pk_id = 123 and pk_id = 223 with their latest dates and count the number of resulting rows.

I have the following code:

idx = df.groupby('pk_id')['date'].idxmax()
df = df.loc[idx]
df = df.loc[df['pk_id'].isin([123, 223])]

which produces the dataframe

pk_id   date
123    2020-01-03
223    2020-01-02

and then I get the number of rows:

num = df.shape[0]

I believe this can be done in one line. Any ideas?

CodePudding user response：

You can try

out = df[df['pk_id'].isin([123, 223])].groupby('pk_id', as_index=False)['date'].max()

print(out)

   pk_id        date
0    123  2020-01-03
1    223  2020-01-02
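
If you only need the count, here is a minimal sketch built on the same filter. It assumes the sample data from the question, rebuilt here with the dates parsed as datetimes. The names `wanted`, `latest`, `num` and `num_direct` are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: 'date' is parsed as datetime).
df = pd.DataFrame({
    "pk_id": [123, 223, 123, 224],
    "date": pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]
    ),
})

wanted = [123, 223]  # hypothetical name for the ids of interest

# Keep only the wanted ids, then take the latest date per id.
latest = df[df["pk_id"].isin(wanted)].groupby("pk_id", as_index=False)["date"].max()

# One row per matching id, so the row count is the number asked for.
num = len(latest)

# If the dates themselves are not needed, counting distinct ids gives the same number.
num_direct = df.loc[df["pk_id"].isin(wanted), "pk_id"].nunique()
```

The second form skips the groupby entirely, which may be preferable when only the number matters.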

CodePudding user response：

You can use the pandas query method:

df.query("pk_id == 123 | pk_id == 223").groupby('pk_id', as_index=False)['date'].max()
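
To get the row count in a single expression with query, something like the sketch below should work. It assumes the question's sample data, and `num` is just an illustrative name. The `in [...]` form inside the query string is used here as a shorter alternative to chaining `==` comparisons.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: 'date' is parsed as datetime).
df = pd.DataFrame({
    "pk_id": [123, 223, 123, 224],
    "date": pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]
    ),
})

# Filter with query, reduce to the latest date per id, and count the rows.
num = len(
    df.query("pk_id in [123, 223]")
      .groupby("pk_id", as_index=False)["date"]
      .max()
)
print(num)
```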