I have a dataframe like this:
pk_id date
123 2020-01-01
223 2020-01-02
123 2020-01-03
224 2020-01-04
and I want to find the rows for pk_id = 123 and pk_id = 223
with their latest date, and count the number of such rows.
I have the following code:
idx = df.groupby('pk_id')['date'].idxmax()
df = df.loc[idx]
df = df.loc[df['pk_id'].isin([123, 223])]
which produces this dataframe:
pk_id date
123 2020-01-03
223 2020-01-02
and then I get the number of rows:
num = df.shape[0]
I believe it can be done in one line. Any ideas?
CodePudding user response:
You can try
out = df[df['pk_id'].isin([123, 223])].groupby('pk_id', as_index=False)['date'].max()
print(out)
pk_id date
0 123 2020-01-03
1 223 2020-01-02
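That gives the latest date per id. To also get the row count the question asks for, here is a minimal sketch, assuming the sample data from the question and that `date` can be parsed as a datetime. The `wanted` list is a hypothetical name introduced for illustration:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "pk_id": [123, 223, 123, 224],
    "date": pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]
    ),
})

wanted = [123, 223]  # hypothetical name for the ids of interest

# Latest date for each wanted pk_id
latest = (
    df[df["pk_id"].isin(wanted)]
    .groupby("pk_id", as_index=False)["date"]
    .max()
)

# Count in one expression: number of distinct wanted ids present in df,
# which equals the number of rows in `latest`
num = df.loc[df["pk_id"].isin(wanted), "pk_id"].nunique()
```

If you only need the count, the `nunique()` line is enough on its own, since each wanted id contributes exactly one "latest" row.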
CodePudding user response:
You can use the pandas query method, followed by max() to get the latest date per id:
df.query("pk_id == 123 | pk_id == 223").groupby('pk_id', as_index=False)['date'].max()
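As a rough, self-contained sketch of the query approach on the question's sample data: `wanted` is a hypothetical variable introduced here, referenced inside the query string with `@`, and the dates are assumed to parse as datetimes.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "pk_id": [123, 223, 123, 224],
    "date": pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]
    ),
})

wanted = [123, 223]  # hypothetical name for the ids of interest

# Filter with query, then take the latest date per pk_id
out = (
    df.query("pk_id in @wanted")
    .groupby("pk_id", as_index=False)["date"]
    .max()
)

# Number of resulting rows
num = len(out)
```

Passing the ids through `@wanted` avoids spelling out one `pk_id == ...` clause per id, which may be easier to maintain when the list grows.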