Home > Software design >  Getting the Last and 3rd Last Records in a Dataframe Based on Criteria
Getting the Last and 3rd Last Records in a Dataframe Based on Criteria

Time:12-09

I have a large dataframe (extract below) and want to create a new dataframe containing the last "In-progress" row and the 3rd last "In-progress" row based on the Time for each ID.

I am new to Pandas and can't work out how to do it. Any help would be appreciated.

Dataframe:

Time State ID Ref Name
10:00 In-progress 54887 1 Jim
10:00 In-progress 54887 2 Jon
10:00 In-progress 54887 3 Rob
10:00 In-progress 54887 4 Sam
11:00 In-progress 54887 1 Jim
11:00 In-progress 54887 2 Jon
11:00 In-progress 54887 3 Rob
11:00 In-progress 54887 4 Sam
12:00 In-progress 54887 1 Jim
12:00 In-progress 54887 2 Jon
12:00 In-progress 54887 3 Rob
12:00 In-progress 54887 4 Sam
13:00 Done 54887 1 Jim
13:00 Done 54887 2 Jon
13:00 Done 54887 3 Rob
13:00 Done 54887 4 Sam
10:00 In-progress 65228 a Anya
10:00 In-progress 65228 b Lot
10:00 In-progress 65228 c Ted
10:00 In-progress 65228 d Tom
11:00 In-progress 65228 a Anya
11:00 In-progress 65228 b Lot
11:00 In-progress 65228 c Ted
11:00 In-progress 65228 d Tom
12:00 In-progress 65228 a Anya
12:00 In-progress 65228 b Lot
12:00 In-progress 65228 c Ted
12:00 In-progress 65228 d Tom
13:00 Done 65228 a Anya
13:00 Done 65228 b Lot
13:00 Done 65228 c Ted
13:00 Done 65228 d Tom

Desired Result:

Time State ID Ref Name
10:00 In-progress 54887 1 Jim
10:00 In-progress 54887 2 Jon
10:00 In-progress 54887 3 Rob
10:00 In-progress 54887 4 Sam
12:00 In-progress 54887 1 Jim
12:00 In-progress 54887 2 Jon
12:00 In-progress 54887 3 Rob
12:00 In-progress 54887 4 Sam
10:00 In-progress 65228 a Anya
10:00 In-progress 65228 b Lot
10:00 In-progress 65228 c Ted
10:00 In-progress 65228 d Tom
12:00 In-progress 65228 a Anya
12:00 In-progress 65228 b Lot
12:00 In-progress 65228 c Ted
12:00 In-progress 65228 d Tom

CodePudding user response:

3rd last to last

Use groupby.tail:

out = (df[df['State'].eq('In-progress')]
       .groupby(['Time', 'ID']).tail(3)
      )

Output:

     Time        State     ID Ref Name
1   10:00  In-progress  54887   2  Jon
2   10:00  In-progress  54887   3  Rob
3   10:00  In-progress  54887   4  Sam
5   11:00  In-progress  54887   2  Jon
6   11:00  In-progress  54887   3  Rob
7   11:00  In-progress  54887   4  Sam
9   12:00  In-progress  54887   2  Jon
10  12:00  In-progress  54887   3  Rob
11  12:00  In-progress  54887   4  Sam
17  10:00  In-progress  65228   b  Lot
18  10:00  In-progress  65228   c  Ted
19  10:00  In-progress  65228   d  Tom
21  11:00  In-progress  65228   b  Lot
22  11:00  In-progress  65228   c  Ted
23  11:00  In-progress  65228   d  Tom
25  12:00  In-progress  65228   b  Lot
26  12:00  In-progress  65228   c  Ted
27  12:00  In-progress  65228   d  Tom

last and 3rd last (excluding 2nd last)

Use groupby.cumcount:

idx = (df[df['State'].eq('In-progress')]
       .groupby(['Time', 'ID']).cumcount(ascending=False)
       .isin([0,2]).loc[lambda x: x]
       .index
      )

out = df.loc[idx]

Output:

     Time        State     ID Ref Name
1   10:00  In-progress  54887   2  Jon
3   10:00  In-progress  54887   4  Sam
5   11:00  In-progress  54887   2  Jon
7   11:00  In-progress  54887   4  Sam
9   12:00  In-progress  54887   2  Jon
11  12:00  In-progress  54887   4  Sam
17  10:00  In-progress  65228   b  Lot
19  10:00  In-progress  65228   d  Tom
21  11:00  In-progress  65228   b  Lot
23  11:00  In-progress  65228   d  Tom
25  12:00  In-progress  65228   b  Lot
27  12:00  In-progress  65228   d  Tom

CodePudding user response:

get last and last-3 time per id

df1 = (df[df['State'].eq('In-progress')]
       .drop_duplicates(['ID', 'Time'])
       .groupby('ID')['Time'].nth([-3, -1]).reset_index())

df1

    ID      Time
0   54887   10:00
1   54887   12:00
2   65228   10:00
3   65228   12:00

filtering df by merge

df1.merge(df, how='left').reindex(columns=df.columns)

result:

    Time    State       ID      Ref Name
0   10:00   In-progress 54887   1   Jim
1   10:00   In-progress 54887   2   Jon
2   10:00   In-progress 54887   3   Rob
3   10:00   In-progress 54887   4   Sam
4   12:00   In-progress 54887   1   Jim
5   12:00   In-progress 54887   2   Jon
6   12:00   In-progress 54887   3   Rob
7   12:00   In-progress 54887   4   Sam
8   10:00   In-progress 65228   a   Anya
9   10:00   In-progress 65228   b   Lot
10  10:00   In-progress 65228   c   Ted
11  10:00   In-progress 65228   d   Tom
12  12:00   In-progress 65228   a   Anya
13  12:00   In-progress 65228   b   Lot
14  12:00   In-progress 65228   c   Ted
15  12:00   In-progress 65228   d   Tom
  • Related