I have a large dataframe (extract below) and want to create a new dataframe containing the last "In-progress" row and the 3rd last "In-progress" row based on the Time for each ID.
I am new to Pandas and can't work out how to do it. Any help would be appreciated.
Dataframe:
Time | State | ID | Ref | Name |
---|---|---|---|---|
10:00 | In-progress | 54887 | 1 | Jim |
10:00 | In-progress | 54887 | 2 | Jon |
10:00 | In-progress | 54887 | 3 | Rob |
10:00 | In-progress | 54887 | 4 | Sam |
11:00 | In-progress | 54887 | 1 | Jim |
11:00 | In-progress | 54887 | 2 | Jon |
11:00 | In-progress | 54887 | 3 | Rob |
11:00 | In-progress | 54887 | 4 | Sam |
12:00 | In-progress | 54887 | 1 | Jim |
12:00 | In-progress | 54887 | 2 | Jon |
12:00 | In-progress | 54887 | 3 | Rob |
12:00 | In-progress | 54887 | 4 | Sam |
13:00 | Done | 54887 | 1 | Jim |
13:00 | Done | 54887 | 2 | Jon |
13:00 | Done | 54887 | 3 | Rob |
13:00 | Done | 54887 | 4 | Sam |
10:00 | In-progress | 65228 | a | Anya |
10:00 | In-progress | 65228 | b | Lot |
10:00 | In-progress | 65228 | c | Ted |
10:00 | In-progress | 65228 | d | Tom |
11:00 | In-progress | 65228 | a | Anya |
11:00 | In-progress | 65228 | b | Lot |
11:00 | In-progress | 65228 | c | Ted |
11:00 | In-progress | 65228 | d | Tom |
12:00 | In-progress | 65228 | a | Anya |
12:00 | In-progress | 65228 | b | Lot |
12:00 | In-progress | 65228 | c | Ted |
12:00 | In-progress | 65228 | d | Tom |
13:00 | Done | 65228 | a | Anya |
13:00 | Done | 65228 | b | Lot |
13:00 | Done | 65228 | c | Ted |
13:00 | Done | 65228 | d | Tom |
Desired Result:
Time | State | ID | Ref | Name |
---|---|---|---|---|
10:00 | In-progress | 54887 | 1 | Jim |
10:00 | In-progress | 54887 | 2 | Jon |
10:00 | In-progress | 54887 | 3 | Rob |
10:00 | In-progress | 54887 | 4 | Sam |
12:00 | In-progress | 54887 | 1 | Jim |
12:00 | In-progress | 54887 | 2 | Jon |
12:00 | In-progress | 54887 | 3 | Rob |
12:00 | In-progress | 54887 | 4 | Sam |
10:00 | In-progress | 65228 | a | Anya |
10:00 | In-progress | 65228 | b | Lot |
10:00 | In-progress | 65228 | c | Ted |
10:00 | In-progress | 65228 | d | Tom |
12:00 | In-progress | 65228 | a | Anya |
12:00 | In-progress | 65228 | b | Lot |
12:00 | In-progress | 65228 | c | Ted |
12:00 | In-progress | 65228 | d | Tom |
CodePudding user response:
3rd last to last
Use groupby.tail
:
out = (df[df['State'].eq('In-progress')]
.groupby(['Time', 'ID']).tail(3)
)
Output:
Time State ID Ref Name
1 10:00 In-progress 54887 2 Jon
2 10:00 In-progress 54887 3 Rob
3 10:00 In-progress 54887 4 Sam
5 11:00 In-progress 54887 2 Jon
6 11:00 In-progress 54887 3 Rob
7 11:00 In-progress 54887 4 Sam
9 12:00 In-progress 54887 2 Jon
10 12:00 In-progress 54887 3 Rob
11 12:00 In-progress 54887 4 Sam
17 10:00 In-progress 65228 b Lot
18 10:00 In-progress 65228 c Ted
19 10:00 In-progress 65228 d Tom
21 11:00 In-progress 65228 b Lot
22 11:00 In-progress 65228 c Ted
23 11:00 In-progress 65228 d Tom
25 12:00 In-progress 65228 b Lot
26 12:00 In-progress 65228 c Ted
27 12:00 In-progress 65228 d Tom
last and 3rd last (excluding 2nd last)
Use groupby.cumcount
:
idx = (df[df['State'].eq('In-progress')]
.groupby(['Time', 'ID']).cumcount(ascending=False)
.isin([0,2]).loc[lambda x: x]
.index
)
out = df.loc[idx]
Output:
Time State ID Ref Name
1 10:00 In-progress 54887 2 Jon
3 10:00 In-progress 54887 4 Sam
5 11:00 In-progress 54887 2 Jon
7 11:00 In-progress 54887 4 Sam
9 12:00 In-progress 54887 2 Jon
11 12:00 In-progress 54887 4 Sam
17 10:00 In-progress 65228 b Lot
19 10:00 In-progress 65228 d Tom
21 11:00 In-progress 65228 b Lot
23 11:00 In-progress 65228 d Tom
25 12:00 In-progress 65228 b Lot
27 12:00 In-progress 65228 d Tom
CodePudding user response:
get last and last-3 time per id
df1 = (df[df['State'].eq('In-progress')]
.drop_duplicates(['ID', 'Time'])
.groupby('ID')['Time'].nth([-3, -1]).reset_index())
df1
ID Time
0 54887 10:00
1 54887 12:00
2 65228 10:00
3 65228 12:00
filtering df
by merge
df1.merge(df, how='left').reindex(columns=df.columns)
result:
Time State ID Ref Name
0 10:00 In-progress 54887 1 Jim
1 10:00 In-progress 54887 2 Jon
2 10:00 In-progress 54887 3 Rob
3 10:00 In-progress 54887 4 Sam
4 12:00 In-progress 54887 1 Jim
5 12:00 In-progress 54887 2 Jon
6 12:00 In-progress 54887 3 Rob
7 12:00 In-progress 54887 4 Sam
8 10:00 In-progress 65228 a Anya
9 10:00 In-progress 65228 b Lot
10 10:00 In-progress 65228 c Ted
11 10:00 In-progress 65228 d Tom
12 12:00 In-progress 65228 a Anya
13 12:00 In-progress 65228 b Lot
14 12:00 In-progress 65228 c Ted
15 12:00 In-progress 65228 d Tom