Select rows together with their previous rows given a condition for a pandas dataframe-CodePudding

Given a DataFrame, where the "time" attribute is a Python timedelta object. The index attribute is internal to the pandas dataframe, I simply show it to make it easier to understand the problem.

index	ID	time	country
0	0	NaT	ITL
1	0	0 days 01:00:00	USA
2	0	NaT	ITL
3	0	NaT	ESP
4	0	2 days 00:05:00	USA
5	1	NaT	ITL
6	1	0 days 03:35:00	USA
7	1	NaT	ESP
8	2	NaT	USA
9	3	NaT	FRA
10	3	2 days 04:14:10	ESP
11	3	1 days 03:35:00	ITL

I divide the functionality in 2 steps:

Select the rows whose value of the attribute "time" is different from NaT.

index	ID	time	country
1	0	0 days 01:00:00	USA
4	0	2 days 00:05:00	USA
6	1	0 days 03:35:00	USA
10	3	2 days 04:14:10	ESP
11	3	1 days 03:35:00	ITL

Select for each row of the above subset, its previous index row as long as it belongs to the same ID.

index	ID	time	country
0	0	NaT	ITL
1	0	0 days 01:00:00	USA
3	0	NaT	ESP
4	0	2 days 00:05:00	USA
5	1	NaT	ITL
6	1	0 days 03:35:00	USA
9	3	NaT	FRA
10	3	2 days 04:14:10	ESP
11	3	1 days 03:35:00	ITL

The DataFrame I want is the one above (without the index column, which is internal to pandas). I have split the functionality into 2 steps to make the problem understandable, however I need to get to step 2, it is not necessary that the functionality of step 1 comes implicit.

Thanks for your help in advance.

CodePudding user response：

Chain masks for testing non missing values with shifted values per groups by DataFrameGroupBy.shift with | for bitwise OR in boolean indexing:

m = df['time'].notna()
df = df[m | m.groupby(df['ID']).shift(-1,fill_value=False)]
print (df)
    ID            time country
0    0             NaT     ITL
1    0 0 days 01:00:00     USA
3    0             NaT     ESP
4    0 2 days 00:05:00     USA
5    1             NaT     ITL
6    1 0 days 03:35:00     USA
9    3             NaT     FRA
10   3 2 days 04:14:10     ESP
11   3 1 days 03:35:00     ITL