Given a DataFrame, where the "time" attribute is a Python timedelta object. The index attribute is internal to the pandas dataframe, I simply show it to make it easier to understand the problem.
index | ID | time | country |
---|---|---|---|
0 | 0 | NaT | ITL |
1 | 0 | 0 days 01:00:00 | USA |
2 | 0 | NaT | ITL |
3 | 0 | NaT | ESP |
4 | 0 | 2 days 00:05:00 | USA |
5 | 1 | NaT | ITL |
6 | 1 | 0 days 03:35:00 | USA |
7 | 1 | NaT | ESP |
8 | 2 | NaT | USA |
9 | 3 | NaT | FRA |
10 | 3 | 2 days 04:14:10 | ESP |
11 | 3 | 1 days 03:35:00 | ITL |
I divide the functionality in 2 steps:
- Select the rows whose value of the attribute "time" is different from NaT.
index | ID | time | country |
---|---|---|---|
1 | 0 | 0 days 01:00:00 | USA |
4 | 0 | 2 days 00:05:00 | USA |
6 | 1 | 0 days 03:35:00 | USA |
10 | 3 | 2 days 04:14:10 | ESP |
11 | 3 | 1 days 03:35:00 | ITL |
- Select for each row of the above subset, its previous index row as long as it belongs to the same ID.
index | ID | time | country |
---|---|---|---|
0 | 0 | NaT | ITL |
1 | 0 | 0 days 01:00:00 | USA |
3 | 0 | NaT | ESP |
4 | 0 | 2 days 00:05:00 | USA |
5 | 1 | NaT | ITL |
6 | 1 | 0 days 03:35:00 | USA |
9 | 3 | NaT | FRA |
10 | 3 | 2 days 04:14:10 | ESP |
11 | 3 | 1 days 03:35:00 | ITL |
The DataFrame I want is the one above (without the index column, which is internal to pandas). I have split the functionality into 2 steps to make the problem understandable, however I need to get to step 2, it is not necessary that the functionality of step 1 comes implicit.
Thanks for your help in advance.
CodePudding user response:
Chain masks for testing non missing values with shifted values per groups by DataFrameGroupBy.shift
with |
for bitwise OR
in boolean indexing
:
m = df['time'].notna()
df = df[m | m.groupby(df['ID']).shift(-1,fill_value=False)]
print (df)
ID time country
0 0 NaT ITL
1 0 0 days 01:00:00 USA
3 0 NaT ESP
4 0 2 days 00:05:00 USA
5 1 NaT ITL
6 1 0 days 03:35:00 USA
9 3 NaT FRA
10 3 2 days 04:14:10 ESP
11 3 1 days 03:35:00 ITL