Home > Software design >  Change values of a timedelta column based on the previous row
Change values of a timedelta column based on the previous row

Time:05-27

Let it be the following Python Panda Dataframe:

code visit_time flag other counter
0 NaT True X 3
0 1 days 03:00:12 False Y 1
0 NaT False X 3
0 0 days 05:00:00 True X 2
1 NaT False Z 3
1 NaT True X 3
1 1 days 03:00:12 False Y 1
2 NaT True X 3
2 5 days 10:01:12 True Y 0

To solve the problem, only the columns: code, visit_time and flag are needed.

Each row with a value of visit_time, has a previous row with value NaT. Knowing this, I want to do next modification in the dataframe:

  • Sets the flag of the row with non-null value of visit_time to the same value as its previous row.

Example:

code visit_time flag other counter
0 NaT True X 3
0 1 days 03:00:12 True Y 1
0 NaT False X 3
0 0 days 05:00:00 False X 2
1 NaT False Z 3
1 NaT True X 3
1 1 days 03:00:12 True Y 1
2 NaT True X 3
2 5 days 10:01:12 True Y 0

I am grateful for the help offered in advance.

CodePudding user response:

for row in df.iterrows():
    if row[0] < df.shape[0] - 1:  # stop comparing when getting to last row
        if df.at[row[0], 'visit_time'] == 'NaT' and df.at[row[0]   1, 'visit_time'] != 'NaT':
            df.at[row[0]   1, 'flag'] = df.at[row[0], 'flag']

Before:

   code       visit_time   flag other  counter
0     0              NaT   True     X        3
1     0  1 days 03:00:12  False     Y        1
2     0              NaT  False     X        3
3     0  0 days 05:00:00   True     X        2
4     1              NaT  False     Z        3
5     1              NaT   True     X        3
6     1  1 days 03:00:12  False     Y        1
7     2              NaT   True     X        3
8     2  5 days 10:01:12   True     Y        0

After:

   code       visit_time   flag other  counter
0     0              NaT   True     X        3
1     0  1 days 03:00:12   True     Y        1
2     0              NaT  False     X        3
3     0  0 days 05:00:00  False     X        2
4     1              NaT  False     Z        3
5     1              NaT   True     X        3
6     1  1 days 03:00:12   True     Y        1
7     2              NaT   True     X        3
8     2  5 days 10:01:12   True     Y        0

CodePudding user response:

You can simply have a shifted dataframe as in:

df_previous = df.copy()
df_previous.index =1

Looking like:

code visit_time flag other counter
1 0 NaT True X 3
2 0 1 days 03:00:12 False Y 1
3 0 NaT False X 3
4 0 0 days 05:00:00 True X 2
5 1 NaT False Z 3
6 1 NaT True X 3
7 1 1 days 03:00:12 False Y 1
8 2 NaT True X 3
9 2 5 days 10:01:12 True Y 0

Now you can have it merge with original dataframe and assign values with simple vector comparisons:

df = df.merge(df_previous[['visit_time', 'flag']], right_index=True, left_index=True, how='left', suffixes=["",'_previous'])
df.loc[df.visit_time.notna(), 'flag'] = df.loc[df.visit_time.notna(), 'flag_previous']

Now your dataframe looks like:

code visit_time flag other counter visit_time_previous flag_previous
0 0 NaT True X 3 nan nan
1 0 1 days 03:00:12 True Y 1 NaT 1
2 0 NaT False X 3 1 days 03:00:12 0
3 0 0 days 05:00:00 False X 2 NaT 0
4 1 NaT False Z 3 0 days 05:00:00 1
5 1 NaT True X 3 NaT 0
6 1 1 days 03:00:12 True Y 1 NaT 1
7 2 NaT True X 3 1 days 03:00:12 0
8 2 5 days 10:01:12 True Y 0 NaT 1

You can also remove previous columns if you like with:

df.drop(list(df.filter(regex = '_previous')), axis = 1)

Which will leave you with:

code visit_time flag other counter
0 0 NaT True X 3
1 0 1 days 03:00:12 True Y 1
2 0 NaT False X 3
3 0 0 days 05:00:00 False X 2
4 1 NaT False Z 3
5 1 NaT True X 3
6 1 1 days 03:00:12 True Y 1
7 2 NaT True X 3
8 2 5 days 10:01:12 True Y 0

CodePudding user response:

You can use .mask to set the 'flag' values to the .shifted version of itself where 'visit_time' values are notnull.

out = df.assign(
    flag=df['flag'].mask(df['visit_time'].notnull(), df['flag'].shift())
)

print(out)
   code      visit_time   flag other  counter
0     0             NaT   True     X        3
1     0 1 days 03:00:12   True     Y        1
2     0             NaT  False     X        3
3     0 0 days 05:00:00  False     X        2
4     1             NaT  False     Z        3
5     1             NaT   True     X        3
6     1 1 days 03:00:12   True     Y        1
7     2             NaT   True     X        3
8     2 5 days 10:01:12   True     Y        0
  • .mask(condition, other) replaces values where condition is True with the values of other in this case other is the value from the previous row.
  • .assign(…) is a way to update a column while returning a new DataFrame this can be replaced with column assignment df['flag'] = df['flag'].where(…) to modify the DataFrame in place.
  • Related