Let it be the following Python Panda Dataframe:
code | visit_time | flag | other | counter |
---|---|---|---|---|
0 | NaT | True | X | 3 |
0 | 1 days 03:00:12 | False | Y | 1 |
0 | NaT | False | X | 3 |
0 | 0 days 05:00:00 | True | X | 2 |
1 | NaT | False | Z | 3 |
1 | NaT | True | X | 3 |
1 | 1 days 03:00:12 | False | Y | 1 |
2 | NaT | True | X | 3 |
2 | 5 days 10:01:12 | True | Y | 0 |
To solve the problem, only the columns: code, visit_time
and flag
are needed.
Each row with a value of visit_time
, has a previous row with value NaT
. Knowing this, I want to do next modification in the dataframe:
- Sets the flag of the row with non-null value of
visit_time
to the same value as its previous row.
Example:
code | visit_time | flag | other | counter |
---|---|---|---|---|
0 | NaT | True | X | 3 |
0 | 1 days 03:00:12 | True | Y | 1 |
0 | NaT | False | X | 3 |
0 | 0 days 05:00:00 | False | X | 2 |
1 | NaT | False | Z | 3 |
1 | NaT | True | X | 3 |
1 | 1 days 03:00:12 | True | Y | 1 |
2 | NaT | True | X | 3 |
2 | 5 days 10:01:12 | True | Y | 0 |
I am grateful for the help offered in advance.
CodePudding user response:
for row in df.iterrows():
if row[0] < df.shape[0] - 1: # stop comparing when getting to last row
if df.at[row[0], 'visit_time'] == 'NaT' and df.at[row[0] 1, 'visit_time'] != 'NaT':
df.at[row[0] 1, 'flag'] = df.at[row[0], 'flag']
Before:
code visit_time flag other counter
0 0 NaT True X 3
1 0 1 days 03:00:12 False Y 1
2 0 NaT False X 3
3 0 0 days 05:00:00 True X 2
4 1 NaT False Z 3
5 1 NaT True X 3
6 1 1 days 03:00:12 False Y 1
7 2 NaT True X 3
8 2 5 days 10:01:12 True Y 0
After:
code visit_time flag other counter
0 0 NaT True X 3
1 0 1 days 03:00:12 True Y 1
2 0 NaT False X 3
3 0 0 days 05:00:00 False X 2
4 1 NaT False Z 3
5 1 NaT True X 3
6 1 1 days 03:00:12 True Y 1
7 2 NaT True X 3
8 2 5 days 10:01:12 True Y 0
CodePudding user response:
You can simply have a shifted dataframe as in:
df_previous = df.copy()
df_previous.index =1
Looking like:
code | visit_time | flag | other | counter | |
---|---|---|---|---|---|
1 | 0 | NaT | True | X | 3 |
2 | 0 | 1 days 03:00:12 | False | Y | 1 |
3 | 0 | NaT | False | X | 3 |
4 | 0 | 0 days 05:00:00 | True | X | 2 |
5 | 1 | NaT | False | Z | 3 |
6 | 1 | NaT | True | X | 3 |
7 | 1 | 1 days 03:00:12 | False | Y | 1 |
8 | 2 | NaT | True | X | 3 |
9 | 2 | 5 days 10:01:12 | True | Y | 0 |
Now you can have it merge with original dataframe and assign values with simple vector comparisons:
df = df.merge(df_previous[['visit_time', 'flag']], right_index=True, left_index=True, how='left', suffixes=["",'_previous'])
df.loc[df.visit_time.notna(), 'flag'] = df.loc[df.visit_time.notna(), 'flag_previous']
Now your dataframe looks like:
code | visit_time | flag | other | counter | visit_time_previous | flag_previous | |
---|---|---|---|---|---|---|---|
0 | 0 | NaT | True | X | 3 | nan | nan |
1 | 0 | 1 days 03:00:12 | True | Y | 1 | NaT | 1 |
2 | 0 | NaT | False | X | 3 | 1 days 03:00:12 | 0 |
3 | 0 | 0 days 05:00:00 | False | X | 2 | NaT | 0 |
4 | 1 | NaT | False | Z | 3 | 0 days 05:00:00 | 1 |
5 | 1 | NaT | True | X | 3 | NaT | 0 |
6 | 1 | 1 days 03:00:12 | True | Y | 1 | NaT | 1 |
7 | 2 | NaT | True | X | 3 | 1 days 03:00:12 | 0 |
8 | 2 | 5 days 10:01:12 | True | Y | 0 | NaT | 1 |
You can also remove previous columns if you like with:
df.drop(list(df.filter(regex = '_previous')), axis = 1)
Which will leave you with:
code | visit_time | flag | other | counter | |
---|---|---|---|---|---|
0 | 0 | NaT | True | X | 3 |
1 | 0 | 1 days 03:00:12 | True | Y | 1 |
2 | 0 | NaT | False | X | 3 |
3 | 0 | 0 days 05:00:00 | False | X | 2 |
4 | 1 | NaT | False | Z | 3 |
5 | 1 | NaT | True | X | 3 |
6 | 1 | 1 days 03:00:12 | True | Y | 1 |
7 | 2 | NaT | True | X | 3 |
8 | 2 | 5 days 10:01:12 | True | Y | 0 |
CodePudding user response:
You can use .mask
to set the 'flag'
values to the .shift
ed version of itself where 'visit_time'
values are notnull
.
out = df.assign(
flag=df['flag'].mask(df['visit_time'].notnull(), df['flag'].shift())
)
print(out)
code visit_time flag other counter
0 0 NaT True X 3
1 0 1 days 03:00:12 True Y 1
2 0 NaT False X 3
3 0 0 days 05:00:00 False X 2
4 1 NaT False Z 3
5 1 NaT True X 3
6 1 1 days 03:00:12 True Y 1
7 2 NaT True X 3
8 2 5 days 10:01:12 True Y 0
.mask(condition, other)
replaces values where condition is True with the values ofother
in this caseother
is the value from the previous row..assign(…)
is a way to update a column while returning a newDataFrame
this can be replaced with column assignmentdf['flag'] = df['flag'].where(…)
to modify theDataFrame
in place.