I have the following data. There are consecutive runs of True per day, but sometimes there is an odd False for a single day, followed by True again.
Given the nature of this data, that appears to be an error, so I need to replace these odd False values with True, but only if the 'sandwiched' False lasts 1 day.
False values that are not between True values should remain the same.
pd.DataFrame({
'start_datetime' : [
'2019-04-29 00:00:00','2019-04-30 00:00:00','2019-05-01 00:00:00','2019-05-02 00:00:00','2019-05-03 00:00:00','2019-05-07 00:00:00','2019-05-08 00:00:00','2019-05-09 00:00:00','2019-05-10 00:00:00','2019-05-14 00:00:00'],
'diag_local_code' : [
True,
True,
True,
True,
False,
True,
True,
False,
False,
False
],
})
>>>
start_datetime diag_local_code
0 2019-04-29 00:00:00 True
1 2019-04-30 00:00:00 True
2 2019-05-01 00:00:00 True
3 2019-05-02 00:00:00 True
4 2019-05-03 00:00:00 False
5 2019-05-07 00:00:00 True
6 2019-05-08 00:00:00 True
7 2019-05-09 00:00:00 False
8 2019-05-10 00:00:00 False
9 2019-05-14 00:00:00 False
CodePudding user response:
Let's say this dataframe is df:
col = df.columns.get_loc('diag_local_code')
# Skip the first and last rows: they have no neighbour on one side
for i in range(1, len(df) - 1):
    if not df.iloc[i, col] and df.iloc[i - 1, col] and df.iloc[i + 1, col]:
        df.iloc[i, col] = True
This worked for me!
CodePudding user response:
# Setup
import pandas as pd
df = pd.DataFrame({
'start_datetime' : [
'2019-04-29 00:00:00','2019-04-30 00:00:00','2019-05-01 00:00:00','2019-05-02 00:00:00','2019-05-03 00:00:00','2019-05-07 00:00:00','2019-05-08 00:00:00','2019-05-09 00:00:00','2019-05-10 00:00:00','2019-05-14 00:00:00'],
'diag_local_code' : [
True,
True,
True,
True,
False,
True,
True,
False,
False,
False
],
})
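# Shift to get row above and below
df['next'] = df['diag_local_code'].shift(-1)
df['previous'] = df['diag_local_code'].shift(1)
# Filter: a False whose neighbours on both sides are True
condition = (df['next'] == True) & (df['previous'] == True) & (df['diag_local_code'] == False)
# Assign value to the flag column only; df[condition] = True would overwrite every column in those rows
df.loc[condition, 'diag_local_code'] = True
# Drop the helper columns
df = df.drop(columns=['next', 'previous'])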
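
The same neighbour check can also be written without helper columns. This is only a sketch: the function name fill_single_false is made up for illustration, it assumes the column holds plain booleans with no missing values, and it compares adjacent rows rather than calendar days (the sample data has gaps between some dates).

```python
import pandas as pd

def fill_single_false(flags: pd.Series) -> pd.Series:
    """Return a copy where a lone False between two True rows becomes True.

    Hypothetical helper; works on adjacent rows, not calendar days.
    """
    # Value of the row above / below; the edges are treated as False
    prev_true = flags.shift(1, fill_value=False)
    next_true = flags.shift(-1, fill_value=False)
    # Keep existing True values, and flip a False only when both neighbours are True
    return flags | (prev_true & next_true)

df = pd.DataFrame({
    'start_datetime': [
        '2019-04-29 00:00:00', '2019-04-30 00:00:00', '2019-05-01 00:00:00',
        '2019-05-02 00:00:00', '2019-05-03 00:00:00', '2019-05-07 00:00:00',
        '2019-05-08 00:00:00', '2019-05-09 00:00:00', '2019-05-10 00:00:00',
        '2019-05-14 00:00:00'],
    'diag_local_code': [True, True, True, True, False,
                        True, True, False, False, False],
})

result = fill_single_false(df['diag_local_code'])
df['diag_local_code'] = result
```

Only row 4 changes here. The trailing run of False values in rows 7 to 9 is left alone because none of them sits between two True rows.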