I have a data frame, having two type of rows: SWITCH and RESULT My expectation is to drop the adjacent "SWITCH" and keep the last SWITCH in the block only, but keep all the RESULT rows.
I did it using data frame iterrows and I basically scanned line by line. This is not pythonic. Can you please advise if you are seeing a better way? Below is the sample data, and the code I'm using:
import pandas as pd
data = {'TYPE':['SWITCH','SWITCH','SWITCH',
'SWITCH','RESULT','RESULT','RESULT',
'RESULT','RESULT','SWITCH','SWITCH',
'RESULT','RESULT','RESULT','RESULT'],
'RESULT':['YES',
'NO','NO','YES',
'DONE','DONE','DONE',
'DONE','DONE','NO',
'YES','DONE','DONE',
'DONE','DONE']}
df = pd.DataFrame(data)
l = []
start=-1
for index, row in df.iterrows():
type = row["TYPE"]
if type == "RESULT":
if start == -1:
start = index
elif type == "SWITCH":
if start== -1:
df.drop(index=[*range(index, index 1, 1)], inplace=True)
continue
end = index-1
if start <= end:
df.drop(index=[*range(start,end,1)], inplace=True)
start = index 1
print(df)
Just checked the output and found my code didn't do what I'm looking for:
Before applying the code
As index 0~index 3 are all "SWITCH", I want to drop the index 0/1/2 and keep the index 3 only, as this is a "block of switch" Similarily, for index 9/10 i want to keep index 10 only
TYPE RESULT
0 SWITCH YES
1 SWITCH NO
2 SWITCH NO
3 SWITCH YES
4 RESULT DONE
5 RESULT DONE
6 RESULT DONE
7 RESULT DONE
8 RESULT DONE
9 SWITCH NO
10 SWITCH YES
11 RESULT DONE
12 RESULT DONE
13 RESULT DONE
14 RESULT DONE
Expected output:
TYPE RESULT
3 SWITCH YES
4 RESULT DONE
5 RESULT DONE
6 RESULT DONE
7 RESULT DONE
8 RESULT DONE
10 SWITCH YES
11 RESULT DONE
12 RESULT DONE
13 RESULT DONE
14 RESULT DONE
Actual output:
TYPE RESULT
8 RESULT DONE
9 SWITCH NO
10 SWITCH YES
11 RESULT DONE
12 RESULT DONE
13 RESULT DONE
14 RESULT DONE
CodePudding user response:
If I understand you correctly, for each group of consecutive rows with TYPE == "SWITCH"
you want to keep the last row. This can be done as follows:
df_processed = df[(df.TYPE != "SWITCH") | (df.TYPE.shift(-1) != "SWITCH")]
The output for the provided example data is
CodePudding user response:
Iterating the rows of a dataframe is considered bad practice and should be avoided.
I believe you are looking for something along these lines:
# Get the rows where TYPE == RESULT
df_type_result = df[df['TYPE'] == 'RESULT']
# Get the last index when the result type == SWITCH
idxs = df.reset_index().groupby(['TYPE', 'RESULT']).last().loc['SWITCH']['index']
df_type_switch = df.loc[idxs]
# Concatenate and sort the results
df_result = pd.concat([df_type_result, df_type_switch]).sort_index()
df_result
CodePudding user response:
A lazy solution
df["DROP"] = df["TYPE"].shift(-1)
df = df.loc[~((df["TYPE"]=="SWITCH")&(df["DROP"]=="SWITCH"))]
df.drop(columns="DROP", inplace=True)