Home > Enterprise >  Pandas DataFrame: How to drop continuous rows gracefully
Pandas DataFrame: How to drop continuous rows gracefully

Time:12-02

I have a data frame, having two type of rows: SWITCH and RESULT My expectation is to drop the adjacent "SWITCH" and keep the last SWITCH in the block only, but keep all the RESULT rows.

I did it using data frame iterrows and I basically scanned line by line. This is not pythonic. Can you please advise if you are seeing a better way? Below is the sample data, and the code I'm using:

import pandas as pd

data = {'TYPE':['SWITCH','SWITCH','SWITCH',
'SWITCH','RESULT','RESULT','RESULT',
'RESULT','RESULT','SWITCH','SWITCH',
'RESULT','RESULT','RESULT','RESULT'],
'RESULT':['YES',
'NO','NO','YES',
'DONE','DONE','DONE',
'DONE','DONE','NO',
'YES','DONE','DONE',
'DONE','DONE']}

df = pd.DataFrame(data)
l = []
start=-1
for index, row in df.iterrows():
    type = row["TYPE"]
    if type == "RESULT":
        if start == -1:
            start = index 
    elif type == "SWITCH":
        if start== -1:
            df.drop(index=[*range(index, index 1, 1)], inplace=True)
            continue
                
        end = index-1
        if start <= end:
            df.drop(index=[*range(start,end,1)], inplace=True)
            start = index   1

print(df)

Just checked the output and found my code didn't do what I'm looking for:

Before applying the code

As index 0~index 3 are all "SWITCH", I want to drop the index 0/1/2 and keep the index 3 only, as this is a "block of switch" Similarily, for index 9/10 i want to keep index 10 only

      TYPE RESULT
0   SWITCH    YES
1   SWITCH     NO
2   SWITCH     NO
3   SWITCH    YES
4   RESULT   DONE
5   RESULT   DONE
6   RESULT   DONE
7   RESULT   DONE
8   RESULT   DONE
9   SWITCH     NO
10  SWITCH    YES
11  RESULT   DONE
12  RESULT   DONE
13  RESULT   DONE
14  RESULT   DONE

Expected output:

      TYPE RESULT
3   SWITCH    YES
4   RESULT   DONE
5   RESULT   DONE
6   RESULT   DONE
7   RESULT   DONE
8   RESULT   DONE
10  SWITCH    YES
11  RESULT   DONE
12  RESULT   DONE
13  RESULT   DONE
14  RESULT   DONE

Actual output:

      TYPE RESULT
8   RESULT   DONE
9   SWITCH     NO
10  SWITCH    YES
11  RESULT   DONE
12  RESULT   DONE
13  RESULT   DONE
14  RESULT   DONE

CodePudding user response:

If I understand you correctly, for each group of consecutive rows with TYPE == "SWITCH" you want to keep the last row. This can be done as follows:

df_processed = df[(df.TYPE != "SWITCH") | (df.TYPE.shift(-1) != "SWITCH")]

The output for the provided example data is

enter image description here

CodePudding user response:

Iterating the rows of a dataframe is considered bad practice and should be avoided.

I believe you are looking for something along these lines:

# Get the rows where TYPE == RESULT
df_type_result = df[df['TYPE'] == 'RESULT']

# Get the last index when the result type == SWITCH 
idxs = df.reset_index().groupby(['TYPE', 'RESULT']).last().loc['SWITCH']['index']
df_type_switch = df.loc[idxs]

# Concatenate and sort the results
df_result = pd.concat([df_type_result, df_type_switch]).sort_index()
df_result

enter image description here

CodePudding user response:

A lazy solution

df["DROP"] = df["TYPE"].shift(-1)
df = df.loc[~((df["TYPE"]=="SWITCH")&(df["DROP"]=="SWITCH"))]
df.drop(columns="DROP", inplace=True)

  • Related