How to remove a row based on a condition in pandas?

I have the following dataframe:

Index	Description
0	Tab tab_1 of type yyy opened by User A
1	some_value
2	Tab tab_1 of type xxx opened by User B
3	Tab tab_4 of type yyy opened by User A
4	some_value
5	Tab tab_1 of type yyy closed by User A
6	some_value
7	Tab tab_1 of type xxx closed by User B
8	Tab tab_2 of type yyy closed by User A
9	some_value
10	Tab tab_3 of type zzz closed by User C

I would like to remove the rows whose "Description" value has no matching pair. By a pair I mean, for example, rows 0 and 5, or rows 2 and 7: the same tab of the same type is opened and later closed by the same user. Rows 3, 8 and 10 have no pair: the tab is either opened by a user but never closed, or closed but never opened.

Expected output:

Index	Description
0	Tab tab_1 of type yyy opened by User A
1	some_value
2	Tab tab_1 of type xxx opened by User B
4	some_value
5	Tab tab_1 of type yyy closed by User A
6	some_value
7	Tab tab_1 of type xxx closed by User B
9	some_value

Is there a way to do this?

CodePudding user response：

You can try the DataFrame.duplicated method: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html

For instance:

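# duplicated() returns a boolean Series (True where Description repeats an earlier row), not a new dataframe
is_dup = df.duplicated(subset=['Description'])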
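
For reference, here is a small sketch of what duplicated returns on the question's data. The frame is rebuilt from the question and is assumed to use the default integer index; the name dup is illustrative. The result is a boolean Series that flags repeated values, so on this data it marks the later some_value rows rather than the unpaired tab rows:

```python
import pandas as pd

# Sample data rebuilt from the question (default integer index 0..10).
df = pd.DataFrame({"Description": [
    "Tab tab_1 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type xxx opened by User B",
    "Tab tab_4 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type yyy closed by User A",
    "some_value",
    "Tab tab_1 of type xxx closed by User B",
    "Tab tab_2 of type yyy closed by User A",
    "some_value",
    "Tab tab_3 of type zzz closed by User C",
]})

# True for every row whose Description already appeared in an earlier row.
dup = df.duplicated(subset=["Description"])

print(dup[dup].index.tolist())  # [4, 6, 9]
```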

CodePudding user response：

df.drop_duplicates('Description')

CodePudding user response：

Honestly, I'm not sure this is what you need, but you can try this:

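# Group rows by the description with the action word removed, so matching
# "opened" and "closed" rows share a key; keep groups that contain both.
key = df['Description'].str.replace('opened|closed', '', regex=True)
mask = df.groupby(key)['Description'].transform(
    lambda x: x.str.contains('opened').any() & x.str.contains('closed').any())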

res = df.loc[mask]

>>> res
Index                             Description
0      Tab tab_1 of type yyy opened by User A
2      Tab tab_1 of type xxx opened by User B
5      Tab tab_1 of type yyy closed by User A
7      Tab tab_1 of type xxx closed by User B
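
Note that the mask above also drops the some_value rows, because their group contains neither "opened" nor "closed". A minimal sketch of a variant that keeps them, assuming rows that mention neither word should always be retained (the frame is rebuilt from the question, and the names is_event, has_open and has_close are illustrative):

```python
import pandas as pd

# Sample data rebuilt from the question (default integer index 0..10).
df = pd.DataFrame({"Description": [
    "Tab tab_1 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type xxx opened by User B",
    "Tab tab_4 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type yyy closed by User A",
    "some_value",
    "Tab tab_1 of type xxx closed by User B",
    "Tab tab_2 of type yyy closed by User A",
    "some_value",
    "Tab tab_3 of type zzz closed by User C",
]})

desc = df["Description"]

# Rows that describe an open/close event at all.
is_event = desc.str.contains("opened|closed", regex=True)

# Group key: the description with the action word removed.
key = desc.str.replace("opened|closed", "", regex=True)

# For each row, does its group contain an "opened" row / a "closed" row?
has_open = desc.str.contains("opened").groupby(key).transform("any")
has_close = desc.str.contains("closed").groupby(key).transform("any")

# Keep non-event rows, plus event rows whose group has both actions.
res = df[~is_event | (has_open & has_close)]

print(res.index.tolist())  # [0, 1, 2, 4, 5, 6, 7, 9]
```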

CodePudding user response：

Replace the words "opened" and "closed" with an empty string, group by the resulting text, use the groupby filter method to select the groups that contain only one row (the unpaired rows), and then drop those rows:

key = data['Description'].str.replace('opened|closed', '', regex=True)
data.drop(data.groupby(key).filter(lambda x: x['Description'].count() == 1).index)

Index   Description
    0   Tab tab_1 of type yyy opened by User A
    1   some_value
    2   Tab tab_1 of type xxx opened by User B
    4   some_value
    5   Tab tab_1 of type yyy closed by User A
    6   some_value
    7   Tab tab_1 of type xxx closed by User B
    9   some_value
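
The count-based filter treats any description that occurs exactly once as unpaired, so a non-tab row that appears only once would be dropped too. A possible alternative is sketched below. It assumes every event row follows the pattern "Tab <tab> of type <type> opened|closed by <user>" seen in the sample data. The regular expression and the names parts, events and unpaired_idx are illustrative, not part of the answers above:

```python
import pandas as pd

# Sample data rebuilt from the question (default integer index 0..10).
df = pd.DataFrame({"Description": [
    "Tab tab_1 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type xxx opened by User B",
    "Tab tab_4 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type yyy closed by User A",
    "some_value",
    "Tab tab_1 of type xxx closed by User B",
    "Tab tab_2 of type yyy closed by User A",
    "some_value",
    "Tab tab_3 of type zzz closed by User C",
]})

# Assumed format of an event row; non-matching rows yield NaN in every column.
pattern = (r"^Tab (?P<tab>\S+) of type (?P<type>\S+) "
           r"(?P<action>opened|closed) by (?P<user>.+)$")
parts = df["Description"].str.extract(pattern)

# Work only on the rows that matched the event pattern.
events = parts.dropna()

# Number of distinct actions seen for each (tab, type, user) combination.
n_actions = events.groupby(["tab", "type", "user"])["action"].transform("nunique")

# Event rows whose combination has only one kind of action are unpaired.
unpaired_idx = n_actions[n_actions < 2].index

res = df.drop(index=unpaired_idx)

print(unpaired_idx.tolist())  # [3, 8, 10]
print(res.index.tolist())     # [0, 1, 2, 4, 5, 6, 7, 9]
```

Parsing the fields explicitly makes the pairing rule visible (same tab, type and user with both actions present), at the cost of depending on the exact wording of the descriptions.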