I have the following dataframe:
Index | Description |
---|---|
0 | Tab tab_1 of type yyy opened by User A |
1 | some_value |
2 | Tab tab_1 of type xxx opened by User B |
3 | Tab tab_4 of type yyy opened by User A |
4 | some_value |
5 | Tab tab_1 of type yyy closed by User A |
6 | some_value |
7 | Tab tab_1 of type xxx closed by User B |
8 | Tab tab_2 of type yyy closed by User A |
9 | some_value |
10 | Tab tab_3 of type zzz closed by User C |
I would like to remove rows whose "Description" value does not have a matching pair. By a pair I mean, for example, rows 0 and 5, or rows 2 and 7: the same tab of the same type is opened and later closed by the same user. Rows 3, 8 and 10 have no pair: the tab is either opened by a user but never closed, or closed but never opened.
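To make "pair" concrete, here is a minimal sketch that rebuilds the sample data and splits each tab row into its parts. It assumes that "Index" is just the default integer index and that every tab row follows the exact wording shown above. The column names tab, type, action and user are made up for illustration.

```python
import pandas as pd

# Sample data from the table above; the default RangeIndex plays the role of "Index".
df = pd.DataFrame({"Description": [
    "Tab tab_1 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type xxx opened by User B",
    "Tab tab_4 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type yyy closed by User A",
    "some_value",
    "Tab tab_1 of type xxx closed by User B",
    "Tab tab_2 of type yyy closed by User A",
    "some_value",
    "Tab tab_3 of type zzz closed by User C",
]})

# Named groups become column names; rows that do not match are filled with NaN.
pattern = (r"^Tab (?P<tab>\S+) of type (?P<type>\S+) "
           r"(?P<action>opened|closed) by (?P<user>.+)$")
parts = df["Description"].str.extract(pattern)

# Show only the rows that describe a tab event.
print(parts.dropna())
```

Two rows form a pair when tab, type and user are equal and one action is opened while the other is closed.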
Expected output:
Index | Description |
---|---|
0 | Tab tab_1 of type yyy opened by User A |
1 | some_value |
2 | Tab tab_1 of type xxx opened by User B |
4 | some_value |
5 | Tab tab_1 of type yyy closed by User A |
6 | some_value |
7 | Tab tab_1 of type xxx closed by User B |
9 | some_value |
Is there a way to do this?
CodePudding user response:
You can try the duplicated function: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html
For instance (this returns a boolean Series that marks repeated rows, not a filtered dataframe):
is_duplicate = df.duplicated(subset=['Description'])
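A quick check on the sample data may help show what this call returns. The sketch below assumes the default integer index. Since duplicated compares whole strings, an "opened" row and its "closed" counterpart are not treated as duplicates of each other.

```python
import pandas as pd

df = pd.DataFrame({"Description": [
    "Tab tab_1 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type xxx opened by User B",
    "Tab tab_4 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type yyy closed by User A",
    "some_value",
    "Tab tab_1 of type xxx closed by User B",
    "Tab tab_2 of type yyy closed by User A",
    "some_value",
    "Tab tab_3 of type zzz closed by User C",
]})

# True for every row whose Description already appeared earlier (keep='first' by default).
dup = df.duplicated(subset=["Description"])

# Index labels of the rows flagged as duplicates.
flagged = dup[dup].index.tolist()
print(flagged)
```

On this data only the repeated some_value rows are flagged, so the mask on its own does not identify the unpaired tab rows.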
CodePudding user response:
df.drop_duplicates('Description')
CodePudding user response:
Honestly, I'm not sure this is what you need (it also drops the some_value rows), but you can try this:
key = df['Description'].str.replace('opened|closed', '', regex=True)
mask = df.groupby(key)['Description'].transform(
    lambda x: x.str.contains('opened').any() & x.str.contains('closed').any())
res = df.loc[mask]
>>> res
Index Description
0 Tab tab_1 of type yyy opened by User A
2 Tab tab_1 of type xxx opened by User B
5 Tab tab_1 of type yyy closed by User A
7 Tab tab_1 of type xxx closed by User B
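If the some_value rows should survive, as in the expected output, one possible tweak is to also keep every row that mentions neither word. This is only a sketch, assuming the default integer index and that "opened" or "closed" appear only in tab rows. The helper names is_open, is_closed, has_open and has_closed are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Description": [
    "Tab tab_1 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type xxx opened by User B",
    "Tab tab_4 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type yyy closed by User A",
    "some_value",
    "Tab tab_1 of type xxx closed by User B",
    "Tab tab_2 of type yyy closed by User A",
    "some_value",
    "Tab tab_3 of type zzz closed by User C",
]})

desc = df["Description"]
# Grouping key: the description with the action word removed.
key = desc.str.replace("opened|closed", "", regex=True)

is_open = desc.str.contains("opened")
is_closed = desc.str.contains("closed")

# Broadcast "does this key have any opened / any closed row" back to each row.
has_open = is_open.groupby(key).transform("any")
has_closed = is_closed.groupby(key).transform("any")

# Keep paired tab rows, plus rows that are not tab events at all.
mask = (has_open & has_closed) | ~(is_open | is_closed)
res = df.loc[mask]
print(res)
```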
CodePudding user response:
Replace the words "opened" and "closed" with an empty string, group by the result, use the groupby filter method to select the groups that occur only once, and then drop those rows:
data.drop(data.groupby(data['Description'].str.replace('opened|closed','',regex=True)).filter(lambda x: x['Description'].count() == 1).index)
Index Description
0 Tab tab_1 of type yyy opened by User A
1 some_value
2 Tab tab_1 of type xxx opened by User B
4 some_value
5 Tab tab_1 of type yyy closed by User A
6 some_value
7 Tab tab_1 of type xxx closed by User B
9 some_value
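One caveat with counting: it works on the sample because some_value is repeated and no tab is opened twice. The sketch below uses a smaller, hypothetical dataframe (the tab_5 rows are invented) to compare the count-based drop with a check on how many distinct actions each key has.

```python
import pandas as pd

# Hypothetical data: a lone some_value row and a tab opened twice but never closed.
data = pd.DataFrame({"Description": [
    "Tab tab_1 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type yyy closed by User A",
    "Tab tab_5 of type yyy opened by User D",
    "Tab tab_5 of type yyy opened by User D",
]})

desc = data["Description"]
key = desc.str.replace(r"\b(?:opened|closed)\b", "", regex=True)

# Count-based approach: drop keys that occur exactly once.
once = data.groupby(key).filter(lambda g: g["Description"].count() == 1)
counted = data.drop(once.index)

# Alternative: extract the action (NaN for non-tab rows) and require both actions per key.
action = desc.str.extract(r"\b(opened|closed)\b", expand=False)
n_actions = action.groupby(key).transform("nunique")
res = data[action.isna() | n_actions.eq(2)]

print(counted.index.tolist())
print(res.index.tolist())
```

Here the count-based version drops the single some_value row and keeps both tab_5 rows, while the action-based check keeps some_value and drops the unclosed tab.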