Home > Mobile >  Find a pattern in middle of multiple sentences
Find a pattern in middle of multiple sentences

Time:05-07

I have a dataframe as below,

data = [
    [ 1, 'AR-123456' ],
    [ 1, '123456' ],
    [ 2, '345678' ],
    [ 3,'Application-12345678901'],
    [ 3, '12345678901']
]
df = pd.DataFrame(data, columns=['Case', 'ID'] )
Case ID
1 AR-123456
1 123456
2 345678
3 Application-12345678901
3 12345678901

So basically I am trying to remove rows where for the same Case, the IDs are digits from AR- or Application- i.e., the final expected output :-

Case ID
1 AR-123456
2 345678
3 Application-12345678901

CodePudding user response:

Extract the digits and drop_duplicates:

df["digits"] = df["ID"].str.extract("(\d )")
output = df.drop_duplicates(["Case","digits"]).drop("digits",axis=1)

>>> output
   Case                       ID
0     1                AR-123456
2     2                   345678
3     3  Application-12345678901

CodePudding user response:

You can groupby Case column and then drop the duplicated ID rows that is in AR/Application item

out = (df.groupby('Case')
       .apply(lambda g: g[~g['ID'].isin(g['ID'].str.extract('(AR|Application)-(\d )')[1])])
       .reset_index(drop=True))
print(out)

   Case                       ID
0     1                AR-123456
1     2                   345678
2     3  Application-12345678901
  • Related