Remove text before specific text within brackets using REGEX in Pandas

Time:09-12

I am trying to use a regex in Pandas to remove the text before specific bracketed tags in a comma-separated column.

From this -

colA
My Company Ltd [CS], address, nbc [LV], state [NP], pc [SS], country
Business Plc [CS], address, abc [LV], state [NP], code [SS], country
Work Harder Inc [CS], address, xyz[CS], state [NP], code [SS], country
Company Business People [CS], address, typode [SS], country, nlp [CS]

Every comma-separated item tagged [CS] or [LV] has to be removed, including the bracketed tag itself.

Expected result -

colA
address, state [NP], pc [SS], country
address, state [NP], code [SS], country
address, state [NP], code [SS], country
address, typode [SS], country

CodePudding user response:

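You can use the regex [^,]*\[(CS|LV)\],? to match and remove the tagged items:

# regex=True is required in pandas 2.0 and later, where str.replace treats the pattern as a literal by default
df.colA.str.replace(r'[^,]*\[(CS|LV)\],?', '', regex=True).str.strip(', ')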

0      address, state [NP], pc [SS], country
1    address, state [NP], code [SS], country
2    address, state [NP], code [SS], country
3              address, typode [SS], country
Name: colA, dtype: object

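where [^,]* matches any run of non-comma characters (the text of one item), \[(CS|LV)\] matches [CS] or [LV], and ,? matches an optional trailing comma. The final str.strip(', ') trims any commas and spaces left over at either end.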
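To check what the pattern does to each row without needing pandas, here is a minimal sketch using Python's standard re module on the sample rows from the question. The helper name strip_tagged and the non-capturing group (?:...) are choices made for this illustration, not part of the answer above.

```python
import re

# Same idea as the pattern above: a run of non-comma characters ending in
# [CS] or [LV], followed by an optional comma.
TAGGED = re.compile(r"[^,]*\[(?:CS|LV)\],?")

rows = [
    "My Company Ltd [CS], address, nbc [LV], state [NP], pc [SS], country",
    "Business Plc [CS], address, abc [LV], state [NP], code [SS], country",
    "Work Harder Inc [CS], address, xyz[CS], state [NP], code [SS], country",
    "Company Business People [CS], address, typode [SS], country, nlp [CS]",
]

def strip_tagged(text):
    # Hypothetical helper: drop every tagged item, then trim the commas
    # and spaces that remain at the start or end of the string.
    return TAGGED.sub("", text).strip(", ")

cleaned = [strip_tagged(r) for r in rows]
for line in cleaned:
    print(line)
```

Items tagged [NP] or [SS] survive because the alternation only accepts CS or LV inside the brackets.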

CodePudding user response:

Try:

df.colA = df.colA.apply(
    lambda x: ", ".join(
        w for w in x.split(", ") if "[CS]" not in w and "[LV]" not in w
    )
)
print(df)

Prints:

                                      colA
0    address, state [NP], pc [SS], country
1  address, state [NP], code [SS], country
2  address, state [NP], code [SS], country
3            address, typode [SS], country
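The split-based approach relies on every separator being exactly a comma followed by one space. If the spacing might be irregular, a variation like the following sketch could help; the function name drop_tagged_fields, its tags parameter, and the irregularly spaced sample string are assumptions made for illustration.

```python
def drop_tagged_fields(text, tags=("[CS]", "[LV]")):
    # Split on bare commas and trim each field, so "a,b" and "a,  b"
    # are handled the same way as "a, b".
    fields = [f.strip() for f in text.split(",")]
    # Keep non-empty fields that contain none of the unwanted tags.
    kept = [f for f in fields if f and not any(t in f for t in tags)]
    return ", ".join(kept)

# Made-up sample with uneven spacing after the commas.
sample = "Work Harder Inc [CS],address,  xyz[CS], state [NP], code [SS], country"
result = drop_tagged_fields(sample)
print(result)
```

With a DataFrame, the same function could be passed to df.colA.apply(drop_tagged_fields).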