I am trying to use a regex in pandas to remove the text before specific brackets in a column of comma-separated values.
From this -
colA
My Company Ltd [CS], address, nbc [LV], state [NP], pc [SS], country
Business Plc [CS], address, abc [LV], state [NP], code [SS], country
Work Harder Inc [CS], address, xyz[CS], state [NP], code [SS], country
Company Business People [CS], address, typode [SS], country, nlp [CS]
Any text before [CS] or [LV], along with the bracketed tag itself, has to be removed.
Expected result -
colA
address, state [NP], pc [SS], country
address, state [NP], code [SS], country
address, state [NP], code [SS], country
address, typode [SS], country
CodePudding user response:
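You can also use the regex [^,]*\[(CS|LV)\],? to match and remove the patterns (pass regex=True, since str.replace treats the pattern as a literal string by default in pandas 2.0 and later):
df.colA.str.replace(r'[^,]*\[(CS|LV)\],?', '', regex=True).str.strip(', ')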
0 address, state [NP], pc [SS], country
1 address, state [NP], code [SS], country
2 address, state [NP], code [SS], country
3 address, typode [SS], country
Name: colA, dtype: object
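where [^,]* matches any run of non-comma characters (the text before the bracket), \[(CS|LV)\] matches [CS] or [LV], and ,? matches an optional following comma. The final str.strip(', ') removes any leading or trailing commas and spaces left behind.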
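To see what the pattern does on a single value without pandas, here is a minimal sketch using only the standard re module. The helper name drop_tagged_fields is made up for illustration, and the group is written as non-capturing (?:...), which matches the same text as the capturing form above.

```python
import re

# Same pattern as in the answer above, with a non-capturing group.
PATTERN = re.compile(r"[^,]*\[(?:CS|LV)\],?")


def drop_tagged_fields(text: str) -> str:
    """Remove the text up to and including each [CS] or [LV] tag,
    plus one following comma, then trim leftover commas and spaces.

    Hypothetical helper name, used only for this illustration.
    """
    return PATTERN.sub("", text).strip(", ")


rows = [
    "My Company Ltd [CS], address, nbc [LV], state [NP], pc [SS], country",
    "Company Business People [CS], address, typode [SS], country, nlp [CS]",
]
cleaned = [drop_tagged_fields(row) for row in rows]
for line in cleaned:
    print(line)
```

The second row shows why the strip(', ') step matters: removing the trailing "nlp [CS]" leaves a dangling comma at the end of the string.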
CodePudding user response:
Try:
df.colA = df.colA.apply(
    lambda x: ", ".join(
        w for w in x.split(", ") if "[CS]" not in w and "[LV]" not in w
    )
)
print(df)
Prints:
colA
0 address, state [NP], pc [SS], country
1 address, state [NP], code [SS], country
2 address, state [NP], code [SS], country
3 address, typode [SS], country
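Note that splitting on ", " assumes every comma is followed by exactly one space. If the spacing in the real data is less regular, a variant along these lines might be safer. This is only a sketch: keep_untagged_fields and its tags parameter are hypothetical names, and the sample string with irregular spacing is made up to show the difference.

```python
def keep_untagged_fields(text, tags=("[CS]", "[LV]")):
    """Split on commas, trim each field, and drop fields containing a tag.

    Hypothetical helper; `tags` lists the bracketed markers to filter out.
    """
    fields = [field.strip() for field in text.split(",")]
    return ", ".join(
        field
        for field in fields
        if field and not any(tag in field for tag in tags)
    )


# Made-up input with inconsistent spacing after the commas.
sample = "Work Harder Inc [CS],address,  xyz[CS], state [NP], code [SS], country"
result = keep_untagged_fields(sample)
print(result)
```

It can be applied to the column the same way as the lambda above, e.g. df.colA.apply(keep_untagged_fields).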