I have a CSV dictionary with two text columns. Both contain some text in brackets, and I want to delete all the bracketed text, including the brackets themselves. How can I do this with pandas (and maybe regex)?
Currently:
   Deutsch               Englisch
0  spindeldürr           spindly
1  Garn {n} [auch fig.]  yarn
2  Schnur {f}            twine
3  Naht {f}              suture
4  zunähen               to suture
5  Faden {m}             strand [thread]
Goal:
   Deutsch      Englisch
0  spindeldürr  spindly
1  Garn         yarn
2  Schnur       twine
3  Naht         suture
4  zunähen      to suture
5  Faden        strand
CodePudding user response:
You may try doing a replacement on the pattern \s*[\[{].*?[\]}]\s*:
df["Deutsch"] = df["Deutsch"].str.replace(r'\s*[\[{].*?[\]}]\s*', '', regex=True)
Use the same replacement for the other column as well.
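To see what this pattern does outside of pandas, here is a minimal sketch using only the standard `re` module. The helper name `strip_brackets` and the last sample string are assumptions made up for illustration; they are not part of the question. The sketch also shows one caveat: because the pattern swallows the whitespace on both sides, replacing with an empty string glues words together when the brackets sit in the middle of the text, so the helper substitutes a single space and tidies up afterwards.

```python
import re

# Pattern from the answer above: optional whitespace, an opening [ or {,
# the shortest run of characters up to a closing ] or }, optional whitespace.
PATTERN = re.compile(r'\s*[\[{].*?[\]}]\s*')

def strip_brackets(text):
    """Hypothetical helper: remove bracketed annotations from one string."""
    # Replace each annotation with one space so neighbouring words stay
    # apart, then collapse runs of whitespace and trim both ends.
    return re.sub(r'\s+', ' ', PATTERN.sub(' ', text)).strip()

samples = [
    "Garn {n} [auch fig.]",
    "strand [thread]",
    "spindeldürr",
    "to [sth.] suture",  # made-up entry with brackets in the middle
]
cleaned = [strip_brackets(s) for s in samples]
print(cleaned)

# Replacing with '' directly merges the surrounding words in the last case.
naive = PATTERN.sub('', "to [sth.] suture")
print(naive)
```

For the data in the question the brackets only appear at the end of an entry, so the plain empty-string replacement gives the same result there.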
CodePudding user response:
You could use:
repl = lambda c: c.str.replace(r'\s*(\{[^{}]+\}|\[[^\[\]]+\])\s*', '', regex=True)
df.apply(repl)
output:
   Deutsch      Englisch
0  spindeldürr  spindly
1  Garn         yarn
2  Schnur       twine
3  Naht         suture
4  zunähen      to suture
5  Faden        strand
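For anyone who wants to reproduce this end to end, here is a self-contained sketch. It assumes the data is rebuilt by hand from the question's table (the real file would come from something like `pd.read_csv`), and it uses a non-capturing group in the pattern, which is a small variation on the answer rather than the original code. Note that `df.apply` returns a new DataFrame, so the result has to be assigned back.

```python
import pandas as pd

# Sample data copied from the question; in practice this would be read from the CSV.
df = pd.DataFrame({
    "Deutsch": ["spindeldürr", "Garn {n} [auch fig.]", "Schnur {f}",
                "Naht {f}", "zunähen", "Faden {m}"],
    "Englisch": ["spindly", "yarn", "twine",
                 "suture", "to suture", "strand [thread]"],
})

# Match {...} or [...] (no nesting), plus any whitespace around it.
pattern = r'\s*(?:\{[^{}]+\}|\[[^\[\]]+\])\s*'

# Apply the replacement to every column; apply() does not modify df in place.
df = df.apply(lambda col: col.str.replace(pattern, '', regex=True))
print(df)
```

Passing `regex=True` explicitly matters: since pandas 2.0 the default for `Series.str.replace` is a literal string match.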