Conditional changes in pandas to delete text in brackets

I have a CSV dictionary with two text columns. Both contain some text in square or curly brackets, and I want to delete all of that text, including the brackets themselves. How can I do this with pandas (and maybe regex)?

Currently:

                                  Deutsch                              Englisch
0                            spindeldürr                                spindly
1                   Garn {n} [auch fig.]                                   yarn
2                             Schnur {f}                                  twine
3                               Naht {f}                                 suture
4                                zunähen                              to suture
5                              Faden {m}                        strand [thread]

Goal:

                                  Deutsch                              Englisch
0                            spindeldürr                                spindly
1                                   Garn                                   yarn
2                                 Schnur                                  twine
3                                   Naht                                 suture
4                                zunähen                              to suture
5                                  Faden                                 strand

CodePudding user response：

You may try doing a replacement on the pattern \s*[\[{].*?[\]}]\s*:

df["Deutsch"] = df["Deutsch"].str.replace(r'\s*[\[{].*?[\]}]\s*', '', regex=True)

Use the same replacement for the Englisch column as well. Note that regex=True has to be passed explicitly: since pandas 2.0, str.replace treats the pattern as a literal string by default.
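For reference, here is a minimal, self-contained sketch of that per-column replacement. The DataFrame is rebuilt by hand from the sample in the question (in practice it would presumably come from pd.read_csv), and the loop over both column names is just one way to apply the same pattern twice.

```python
import pandas as pd

# Sample data copied from the question; a real frame would come from the CSV.
df = pd.DataFrame({
    "Deutsch": ["spindeldürr", "Garn {n} [auch fig.]", "Schnur {f}",
                "Naht {f}", "zunähen", "Faden {m}"],
    "Englisch": ["spindly", "yarn", "twine", "suture", "to suture",
                 "strand [thread]"],
})

# Optional whitespace, an opening [ or {, the shortest run of characters
# up to a closing ] or }, then optional whitespace.
pattern = r"\s*[\[{].*?[\]}]\s*"

for col in ["Deutsch", "Englisch"]:
    # regex=True is needed on pandas 2.0+, where str.replace treats the
    # pattern as a literal string unless told otherwise.
    df[col] = df[col].str.replace(pattern, "", regex=True)

print(df)
```

Because the character classes are independent, this pattern would also accept a mismatched pair such as "[...}", which is harmless for the data shown but worth knowing about.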

CodePudding user response：

You could use:

repl = lambda c: c.str.replace(r'\s*(\{[^{}]+\}|\[[^\[\]]+\])\s*', '', regex=True)

df.apply(repl)

output:

       Deutsch   Englisch
0  spindeldürr    spindly
1         Garn       yarn
2       Schnur      twine
3         Naht     suture
4      zunähen  to suture
5        Faden     strand
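
To reproduce that output end to end, something like the following sketch should work. The sample frame is typed in from the question, and strip_brackets is a hypothetical helper name standing in for the lambda. Keep in mind that df.apply returns a new DataFrame, so the result has to be assigned to something.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Deutsch": ["spindeldürr", "Garn {n} [auch fig.]", "Schnur {f}",
                "Naht {f}", "zunähen", "Faden {m}"],
    "Englisch": ["spindly", "yarn", "twine", "suture", "to suture",
                 "strand [thread]"],
})

# A non-empty, non-nested {...} or [...] group, plus surrounding whitespace.
pattern = r"\s*(\{[^{}]+\}|\[[^\[\]]+\])\s*"

def strip_brackets(col: pd.Series) -> pd.Series:
    # Hypothetical helper: remove every bracketed group from a text column.
    return col.str.replace(pattern, "", regex=True)

# apply() calls the helper once per column and builds a new frame;
# df itself is left unchanged.
cleaned = df.apply(strip_brackets)

print(cleaned)
```

If bracketed text can also occur in the middle of a string, replacing it with an empty string joins the neighbouring words; replacing with a single space and then calling .str.strip() is one way around that.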
