I have a DataFrame like this:

| data |
|---|
| '(p) apple (/p) (p) boy (/p) (p) cat (/p)' |
| '(p) apple (p) (p) boy (/p)' |
and I want to turn it into this:

| data |
|---|
| '(p) 1.apple (/p) (p) 2.boy (/p) (p) 3.cat (/p)' |
| '(p) 1.apple (p) (p) 2.boy (/p)' |
I want to number every '(p)' element within each row. A row can contain any number of
tags, so the numbering has to adapt to however many there are. How can I do this?
Answer:
I would match each (p)...(/p) element with a regular expression and then rebuild it
with its position in the row as a prefix:
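import re
import pandas as pd
# Test DataFrame
df = pd.DataFrame({"data": ["(p)apple(/p)(p)boy(/p)(p)cat(/p)", "(p)apple(/p)(p)boy(/p)"]})
# Raw string avoids invalid-escape warnings; \s* drops spaces just inside each tag pair
pattern = re.compile(r"\(p\)\s*(.*?)\s*\(/p\)")
# Number the matches from 1 and join the rebuilt elements back into one string per row
df["data"] = df["data"].apply(
    lambda x: " ".join(f"(p) {i}.{s} (/p)" for i, s in enumerate(pattern.findall(x), start=1))
)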
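A variant that avoids splitting and rejoining is to call `re.sub` with a replacement function that keeps a counter, so each element is rewritten in place and any text outside the tags is left untouched. This is a sketch of my own, not something taken from the question; `number_tags` and `PATTERN` are hypothetical names, and it assumes every (p) has a matching (/p).

```python
import re
import pandas as pd

# Matches one "(p) ... (/p)" element, capturing the inner text without the surrounding spaces
PATTERN = re.compile(r"\(p\)\s*(.*?)\s*\(/p\)")

def number_tags(text: str) -> str:
    """Hypothetical helper: prefix each (p)...(/p) element with its 1-based position in the row."""
    counter = 0  # fresh counter per call, so numbering restarts for every row

    def repl(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        return f"(p) {counter}.{match.group(1)} (/p)"

    return PATTERN.sub(repl, text)

df = pd.DataFrame({"data": ["(p) apple (/p) (p) boy (/p) (p) cat (/p)",
                            "(p) apple (/p) (p) boy (/p)"]})
df["data"] = df["data"].apply(number_tags)
print(df["data"].tolist())
# ['(p) 1.apple (/p) (p) 2.boy (/p) (p) 3.cat (/p)', '(p) 1.apple (/p) (p) 2.boy (/p)']
```

Running it through `Series.apply` (rather than passing a stateful callable to `Series.str.replace`) is what keeps the counter from carrying over between rows.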