I have a dataframe:
df = pd.DataFrame(np.array([['bob, sam, manny'], ['bob (a description, of some sort), marry, rob']]), columns=['target'])
target
0 bob, sam, manny
1 bob (a description, of some sort), marry, rob
I want to split the column target into multiple columns, using the comma as the separator.
I want it to look like this:
| | target | a | b | c |
|---|---|---|---|---|
| 0 | bob, sam, manny | bob | sam | manny |
| 1 | bob (a description, of some sort), marry, rob | bob (a description, of some sort) | marry | rob |
So far, I was able to do this: df[["a", "b", "c", "d"]] = df["target"].str.split(pat=",", expand=True)
| | target | a | b | c | d |
|---|---|---|---|---|---|
| 0 | bob, sam, manny | bob | sam | manny | None |
| 1 | bob (a description, of some sort), marry, rob | bob (a description | of some sort) | marry | rob |
But this treats the comma inside the parentheses as a separator. How do I ignore commas inside parentheses?
CodePudding user response:
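You can split on a regular expression that matches commas except those between parentheses. The negative lookahead (?![^()]*\)) rejects a comma if a closing parenthesis follows it with no other parenthesis in between: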
df = pd.DataFrame(np.array([['bob, sam, manny'], ['bob (a description, of some sort), marry, rob']]), columns=['target'])
df[["a", "b", "c"]] = df["target"].str.split(r',\s*(?![^()]*\))', expand=True)
Output:
| | target | a | b | c |
|---|---|---|---|---|
| 0 | bob, sam, manny | bob | sam | manny |
| 1 | bob (a description, of some sort), marry, rob | bob (a description, of some sort) | marry | rob |
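To see where the lookahead approach holds up and where it might not, here is a small sketch using only the standard re module. The pattern is equivalent to the one in the answer. The helper split_top_level and the sample string tricky are hypothetical, introduced only for illustration: the helper counts parenthesis depth instead of relying on a lookahead, which may be useful if a parenthesised description can itself contain another parenthesised group after a comma.

```python
import re

# Equivalent to the answer's pattern: a comma, optional whitespace, and a
# negative lookahead that fails when a ")" follows with no "(" or ")" before it.
PATTERN = re.compile(r",\s*(?![^()]*\))")


def split_top_level(text):
    """Split on commas at parenthesis depth 0 (hypothetical helper)."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # tolerate a stray ")"
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


flat = "bob (a description, of some sort), marry, rob"
tricky = "bob (note, see (ref) too), marry, rob"  # made-up input

regex_flat = PATTERN.split(flat)
regex_tricky = PATTERN.split(tricky)
depth_tricky = split_top_level(tricky)

print(regex_flat)    # ['bob (a description, of some sort)', 'marry', 'rob']
print(regex_tricky)  # ['bob (note', 'see (ref) too)', 'marry', 'rob']
print(depth_tricky)  # ['bob (note, see (ref) too)', 'marry', 'rob']
```

For the data in the question the regex is enough. If inputs like tricky can occur, one option is to apply the helper with df["target"].map(split_top_level) and build the new columns from the resulting lists, assuming every row yields the same number of parts.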