How to skip some symbol characters, when this character is used as a split column symbol in pandas s

Time:12-17

I have a DataFrame like the one below (original data):

index   string
0        a,b,c,d,e,f
1        a,b,c,d,e,f
2        a,(I,j,k),c,d,e,f

I want it to look like this (target data):

index   col1    col2    col3    col4    col5    col6
0        a       b       c       d       e        f
1        a       b       c       d       e        f
2        a     (I,j,k)   c       d       e        f

CodePudding user response:

You can split on commas that are not inside parentheses. Then convert each resulting list to a row of a DataFrame and assign the result to new columns of df (this assumes pandas is imported as pd):

df[['col {}'.format(i) for i in range(1, 7)]] = df['string'].str.split(r",\s*(?![^()]*\))").apply(pd.Series)

Output:

   index             string col 1    col 2 col 3 col 4 col 5 col 6
0      0        a,b,c,d,e,f     a        b     c     d     e     f
1      1        a,b,c,d,e,f     a        b     c     d     e     f
2      2  a,(I,j,k),c,d,e,f     a  (I,j,k)     c     d     e     f
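
To see why the pattern works, here is a minimal sketch that uses only the standard re module. The helper name split_outside_parens is made up for illustration and is not part of pandas. The negative lookahead (?![^()]*\)) rejects a comma whenever a ")" can be reached from it without crossing another parenthesis, which is exactly the situation inside a group. The sketch assumes the parentheses are balanced and not nested.

```python
import re

# A comma, optional whitespace, then a negative lookahead: the comma is NOT
# treated as a separator if the text after it reaches a ")" before meeting
# any other parenthesis (i.e. the comma sits inside a group).
PATTERN = re.compile(r",\s*(?![^()]*\))")

def split_outside_parens(text):
    """Split on commas that are not inside a non-nested parenthesised group."""
    return PATTERN.split(text)

print(split_outside_parens("a,b,c,d,e,f"))
print(split_outside_parens("a,(I,j,k),c,d,e,f"))
```

Because the lookahead only scans forward to the nearest parenthesis, nested groups such as (b,(c,d),e) are not handled: a comma directly followed by an inner "(" is still seen as a separator. For data like that, a small hand-written parser that tracks nesting depth would likely be safer.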

CodePudding user response:

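Try this. Passing expand=True makes str.split return a DataFrame directly; note that the first line replaces df with the split columns, so the original 'string' column is dropped:

df = df['string'].str.split(r",\s*(?![^()]*\))", expand=True)
df.columns = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6']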
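
If the original column should be kept, a self-contained variant along these lines may help. It is only a sketch: the sample frame is rebuilt from the question's data, and the names parts and result are illustrative. The column names are generated from the number of pieces actually produced rather than hard-coded to six.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame(
    {"string": ["a,b,c,d,e,f", "a,b,c,d,e,f", "a,(I,j,k),c,d,e,f"]}
)

# Split on commas outside parentheses; expand=True returns one column per piece.
parts = df["string"].str.split(r",\s*(?![^()]*\))", expand=True)

# Name the columns col1..colN based on how many pieces were produced.
parts.columns = [f"col{i}" for i in range(1, parts.shape[1] + 1)]

# Keep the original column and add the new ones alongside it.
result = df.join(parts)
print(result)
```

Rows that split into fewer pieces than the widest row are padded with missing values by expand=True, so it is worth checking the result for gaps when the input is irregular.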