Split column in a Pandas Dataframe into n number of columns-CodePudding

In a column in a Pandas Dataframe, I have strings like this:

column_name_1	column_name_2
a^b^c	j
e^f^g	k^l
h^i	m

I need to split these strings into columns in the same data frame, like this

column_name_1	column_name_2	column_name_1_1	column_name_1_2	column_name_1_3	column_name_2_1	column_name_2_2
a^b^c	j	a	b	c	j
e^f^g	k^l	e	f	g	k	l
h^i	m	h	i		m

I cannot figure out how to do this without knowing in advance how many occurrences of the delimiter there is in the data.

My best effort either includes something like

df[["column_name_1_1","column_name_1_2 ","column_name_1_3"]] = df["column_name_1"].str.split('^',n=2, expand=True)

But it failes with a

ValueError: The columns in the computed data do not match the columns in the provided metadata

CodePudding user response：

Let's try it with stack str.split unstack join.

The idea is to split each column by ^ and expand the split characters into a separate column. stack helps us do a single str.split on a Series object and unstack creates a DataFrame with the same index as the original.

tmp = df.stack().str.split('^', expand=True).unstack(level=1).sort_index(level=1, axis=1)
tmp.columns = [f'{y}_{x 1}' for x, y in tmp.columns]
out = df.join(tmp).dropna(how='all', axis=1).fillna('')

Output:

  column_name_1 column_name_2 column_name_1_1 column_name_1_2 column_name_1_3 column_name_1_4 column_name_2_1 column_name_2_2  
0       a^b^c^d             j               a               b               c               d               j                  
1         e^f^g           k^l               e               f               g                               k               l  
2           h^i             m               h               i                                               m

CodePudding user response：

One-liner:

new_df = pd.concat([df]   [pd.DataFrame([pd.Series(s) for s in df[col].str.split('^')]).add_prefix(c.name   '_') for col in df], axis=1).fillna('')

Output:

>>> new_df
  column_name_1 column_name_2 column_name_1_0 column_name_1_1 column_name_1_2 column_name_1_3 column_name_1_0 column_name_1_1
0       a^b^c^d             j               a               b               c               d               j
1         e^f^g           k^l               e               f               g                               k               l
2           h^i             m               h               i                                               m