Home > database >  Split column in dataframe into n number of columns
Split column in dataframe into n number of columns

Time:03-16

In a column in a Pandas Dataframe, I have strings like this:

column_name_1 column_name_2
a^b^c^d j
e^f^g k^l
h^i m

I need to split these strings into columns in the same data frame, like this

column_name_1 column_name_2 column_name_1_1 column_name_1_2 column_name_1_3 column_name_2_1 column_name_2_2
a^b^c j a b c j
e^f^g k^l e f g k l
h^i m h i m

I cannot figure out how to do this without knowing in advance how many occurrences of the delimiter there is in the data.

My best effort either includes something like

df[["column_name_1_1","column_name_1_2 ","column_name_1_3"]] = df["column_name_1"].str.split('^',n=2, expand=True)

But it failes with a

ValueError: The columns in the computed data do not match the columns in the provided metadata

CodePudding user response:

Let's try it with stack str.split unstack join.

The idea is to split each column by ^ and expand the split characters into a separate column. stack helps us do a single str.split on a Series object and unstack creates a DataFrame with the same index as the original.

tmp = df.stack().str.split('^', expand=True).unstack(level=1).sort_index(level=1, axis=1)
tmp.columns = [f'{y}_{x 1}' for x, y in tmp.columns]
out = df.join(tmp).dropna(how='all', axis=1).fillna('')

Output:

  column_name_1 column_name_2 column_name_1_1 column_name_1_2 column_name_1_3 column_name_1_4 column_name_2_1 column_name_2_2  
0       a^b^c^d             j               a               b               c               d               j                  
1         e^f^g           k^l               e               f               g                               k               l  
2           h^i             m               h               i                                               m                  

CodePudding user response:

One-liner:

new_df = pd.concat([df]   [pd.DataFrame([pd.Series(s) for s in df[col].str.split('^')]).add_prefix(c.name   '_') for col in df], axis=1).fillna('')

Output:

>>> new_df
  column_name_1 column_name_2 column_name_1_0 column_name_1_1 column_name_1_2 column_name_1_3 column_name_1_0 column_name_1_1
0       a^b^c^d             j               a               b               c               d               j
1         e^f^g           k^l               e               f               g                               k               l
2           h^i             m               h               i                                               m
  • Related