How to split the column of a dataframe

I would like to split the column of a dataframe as follows. Here is the main dataframe.

import pandas as pd

df_az = pd.DataFrame(list(zip(storage_AZ)),columns =['AZ Combination'])
df_az

Then, I applied this code to split the column.

out_az = (df_az.stack().apply(pd.Series).rename(columns=lambda x: f'a combination').unstack().swaplevel(0,1,axis=1).sort_index(axis=1))
out_az = pd.concat([out_az], axis=1)
out_az.head()

However, the result is as follows.

Meanwhile, the expected result is:

Could anyone help me with what to change in the code, please? Thank you in advance.

CodePudding user response：

Here is an example of how you could use the str.split() method to split the "AZ Combination" column of your DataFrame:

df_az["AZ Combination"].str.split("-", expand=True)

This splits the "AZ Combination" column on the "-" character and creates one new column per part. The expand=True argument tells the method to return a DataFrame with the parts in separate columns, instead of a Series of lists.

You can also assign the new columns to a new DataFrame:

df_split = df_az["AZ Combination"].str.split("-", expand=True)

You can also assign new column names to the new columns like this:

df_split.columns = ['Part1', 'Part2']

You could also merge the new dataframe with the original one like this:

df_az = pd.concat([df_az,df_split], axis=1)

Note that this splits the column on "-". If the separator is different, adjust the code accordingly. Also, if the number of parts varies between rows, the number of resulting columns changes, so the list of column names must be adjusted to match.
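
The steps in this answer can be put together as a minimal sketch. It assumes the column holds strings such as "a1-z1"; the sample values and the names df_demo and df_out are made up for illustration and do not come from the question. str.split operates on string values, so a different approach is likely needed if each cell holds a nested list instead.

```python
import pandas as pd

# Hypothetical string data; the real column may hold something else.
df_demo = pd.DataFrame({"AZ Combination": ["a1-z1", "a2-z2", "a3-z3"]})

# expand=True returns a DataFrame with one column per split part.
df_split = df_demo["AZ Combination"].str.split("-", expand=True)
df_split.columns = ["Part1", "Part2"]

# Put the new columns next to the original one.
df_out = pd.concat([df_demo, df_split], axis=1)
print(df_out)
```

Naming the columns with exactly two labels only works here because every sample string has exactly two parts.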

CodePudding user response：

You can apply np.ravel to flatten each cell (this needs import numpy as np):

>>> pd.DataFrame.from_records(df_az['AZ Combination'].apply(np.ravel))

   0  1  2  3  4  5
0  0  0  0  0  0  0
1  0  0  0  0  0  1
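
A self-contained variant of the same idea, as a sketch: the sample data below is assumed to be the first two rows of storage_AZ as shown elsewhere in the thread, and a list comprehension plus the plain DataFrame constructor is used in place of apply and from_records. The names rows and flat are illustrative.

```python
import numpy as np
import pandas as pd

# Assumed sample data: each cell is a pair of 3-element lists.
storage_AZ = [[[0, 0, 0], [0, 0, 0]],
              [[0, 0, 0], [0, 0, 1]]]
df_az = pd.DataFrame(list(zip(storage_AZ)), columns=["AZ Combination"])

# np.ravel flattens each nested pair into one 1-D array of length 6.
rows = [np.ravel(cell) for cell in df_az["AZ Combination"]]

# A list of equal-length 1-D arrays becomes one row per array.
flat = pd.DataFrame(rows, index=df_az.index)
print(flat)
```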

CodePudding user response：

Convert the column to a list and reshape it into a 2D array, so that it can be passed to the DataFrame constructor.

Then set the column names. To avoid duplicated column names, a counter is appended to each one:

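import numpy as np
import pandas as pd

storage_AZ = [[[0,0,0],[0,0,0]],
              [[0,0,0],[0,0,1]],
              [[0,0,0],[0,1,0]],
              [[0,0,0],[1,0,0]],
              [[0,0,0],[1,0,1]]]
df_az = pd.DataFrame(list(zip(storage_AZ)),columns =['AZ Combination'])

# N values per group; L holds one name per group
N = 3
L = ['a combination','z combination']
# flatten each nested cell into one row of len(L) * N values
df = pd.DataFrame(np.array(df_az['AZ Combination'].tolist()).reshape(df_az.shape[0],-1))
# column position -> group name (// N) and counter within the group (% N)
df.columns = [f'{L[a]}_{b}' for a, b in zip(df.columns // N, df.columns % N)]
print(df)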
   a combination_0  a combination_1  a combination_2  z combination_0  \
0                0                0                0                0   
1                0                0                0                0   
2                0                0                0                0   
3                0                0                0                1   
4                0                0                0                1   

   z combination_1  z combination_2  
0                0                0  
1                0                1  
2                1                0  
3                0                0  
4                0                1

If you need a MultiIndex:

df = pd.concat({'AZ Combination':df}, axis=1)
print(df)
   AZ Combination                                                  \
  a combination_0 a combination_1 a combination_2 z combination_0   
0               0               0               0               0   
1               0               0               0               0   
2               0               0               0               0   
3               0               0               0               1   
4               0               0               0               1   

                                   
  z combination_1 z combination_2  
0               0               0  
1               0               1  
2               1               0  
3               0               0  
4               0               1
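
If separate column levels for the group name and the position are preferred over the name_counter suffixes, one possible sketch builds the columns with pd.MultiIndex.from_product. This is an alternative, not part of the answer above; it assumes every cell has exactly len(L) groups of N values, and the names values, columns and df_multi are illustrative.

```python
import numpy as np
import pandas as pd

storage_AZ = [[[0, 0, 0], [0, 0, 0]],
              [[0, 0, 0], [0, 0, 1]],
              [[0, 0, 0], [0, 1, 0]]]
df_az = pd.DataFrame(list(zip(storage_AZ)), columns=["AZ Combination"])

N = 3
L = ["a combination", "z combination"]

# Flatten each cell into one row of len(L) * N values.
values = np.array(df_az["AZ Combination"].tolist()).reshape(len(df_az), -1)

# One (group, position) pair per column: group names outer, positions inner.
columns = pd.MultiIndex.from_product([L, range(N)])
df_multi = pd.DataFrame(values, index=df_az.index, columns=columns)

# Selecting by the outer level returns just that group's columns.
print(df_multi["z combination"])
```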