Split columns of lists into multiple columns

Time:12-03

I have column values stored inside lists:

import pandas as pd

df1 = pd.DataFrame(
    {
        "column0": [["xx, aa", "xx, aa"]],
        "column1": [["yy, bb", "yy, aa"]],
        "column2": [["cc, xx", "cc, xx"]],
    }
)

            column0           column1           column2
0  [xx, aa, xx, aa]  [yy, bb, yy, aa]  [cc, xx, cc, xx]

I want to split all of them into separate columns, one value per column.

Expected outcome:

  column0 column1 column2 column3 column4 column5 column6 column7 column8 column9 column10 column11
0      xx      aa      xx      aa      yy      bb      yy      aa      cc      xx       cc       xx

Any ideas?

CodePudding user response:

I think it's a better approach to clean your data prior to instantiating a pandas DataFrame.

Here's an option you can take given the data structure you posted:

import pandas as pd

init_dict = {"column0": [["xx, aa", "xx, aa"]], "column1": [["yy, bb", "yy, aa"]], "column2": [["cc, xx", "cc, xx"]]}

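pd_dict = {}
col_counter = 0
for value in init_dict.values():
    # Each value is a one-element list wrapping the list of strings.
    for item in value[0]:
        # Split "xx, aa" into its parts and give each part its own column.
        for inner_item in item.split(","):
            pd_dict[f"column{col_counter}"] = [inner_item.strip()]
            col_counter += 1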

pd.DataFrame(pd_dict)
>>>  column0 column1 column2 column3 column4 column5 column6 column7 column8  \
0      xx      aa      xx      aa      yy      bb      yy      aa      cc   

  column9 column10 column11  
0      xx       cc       xx  

CodePudding user response:

You can use the np.char.split method to split each cell on the comma, then reshape the result so that every original row becomes a single flat row.

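import numpy as np

# Split every string on ", "; the result is an object array of lists.
splits = np.char.split(np.array(df1.values.tolist()), sep=', ')
# Stack the pieces and flatten them into one row per original row.
split_data = np.stack(splits.ravel()).reshape(df1.shape[0], -1)

df2 = pd.DataFrame(split_data, columns=['Col' + str(i) for i in range(split_data.shape[1])])
print(df2)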

Output:

  Col0 Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Col10 Col11
0   xx   aa   xx   aa   yy   bb   yy   aa   cc   xx    cc    xx
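
For comparison, the same flattening can probably be done without NumPy by walking each row in plain Python. This is only a sketch: flatten_row is a hypothetical helper name, not something from the answers above, and the column names follow the question's "columnN" pattern.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "column0": [["xx, aa", "xx, aa"]],
        "column1": [["yy, bb", "yy, aa"]],
        "column2": [["cc, xx", "cc, xx"]],
    }
)


def flatten_row(row):
    # Hypothetical helper: walk the cells left to right, split each string
    # on commas, and strip surrounding whitespace from every piece.
    return [
        piece.strip()
        for cell in row
        for item in cell
        for piece in item.split(",")
    ]


# Build one flat list of values per original row.
flat_rows = [flatten_row(row) for row in df1.itertuples(index=False)]
df2 = pd.DataFrame(flat_rows, index=df1.index)
df2.columns = [f"column{i}" for i in range(df2.shape[1])]
print(df2)
```

Because each row is flattened on its own, this version does not depend on a reshape step and keeps the original index.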