I have column values stored inside lists:
import pandas as pd

df1 = pd.DataFrame(
    {
        "column0": [["xx, aa", "xx, aa"]],
        "column1": [["yy, bb", "yy, aa"]],
        "column2": [["cc, xx", "cc, xx"]],
    }
)
column0 column1 column2
0 [xx, aa, xx, aa] [yy, bb, yy, aa] [cc, xx, cc, xx]
I want to split all of them into separate columns, one value per column.
Desired outcome:
column0 column1 column2 column3 column4 column5 column6 column7 column8 column9 column10 column11
0 xx aa xx aa yy bb yy aa cc xx cc xx
Any ideas?
Answer:
I think it's better to clean your data before creating the pandas DataFrame.
Here's an option you can take given the data structure you posted:
import pandas as pd

init_dict = {
    "column0": [["xx, aa", "xx, aa"]],
    "column1": [["yy, bb", "yy, aa"]],
    "column2": [["cc, xx", "cc, xx"]],
}

pd_dict = {}
col_counter = 0
# Walk the columns in order; each value is a one-element list holding the inner list
for value in init_dict.values():
    for item in value[0]:
        # Split "xx, aa" into "xx" and "aa" and give each piece its own column
        for inner_item in item.split(","):
            pd_dict[f"column{col_counter}"] = [inner_item.strip()]
            col_counter += 1

pd.DataFrame(pd_dict)
Output:
   column0 column1 column2 column3 column4 column5 column6 column7 column8 \
0 xx aa xx aa yy bb yy aa cc
column9 column10 column11
0 xx cc xx
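The loop above reads only the first list in each column, so it handles a single row. Here is a minimal sketch of the same split-and-flatten idea applied directly to df1 for any number of rows, assuming every cell is a list of comma-separated strings (the helper name flatten_row is hypothetical):

```python
import pandas as pd

df1 = pd.DataFrame({
    "column0": [["xx, aa", "xx, aa"]],
    "column1": [["yy, bb", "yy, aa"]],
    "column2": [["cc, xx", "cc, xx"]],
})

def flatten_row(row):
    """Split every string in every list cell on ',' and return one flat list."""
    return [part.strip() for cell in row for item in cell for part in item.split(",")]

# itertuples(index=False) yields each row's cells in column order
rows = [flatten_row(row) for row in df1.itertuples(index=False)]
df2 = pd.DataFrame(rows)
# Name the columns column0, column1, ... to match the desired outcome
df2.columns = [f"column{i}" for i in range(df2.shape[1])]
print(df2)
```

If rows produce different numbers of pieces, pd.DataFrame should pad the shorter rows with missing values rather than raise.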
Answer:
You can use NumPy's np.char.split to split each string on ", " and then reshape the result into the appropriate shape.
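import numpy as np
import pandas as pd

# Split every "xx, aa" string; assumes all lists in df1 have the same length
splits = np.char.split(np.array(df1.values.tolist()), sep=", ")
split_data = np.stack(splits.ravel()).reshape(df1.shape[0], -1)
df2 = pd.DataFrame(split_data, columns=["Col" + str(i) for i in range(split_data.shape[1])])
print(df2)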
Output:
Col0 Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Col10 Col11
0 xx aa xx aa yy bb yy aa cc xx cc xx
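To see why the reshape works, this sketch prints the intermediate shapes for the example data. It assumes every cell holds a list of the same length; otherwise np.array cannot build a regular 3-D string array.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "column0": [["xx, aa", "xx, aa"]],
    "column1": [["yy, bb", "yy, aa"]],
    "column2": [["cc, xx", "cc, xx"]],
})

# (rows, columns, items per list): one string per original list entry
arr = np.array(df1.values.tolist())
# Same shape, but each element is now a Python list such as ['xx', 'aa']
splits = np.char.split(arr, sep=", ")
# One row per original string, one column per piece
stacked = np.stack(splits.ravel())
# Put all pieces belonging to one DataFrame row side by side
split_data = stacked.reshape(df1.shape[0], -1)

print(arr.shape, splits.shape, stacked.shape, split_data.shape)
# prints: (1, 3, 2) (1, 3, 2) (6, 2) (1, 12)
```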