How to split comma separated dataset into different columns in python?-CodePudding

I have comma separated datasets with more than 50k rows like this:

items
a,e
b,b,d
c
d,e,a,c,b
c,a,c,e
b,d,a,c,e
a,d
d,d

I want to add them to different columns using pandas:

item1 item2 item3   item4   item5
a      e    NaN      NaN    NaN
b      b    d        NaN    NaN
c      NaN  NaN      NaN    NaN
d      e    a          c    b
c      a    c          e    NaN
b      d    a          c    e
a      d    NaN      NaN    NaN
d      d    NaN      NaN    NaN

CodePudding user response：

A simple solution is to split the column, convert it to a list and then create a new dataframe. Then rename the columns from the default values:

df = pd.DataFrame(df['items'].str.split(',').values.tolist())
df.columns  = 1
df = df.add_prefix('item_')

Result:

  item_0 item_1 item_2 item_3 item_4
0      a      e   None   None   None
1      b      b      d   None   None
2      c   None   None   None   None
3      d      e      a      c      b
4      c      a      c      e   None
5      b      d      a      c      e
6      a      d   None   None   None
7      d      d   None   None   None

CodePudding user response：

You could split your file by \n first like this

data = '''a,e
b,b,d
c
d,e,a,c,b
c,a,c,e
b,d,a,c,e
a,d
d,d'''
lines=data.split("\n")

then create a empty Dataframe

df = pd.DataFrame(columns=['item1','item2','item3','item4','item5'])

and add data line by line

for line in lines:
    line_data = line.split(",")
    line_data  = (5-len(line_data))*[None]
    df = pd.concat( [pd.DataFrame([line_data], columns=df.columns), df] , ignore_index=True)

output:

      item1 item2 item3 item4 item5
0      d     d  None  None  None
1      a     d  None  None  None
2      b     d     a     c     e
3      c     a     c     e  None
4      d     e     a     c     b
5      c  None  None  None  None
6      b     b     d  None  None
7      a     e  None  None  None
8      d     d  None  None  None
9      a     d  None  None  None
10     b     d     a     c     e
11     c     a     c     e  None
12     d     e     a     c     b
13     c  None  None  None  None
14     b     b     d  None  None
15     a     e  None  None  None
16     d     d  None  None  None
17     a     d  None  None  None
18     b     d     a     c     e
19     c     a     c     e  None
20     d     e     a     c     b
21     c  None  None  None  None
22     b     b     d  None  None
23     a     e  None  None  None