Splitting columns containing comma separated string to new row values

Time:06-10

I have a data frame in the following format:

       variable         val
0   'a','x','y'          10

I would like to unnest (explode) the data into the following format:

     variable1     variable2      value
  0     a               x           10
  1     a               y           10
  2     x               y           10   

I have tried df.explode, but it does not give me the pair between x and y. My code is below. Can anyone guide me on how to proceed so that the x and y row is produced as well? Thanks in advance.

import pandas as pd
from ast import literal_eval

data = {'name':["'a','x','y'"], 'val' : [10]}
df = pd.DataFrame(data)

df2 = (df['name'].str.split(',',expand = True, n = 1)
       .rename(columns = {0 : 'variable 1', 1 : 'variable 2'})
       .join(df.drop(columns = 'name')))

df2['variable 2']=df2['variable 2'].map(literal_eval)
df2=df2.explode('variable 2',ignore_index=True)

print(df2)
OUTPUT:

    variable 1   variable 2     val
0        'a'           x        10
1        'a'           y        10

CodePudding user response:

If you need every pairwise combination of the comma-separated values in each row, use:

print (df)
          variable  val
0      'a','x','y'   10
1  'a','x','y','f'   80
2              's'    4

from  itertools import combinations

df['variable'] = df['variable'].str.replace("'", "", regex=True)

s = [x.split(',') if ',' in x else (x,x) for x in df['variable']]
L = [(*y, z) for x, z in zip(s, df['val']) for y in combinations(x, 2)]
df = pd.DataFrame(L, columns=['variable 1','variable 2','val'])

print (df)
  variable 1 variable 2  val
0          a          x   10
1          a          y   10
2          x          y   10
3          a          x   80
4          a          y   80
5          a          f   80
6          x          y   80
7          x          f   80
8          y          f   80
9          s          s    4
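
For reuse, the same idea could be wrapped in a small function. This is only a sketch: the name explode_pairs and its parameters are made up for illustration and are not part of pandas, and it assumes every cell is a string of single-quoted items separated by commas. As in the code above, a lone item is paired with itself.

```python
from itertools import combinations

import pandas as pd


def explode_pairs(df, col="variable", value_col="val"):
    """Return one row per unordered pair of the comma-separated items in `col`.

    Hypothetical helper for illustration; not a pandas API.
    """
    rows = []
    for raw, value in zip(df[col], df[value_col]):
        # Drop the single quotes and any surrounding spaces around each item.
        items = [part.strip().strip("'") for part in raw.split(",")]
        if len(items) == 1:
            # A lone item has no partner, so pair it with itself.
            items = items * 2
        for left, right in combinations(items, 2):
            rows.append((left, right, value))
    return pd.DataFrame(rows, columns=["variable 1", "variable 2", value_col])


df = pd.DataFrame({"variable": ["'a','x','y'", "'s'"], "val": [10, 4]})
result = explode_pairs(df)
print(result)
```

Because combinations works on each row's own list, the x and y pair is produced alongside the pairs that start with a, which is the part that splitting off the first item and calling explode cannot give.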