Home > Software design >  Pandas explode/split lists and reasign values
Pandas explode/split lists and reasign values

Time:10-06

I have this code

df = pd.DataFrame({'an':{0: 'f', 1: 'i', 2:'-' , 3:'-' , 4:'f' , 5:'c,f,i,j' , 6:'c,d,e,j'},
                   'nv':{0: [-1], 1: [-1], 2: ['-'], 3:['-'] , 4:[-2] , 5:[-2,-1,-3,-1] , 6:[-2,-1,-2,-1]},
                   })

which yields

    an         nv
0   f         [-1]
1   i         [-1]
2   -         [-]
3   -         [-]
4   f         [-2]
5   c,f,i,j   [-2, -1, -3, -1]
6   c,d,e,j   [-2, -1, -2, -1]

I would like to split column ['an'], so that each value inside ['an'] becomes column becomes a column with the value currently assigned on the bracket from ['nv'] column.

Desired output:

    an       nv                  c   d   e  f   i   j
0   f       [-1]                           -1       
1   i       [-1]                               -1   
2   -       [-]                     
3   -       [-]                     
4   f       [-2]                           -2       
5   c,f,i,j [-2, -1, -3, -1]    -2         -1  -3  -1
6   c,d,e,j [-2, -1, -2, -1]    -2  -1  -2         -1

Solution on this question is similar, but it does not work for my problem

CodePudding user response:

Try:

df = pd.concat(
    [
        df,
        (
            x := pd.DataFrame(
                [
                    {k: v for k, v in zip(a, b) if v != "-"}
                    for a, b in zip(df.an.str.split(","), df.nv)
                ]
            ).fillna("")
        )[sorted(x)],
    ],
    axis=1,
)

print(df)

Prints:

        an                nv    c    d    e    f    i    j
0        f              [-1]                -1.0          
1        i              [-1]                     -1.0     
2        -               [-]                              
3        -               [-]                              
4        f              [-2]                -2.0          
5  c,f,i,j  [-2, -1, -3, -1] -2.0           -1.0 -3.0 -1.0
6  c,d,e,j  [-2, -1, -2, -1] -2.0 -1.0 -2.0           -1.0

CodePudding user response:

Let's split the an column then explode the dataframe on multiple columns, then reshape with pivot:

df.join(
    df
    .query("an != '-'")
    .assign(an=df['an'].str.split(','))
    .explode(['an', 'nv'])
    .pivot(columns='an', values='nv')
).fillna('')

        an                nv   c   d   e   f   i   j
0        f              [-1]              -1        
1        i              [-1]                  -1    
2        -               [-]                        
3        -               [-]                        
4        f              [-2]              -2        
5  c,f,i,j  [-2, -1, -3, -1]  -2          -1  -3  -1
6  c,d,e,j  [-2, -1, -2, -1]  -2  -1  -2          -1

CodePudding user response:

Using crosstab as a way to reshape:

s = df['an'].str.split(',').explode()
df.join(pd.crosstab(s.index, s, values=df['nv'].explode(), aggfunc='first')
          .fillna('').drop(columns='-'))

output:

        an                nv   c   d   e   f   i   j
0        f              [-1]              -1        
1        i              [-1]                  -1    
2        -               [-]                        
3        -               [-]                        
4        f              [-2]              -2        
5  c,f,i,j  [-2, -1, -3, -1]  -2          -1  -3  -1
6  c,d,e,j  [-2, -1, -2, -1]  -2  -1  -2          -1
  • Related