Append to the lists in a DataFrame column: Must have equal len keys and value when setting with an iterable

I have a DataFrame with a column of lists, and I'm filling these lists with new values.

import pandas as pd

df = pd.DataFrame({'col1':[1,2,3],'col2':[4,5,6], 'col3':[[],['x','y','z'],['x1','y1','z1']]}, index=['a','b','c'])

print(df)
   col1  col2          col3
a     1     4            []
b     2     5     [x, y, z]
c     3     6  [x1, y1, z1]

Here, the 'col3' column holds lists. What I'm trying to do is replace each current value with itself plus a new list, using set() only to drop duplicates if any exist. The new list is not always the same; I just built it this way to keep the example simple. Note that the loop does not visit the rows in index order, yet each new value still has to land in the right row.

n = 0
for index in ['b','a','c']:
    n += 1
    list_to_append = ['x'+str(n), 'y'+str(n), 'z'+str(n)]
    # set() drops duplicates (but does not keep order)
    new_list = list(set(df.loc[index,'col3'] + list_to_append))

    df.loc[index,'col3'] = new_list

This is what I expect to get:

print(df)
   col1  col2                      col3
a     1     4              [x2, y2, z2]
b     2     5     [x, y, z, x1, y1, z1]
c     3     6  [x1, y1, z1, x3, y3, z3]

Instead, I get:

ValueError: Must have equal len keys and value when setting with an iterable

Is there a correct way to do this?

Answer:

Does this help? Note that apply might not be the best approach, depending on the size of df.


import pandas as pd

df = pd.DataFrame({'col1':[1,2,3],'col2':[4,5,6], 'col3':[[],['x','y','z'],['x1','y1','z1']]}, index=['a','b','c'])

df['col3'] = df['col3'].apply(lambda x: list(set(x + ['x3','y3','z3'])))

print(df)

   col1  col2                      col3
a     1     4              [y3, z3, x3]
b     2     5     [x, y3, z3, z, x3, y]
c     3     6  [z1, y3, z3, y1, x1, x3]
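The approach above appends the same list to every row. When each row needs its own list, as in the question, a minimal sketch that keeps the original loop is to assign through DataFrame.at instead of .loc. .at addresses exactly one cell, so the list is stored as that cell's value, whereas .loc appears to try to spread a list value over the selection, which is what produces the "equal len keys and value" error. This assumes col3 has object dtype (it does here, since it holds lists). dict.fromkeys is swapped in for set() so duplicates are still dropped but the existing order is kept.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6],
                   'col3': [[], ['x', 'y', 'z'], ['x1', 'y1', 'z1']]},
                  index=['a', 'b', 'c'])

n = 0
for index in ['b', 'a', 'c']:  # deliberately not in index order
    n += 1
    list_to_append = ['x' + str(n), 'y' + str(n), 'z' + str(n)]
    # dict.fromkeys drops duplicates but, unlike set(), keeps first-seen order
    new_list = list(dict.fromkeys(df.at[index, 'col3'] + list_to_append))
    # .at targets a single cell by label, so the whole list becomes that cell's value
    df.at[index, 'col3'] = new_list

print(df)
```

Because .at works by label, the loop order does not matter: each list ends up in the row named by index.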

Answer:

You could use apply with the help of a set difference:

S = set(list_to_append)
df['col3'] = df['col3'].apply(lambda x: x + list(S.difference(x)))

Output (stored in a new column, col4, instead of overwriting col3, for clarity):

   col1  col2          col3                      col4
a     1     4            []              [z3, y3, x3]
b     2     5     [x, y, z]     [x, y, z, z3, y3, x3]
c     3     6  [x1, y1, z1]  [x1, y1, z1, z3, y3, x3]
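A self-contained version of the set-difference approach, as a sketch. It assumes list_to_append = ['x3', 'y3', 'z3'] (the value implied by the output above) and writes to col4 as in that output. Since S.difference(x) returns a set, the existing items keep their order but the appended items come out in arbitrary order.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6],
                   'col3': [[], ['x', 'y', 'z'], ['x1', 'y1', 'z1']]},
                  index=['a', 'b', 'c'])

# Assumed value matching the output above; any list works here
list_to_append = ['x3', 'y3', 'z3']
S = set(list_to_append)

# Keep the row's list as-is and append only the items it does not already contain
df['col4'] = df['col3'].apply(lambda x: x + list(S.difference(x)))
print(df)
```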

For a variable input (a different list per row), use a Series indexed by row label:

s = pd.Series([list((f'x{n}',f'y{n}',f'z{n}')) for n in range(len(df))],
              index=['b','a','c'])


df['col4'] = (df['col3'] + s).apply(lambda x: list(dict.fromkeys(x)))

Output:

   col1  col2          col3                      col4
a     1     4            []              [x1, y1, z1]
b     2     5     [x, y, z]     [x, y, z, x0, y0, z0]
c     3     6  [x1, y1, z1]  [x1, y1, z1, x2, y2, z2]