I have a DataFrame with a column of lists, and I'm filling these lists with new values.
import pandas as pd
df = pd.DataFrame({'col1':[1,2,3],'col2':[4,5,6], 'col3':[[],['x','y','z'],['x1','y1','z1']]}, index=['a','b','c'])
print(df)
   col1  col2          col3
a     1     4            []
b     2     5     [x, y, z]
c     3     6  [x1, y1, z1]
Each cell of 'col3' holds a list. I'm trying to replace each list with itself plus a new list, using set() to drop duplicates if any exist. The new list is not always the same; I only built it this way to keep the example simple. Note that the loop does not visit the rows in index order, yet each new value still has to land in the right row.
n = 0
for index in ['b', 'a', 'c']:
    n += 1
    list_to_append = ['x' + str(n), 'y' + str(n), 'z' + str(n)]
    new_list = list(set(df.loc[index, 'col3'] + list_to_append))
    df.loc[index, 'col3'] = new_list  # this assignment raises the error below
This is what I expect to get:
print(df)
   col1  col2                      col3
a     1     4              [x2, y2, z2]
b     2     5     [x, y, z, x1, y1, z1]
c     3     6  [x1, y1, z1, x3, y3, z3]
Instead, I get:
ValueError: Must have equal len keys and value when setting with an iterable
Is there a correct way to do this?
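A minimal sketch of one loop-based workaround (an assumption, not code from the thread): compute every new list inside the loop, keep them in a dict keyed by row label, and write the whole column back with a single assignment. A pd.Series built from a dict is indexed by those labels, so assigning it to a column aligns each list with its own row regardless of loop order. dict.fromkeys is swapped in for set() here so duplicates are still dropped but the order stays predictable; new_values is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6],
                   'col3': [[], ['x', 'y', 'z'], ['x1', 'y1', 'z1']]},
                  index=['a', 'b', 'c'])

new_values = {}  # hypothetical helper: row label -> new list for col3
n = 0
for index in ['b', 'a', 'c']:
    n += 1
    list_to_append = ['x' + str(n), 'y' + str(n), 'z' + str(n)]
    # dict.fromkeys drops duplicates like set(), but keeps first-seen order
    new_values[index] = list(dict.fromkeys(df.loc[index, 'col3'] + list_to_append))

# Assigning a Series aligns on index labels, so each list lands in its own row
df['col3'] = pd.Series(new_values)
print(df)
```

Because the lists are written back in one column assignment rather than cell by cell through .loc, pandas never has to interpret a single list as several values for one cell.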
Answer:
Does this help? Note that apply might not be the best approach, depending on the size of df, since it calls a Python function once per row.
import pandas as pd
df = pd.DataFrame({'col1':[1,2,3],'col2':[4,5,6], 'col3':[[],['x','y','z'],['x1','y1','z1']]}, index=['a','b','c'])
df['col3'] = df['col3'].apply(lambda x: list(set(x + ['x3', 'y3', 'z3'])))
print(df)
   col1  col2                      col3
a     1     4              [y3, z3, x3]
b     2     5     [x, y3, z3, z, x3, y]
c     3     6  [z1, y3, z3, y1, x1, x3]
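Because apply calls a Python function for each row, a plain list comprehension over the column is a common alternative; the sketch below shows that variant (not benchmarked here). extra is a hypothetical name for the list appended to every row, and dict.fromkeys replaces set() so the resulting order is deterministic instead of the arbitrary set order seen above.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6],
                   'col3': [[], ['x', 'y', 'z'], ['x1', 'y1', 'z1']]},
                  index=['a', 'b', 'c'])

extra = ['x3', 'y3', 'z3']  # hypothetical name: the list appended to every row

# One new list per row; dict.fromkeys removes duplicates but keeps order.
# Wrapping the result in a Series with df.index stores each list as one cell value.
df['col3'] = pd.Series([list(dict.fromkeys(x + extra)) for x in df['col3']],
                       index=df.index)
print(df)
```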
Answer:
You could use apply with the help of a set difference:
list_to_append = ['x3', 'y3', 'z3']  # example list to append to every row
S = set(list_to_append)
df['col3'] = df['col3'].apply(lambda x: x + list(S.difference(x)))
Output (written to a new column, col4, for clarity):
   col1  col2          col3                      col4
a     1     4            []              [z3, y3, x3]
b     2     5     [x, y, z]     [x, y, z, z3, y3, x3]
c     3     6  [x1, y1, z1]  [x1, y1, z1, z3, y3, x3]
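S.difference(x) returns a set, so the appended items come out in arbitrary order (z3, y3, x3 above). A sketch of the same "append only what is missing" idea that keeps the order of list_to_append, assuming that order matters; append_missing is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6],
                   'col3': [[], ['x', 'y', 'z'], ['x1', 'y1', 'z1']]},
                  index=['a', 'b', 'c'])

list_to_append = ['x3', 'y3', 'z3']

def append_missing(current, new_items):
    """Hypothetical helper: current plus the new_items it lacks, in order."""
    seen = set(current)     # fast membership test, like the set difference
    result = list(current)  # copy so the original cell is not mutated
    for item in new_items:
        if item not in seen:
            result.append(item)
            seen.add(item)  # also skips repeats inside new_items
    return result

df['col4'] = df['col3'].apply(lambda x: append_missing(x, list_to_append))
print(df)
```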
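For a different list per row, use a Series indexed by the row labels; the + aligns on those labels, so the order of the Series does not matter: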
s = pd.Series([list((f'x{n}', f'y{n}', f'z{n}')) for n in range(len(df))],
              index=['b', 'a', 'c'])
df['col4'] = (df['col3'] + s).apply(lambda x: list(dict.fromkeys(x)))
Output:
   col1  col2          col3                      col4
a     1     4            []              [x1, y1, z1]
b     2     5     [x, y, z]     [x, y, z, x0, y0, z0]
c     3     6  [x1, y1, z1]  [x1, y1, z1, x2, y2, z2]
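To connect this back to the question's loop, here is a sketch (an assumption, not the answerer's code) that numbers the new lists 1, 2, 3 in the loop order b, a, c, so the result matches the expected output in the question. order is a hypothetical name; the Series + Series step relies on pandas aligning the two operands by index label and concatenating the lists element-wise.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6],
                   'col3': [[], ['x', 'y', 'z'], ['x1', 'y1', 'z1']]},
                  index=['a', 'b', 'c'])

order = ['b', 'a', 'c']  # hypothetical name: the loop order from the question
# Label -> new list, numbered 1, 2, 3 in loop order
s = pd.Series({label: [f'x{n}', f'y{n}', f'z{n}']
               for n, label in enumerate(order, start=1)})

# + aligns on the index, so each row gets its own list regardless of order;
# dict.fromkeys then drops duplicates while keeping first-seen order
df['col3'] = (df['col3'] + s).apply(lambda x: list(dict.fromkeys(x)))
print(df)
```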