Remove elements from lists of a list in one column from a list in another column and replace with ne

Time:05-04

I have a dataframe (the del_lst column is boolean):

import pandas as pd

df = pd.DataFrame({'col1': [[['a1']], [['b1'], ['b2']], [['b1'], ['b2']], [['c1'], ['c2'], ['c3']], [['c1'], ['c2'], ['c3']], [['c1'], ['c2'], ['c3']]],
'col2': [['a1'], ['b1'], ['b2'], ['c1'], ['c2'], ['c3']],
'day': [18, 19, 19, 20, 20, 20],
'del_lst': [True, True, True, True, False, False]})
df

Output:

  col1                col2   day del_lst
0 [[a1]]                [a1]   18    True
1 [[b1], [b2]]        [b1]   19    True
2 [[b1], [b2]]        [b2]   19    True
3 [[c1], [c2], [c3]]  [c1]   20    True
4 [[c1], [c2], [c3]]  [c2]   20    False
5 [[c1], [c2], [c3]]  [c3]   20    False

I want to delete the lists that are flagged True, and delete them step by step. For example, in [[b1], [b2]] both b1 and b2 are flagged True, so b1 should be deleted first, then b2. I tried the following, but unfortunately my code doesn't work.

def func_del(df):
    return list(set(df['col1']) - set(df['col2']))


def all_func(df):
    # select only lines with True
    df_tr = df[df['del_lst'] == True]
    for i, row in df_tr.iterrows():
        df_tr['new_col1'] = df_tr.apply(func_del, axis=1)

    # I want to get a dictionary where the key is column col1 and the value is new_col1
    dict_replace = dict(zip(df_tr['col1'], df_tr['new_col1']))
    # so that I replace the old values in the initial dataframe
    df['col1_replaced'] = df['col1'].apply(lambda word: dict_replace.get(word, word))
    return df

df_new = df.apply(all_func, axis=1)

I would like to have a dataframe like this at the end

   col1               col2  col1_replaced  day  del_lst
0 [[a1]]               [a1]   []             18     True
1 [[b1],[b2]]        [b1]   []             19     True
2 [[b1],[b2]]        [b2]   []             19     True
3 [[c1],[c2],[c3]]   [c1]   []             20     True
4 [[c1],[c2],[c3]]   [c2]   [[c2], [c3]]   20     False
5 [[c1],[c2],[c3]]   [c3]   [[c2], [c3]]   20     False

CodePudding user response:

You need to loop here, using a set for the membership test:

S = set(df.loc[df['del_lst'], 'col2'].str[0])


df['col1_replaced'] = [[x for x in l
                        if (x[0] if isinstance(x, list) else x) not in S]
                       for l in df['col1']]

NB: I am assuming that col1 holds either single or nested lists. If it is always nested, just use if x[0] not in S as the condition.
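For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea on the question's sample data. It assumes every entry of col1 is a nested list, so it uses the simpler x[0] check, and it builds the set with plain Python instead of .str[0]. The names to_delete, strip_flagged and result are mine, for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [[['a1']], [['b1'], ['b2']], [['b1'], ['b2']],
             [['c1'], ['c2'], ['c3']], [['c1'], ['c2'], ['c3']],
             [['c1'], ['c2'], ['c3']]],
    'col2': [['a1'], ['b1'], ['b2'], ['c1'], ['c2'], ['c3']],
    'day': [18, 19, 19, 20, 20, 20],
    'del_lst': [True, True, True, True, False, False],
})

# Values flagged for deletion: the first element of col2 wherever del_lst is True.
to_delete = {lst[0] for lst, flag in zip(df['col2'], df['del_lst']) if flag}


def strip_flagged(nested):
    """Keep only the inner lists whose first element is not flagged."""
    return [inner for inner in nested if inner[0] not in to_delete]


# One list comprehension over the column; the original col1 is left untouched.
df['col1_replaced'] = [strip_flagged(nested) for nested in df['col1']]
result = df['col1_replaced'].tolist()
```

Note that row 3 keeps [[c2], [c3]] here, because only c1 is flagged for day 20. That differs from the [] shown for row 3 in the question's desired table.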

output:

                 col1  col2  day  del_lst col1_replaced
0                [a1]  [a1]   18     True            []
1        [[b1], [b2]]  [b1]   19     True            []
2        [[b1], [b2]]  [b2]   19     True            []
3  [[c1], [c2], [c3]]  [c1]   20     True  [[c2], [c3]]
4  [[c1], [c2], [c3]]  [c2]   20    False  [[c2], [c3]]
5  [[c1], [c2], [c3]]  [c3]   20    False  [[c2], [c3]]
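As a side note, one likely reason the func_del attempt in the question fails is that set(df['col1']) tries to hash the inner lists, and Python lists are not hashable. The sketch below only illustrates that point on a single cell value. It is not a full fix for the original code, and the variable names are hypothetical.

```python
nested = [['b1'], ['b2']]  # one cell of col1

try:
    set(nested)  # each element is a list, which cannot be hashed
    error = None
except TypeError as exc:
    error = str(exc)

# Tuples are hashable, so a set difference works after converting the inner lists.
remaining = {tuple(inner) for inner in nested} - {('b1',)}
remaining_lists = [list(t) for t in remaining]
```

Going through tuples loses the original order when more than one element remains, which is one reason the list comprehension with a membership test is the simpler route here.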