I have a DataFrame (column del_lst has bool dtype):
import pandas as pd
df = pd.DataFrame({'col1': [[['a1']], [['b1'], ['b2']], [['b1'], ['b2']],
                            [['c1'], ['c2'], ['c3']], [['c1'], ['c2'], ['c3']],
                            [['c1'], ['c2'], ['c3']]],
                   'col2': [['a1'], ['b1'], ['b2'], ['c1'], ['c2'], ['c3']],
                   'day': [18, 19, 19, 20, 20, 20],
                   'del_lst': [True, True, True, True, False, False]})
df
Output:
                 col1  col2  day  del_lst
0              [[a1]]  [a1]   18     True
1        [[b1], [b2]]  [b1]   19     True
2        [[b1], [b2]]  [b2]   19     True
3  [[c1], [c2], [c3]]  [c1]   20     True
4  [[c1], [c2], [c3]]  [c2]   20    False
5  [[c1], [c2], [c3]]  [c3]   20    False
I want to delete the inner lists whose rows are marked True, and delete them step by step. For example, in [[b1], [b2]] both b1 and b2 are True, so first delete b1, then b2. I tried the following, but unfortunately my code doesn't work.
def func_del(df):
    return list(set(df['col1']) - set(df['col2']))

def all_func(df):
    # select only lines with True
    df_tr = df[df['del_lst'] == True]
    for i, row in df_tr.iterrows():
        df_tr['new_col1'] = df_tr.apply(func_del, axis=1)
        # I want to get a dictionary where the key is column col1 and the value is new_col1
        dict_replace = dict(zip(df_tr['col1'], df_tr['new_col1']))
        # so that I can replace the old values in the initial dataframe
        df['col1_replaced'] = df['col1'].apply(lambda word: dict_replace.get(word, word))
    return df

df_new = df.apply(all_func, axis=1)
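One problem in the code above: func_del calls set() on each col1 value, and those values are lists of lists. Python sets require hashable elements, so set([['b1'], ['b2']]) raises a TypeError. (Separately, df.apply(all_func, axis=1) calls all_func once per row with a Series, not with the whole DataFrame.) The sketch below only illustrates the hashing issue and one possible workaround, converting the inner lists to tuples; it is not a full fix of all_func.

```python
nested = [['b1'], ['b2']]   # one col1 value from the question
removed = ['b1']            # the matching col2 value

# Inner lists are unhashable, so building a set from them fails.
try:
    set(nested)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # message mentions "unhashable type: 'list'"

# Possible workaround: tuples are hashable, so compare tuple versions.
remaining = {tuple(sub) for sub in nested} - {tuple(removed)}
# Convert back to lists; sorted() gives a stable order since sets are unordered.
result = [list(t) for t in sorted(remaining)]
print(result)  # [['b2']]
```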
At the end, I would like to have a DataFrame like this:
                 col1  col2  col1_replaced  day  del_lst
0              [[a1]]  [a1]             []   18     True
1        [[b1], [b2]]  [b1]             []   19     True
2        [[b1], [b2]]  [b2]             []   19     True
3  [[c1], [c2], [c3]]  [c1]             []   20     True
4  [[c1], [c2], [c3]]  [c2]   [[c2], [c3]]   20    False
5  [[c1], [c2], [c3]]  [c3]   [[c2], [c3]]   20    False
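The step-by-step deletion described in the question can be sketched for a single col1 value. This is a minimal illustration, assuming every inner list holds exactly one item; the helper name delete_step_by_step is hypothetical and it returns each intermediate state so the order of deletion is visible.

```python
def delete_step_by_step(nested, to_delete):
    """Return each intermediate state while removing flagged items one at a time.

    Hypothetical helper: assumes `nested` looks like [['b1'], ['b2']], i.e.
    every inner list holds a single item, and `to_delete` lists the items
    to remove in order.
    """
    current = list(nested)
    states = [current]
    for item in to_delete:
        # Keep every inner list except the one holding this item.
        current = [sub for sub in current if sub[0] != item]
        states.append(current)
    return states


states = delete_step_by_step([['b1'], ['b2']], ['b1', 'b2'])
for state in states:
    print(state)
# [['b1'], ['b2']]
# [['b2']]
# []
```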
Answer:
You need to loop here, using set operations:
S = set(df.loc[df['del_lst'], 'col2'].str[0])
df['col1_replaced'] = [[x for x in l
if (x[0] if isinstance(x, list) else x) not in S]
for l in df['col1']]
NB: I am assuming that col1 can contain either flat or nested lists here; if every value is nested, just use if x[0] not in S as the condition.
Output:
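The first line of the answer pools every flagged item into a set S. The sketch below uses standalone Series (a simplification of the question's df, with the same values) to show what each step produces: boolean indexing keeps the True rows, and .str[0] takes the first element of each list.

```python
import pandas as pd

# Standalone copies of the question's col2 and del_lst columns.
col2 = pd.Series([['a1'], ['b1'], ['b2'], ['c1'], ['c2'], ['c3']])
del_lst = pd.Series([True, True, True, True, False, False])

# Boolean indexing keeps only the rows flagged True.
flagged_lists = col2[del_lst]
# .str[0] also works element-wise on lists, pulling out each first item.
flagged_items = flagged_lists.str[0]
S = set(flagged_items)
print(sorted(S))  # ['a1', 'b1', 'b2', 'c1']
```

Because S is pooled across all rows, c1 is removed from every row whose col1 contains it, which is why rows 3, 4 and 5 all end up with [[c2], [c3]].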
                 col1  col2  day  del_lst  col1_replaced
0                [a1]  [a1]   18     True             []
1        [[b1], [b2]]  [b1]   19     True             []
2        [[b1], [b2]]  [b2]   19     True             []
3  [[c1], [c2], [c3]]  [c1]   20     True   [[c2], [c3]]
4  [[c1], [c2], [c3]]  [c2]   20    False   [[c2], [c3]]
5  [[c1], [c2], [c3]]  [c3]   20    False   [[c2], [c3]]
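To check the answer end to end, the sketch below rebuilds the question's df and wraps the same list comprehension in a function. The name remove_flagged is hypothetical, and the function returns a copy instead of modifying df in place; that is a choice made here for easier testing, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col1': [[['a1']], [['b1'], ['b2']], [['b1'], ['b2']],
                            [['c1'], ['c2'], ['c3']], [['c1'], ['c2'], ['c3']],
                            [['c1'], ['c2'], ['c3']]],
                   'col2': [['a1'], ['b1'], ['b2'], ['c1'], ['c2'], ['c3']],
                   'day': [18, 19, 19, 20, 20, 20],
                   'del_lst': [True, True, True, True, False, False]})


def remove_flagged(frame):
    """Hypothetical helper: return a copy of frame with a col1_replaced column."""
    # Every item whose row is flagged True, pooled across the whole frame.
    flagged = set(frame.loc[frame['del_lst'], 'col2'].str[0])
    out = frame.copy()
    # Same condition as the answer: unwrap [x] to x, keep items not flagged.
    out['col1_replaced'] = [
        [x for x in lst if (x[0] if isinstance(x, list) else x) not in flagged]
        for lst in out['col1']
    ]
    return out


result = remove_flagged(df)
print(result['col1_replaced'].tolist())
```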