I have a pandas DataFrame in which one column holds lists. For each row, I want to remove from that list any value that matches the 'ID' column of the same row. Note that 'similar_ids' is sometimes empty or contains only one value. An example is below:
original
ID similar_ids
1 1, 234, 3215
2 2, 52, 1
3 49, 3
4 4
5
desired
ID similar_ids
1 234, 3215
2 52, 1
3 49
4
5
CodePudding user response:
import pandas as pd

d = {'ID': [1, 2, 3, 4, 5], 'similar_ids': [[1, 234, 3215], [2, 52, 1], [49, 3], [4], []]}
df = pd.DataFrame(data=d)

# Walk the two columns together and mutate each list in place.
# list.remove() drops only the first occurrence of the value.
for row_id, ids in zip(df['ID'], df['similar_ids']):
    if row_id in ids:
        ids.remove(row_id)
CodePudding user response:
df['similar_ids'] = df.apply(lambda row: [x for x in row.similar_ids if x != row.ID], axis=1)
print(df)
Output:
   ID  similar_ids
0   1  [234, 3215]
1   2      [52, 1]
2   3         [49]
3   4           []
4   5           []
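If the "empty" cells in 'similar_ids' are actually missing values (None or NaN) rather than empty lists, iterating over them in the lambda above would raise a TypeError. Below is a minimal sketch that tolerates that case, assuming missing cells should simply become empty lists. The helper name drop_own_id is hypothetical, and the None in the sample data is an assumption added to illustrate the point:

```python
import pandas as pd

def drop_own_id(row_id, ids):
    """Return ids without row_id; non-list cells (None, NaN) become []."""
    if not isinstance(ids, list):
        return []
    # Unlike list.remove(), this drops every occurrence of row_id.
    return [x for x in ids if x != row_id]

# Same data as in the question, except the last cell is None (assumption).
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'similar_ids': [[1, 234, 3215], [2, 52, 1], [49, 3], [4], None],
})

# Build the cleaned lists with zip, then assign them back as a Series
# aligned to the original index.
cleaned = [drop_own_id(i, ids) for i, ids in zip(df['ID'], df['similar_ids'])]
df['similar_ids'] = pd.Series(cleaned, index=df.index)
print(df)
```

Iterating with zip avoids the per-row overhead of apply(axis=1), which may matter on larger frames, and it builds new lists instead of mutating the originals.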