Home > Software design >  Remove objects which has been repeated in two columns in dataframe
Remove objects which has been repeated in two columns in dataframe

Time:09-21

I have a data frame like this: enter image description here

and the dataset in the CSV file is here.

this data was extracted from the IMDb dataset. but I have a problem, I could not be able to remove the actor's names which are repeated in the same row for example in row number 4 I want to drop 'Marie Gruber' in both name and actors column. I tried to use to apply and all conditions but always code consider it the same. like this code:

data[data['name'] != data['actors']]

CodePudding user response:

Trere are traling spaces for actors column, so first remove them by Series.str.strip:

data['actors'] = data['actors'].str.strip()
data[data['name'] != data['actors']]

Or use skipinitialspace=True in read_csv:

data = pd.read_csv(file, skipinitialspace=True)
data[data['name'] != data['actors']]

CodePudding user response:

Use pandas.dataframe.drop function.

data.drop(data[data.apply(lambda x: x['name'] in x['actors'], axis = 1)].index)
  • Related