I need help deleting duplicated entries in the language column, i.e. languages that appear more than once in the same cell, using Python.
Here is my CSV data as a DataFrame:
import pandas as pd

f = pd.DataFrame({'Movie': ['name1', 'name2', 'name3', 'name4'],
                  'Year': ['1905', '1905', '1906', '1907'],
                  'Id': ['tt0283985', 'tt0283986', 'tt0284043', 'tt3402904'],
                  'language': ['Mandarin,Mandarin', 'Mandarin,Cantonese,Mandarin', 'Mandarin,Cantonese', 'Cantonese,Cantonese']})
so f looks like:
Movie Year Id language
0 name1 1905 tt0283985 Mandarin,Mandarin
1 name2 1905 tt0283986 Mandarin,Cantonese,Mandarin
2 name3 1906 tt0284043 Mandarin,Cantonese
3 name4 1907 tt3402904 Cantonese,Cantonese
The result should look like this:
Movie Year Id language
0 name1 1905 tt0283985 Mandarin
1 name2 1905 tt0283986 Mandarin,Cantonese
2 name3 1906 tt0284043 Mandarin,Cantonese
3 name4 1907 tt3402904 Cantonese
I am having trouble writing a function that removes the duplicated values in the language column. Thanks in advance!
Answer:
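Try this. Split each cell on commas and rebuild it with dict.fromkeys, which drops repeats while keeping the languages in their original order. (A plain set() also drops repeats, but its iteration order for strings is not guaranteed, so name2 could come out as Cantonese,Mandarin.) Assign the result back to keep the change:
f['language'] = f['language'].str.split(',').map(lambda x: ','.join(dict.fromkeys(x)))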
Resulting language column:
0 Mandarin
1 Mandarin,Cantonese
2 Mandarin,Cantonese
3 Cantonese
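Putting it together, here is a self-contained sketch that applies the same order-preserving idea through a named function instead of a lambda. The name unique_tokens is illustrative. Stripping spaces around each token and passing missing values through unchanged are assumptions about messier real data; the sample in the question does not need them.

```python
import pandas as pd

f = pd.DataFrame({
    'Movie': ['name1', 'name2', 'name3', 'name4'],
    'Year': ['1905', '1905', '1906', '1907'],
    'Id': ['tt0283985', 'tt0283986', 'tt0284043', 'tt3402904'],
    'language': ['Mandarin,Mandarin', 'Mandarin,Cantonese,Mandarin',
                 'Mandarin,Cantonese', 'Cantonese,Cantonese'],
})

def unique_tokens(value):
    """Remove repeated comma-separated entries, keeping first-seen order (hypothetical helper)."""
    if pd.isna(value):
        return value  # assumption: leave missing cells untouched
    # Assumption: cells may contain stray spaces such as "Mandarin, Cantonese",
    # so each token is stripped, and empty tokens are skipped.
    tokens = (t.strip() for t in value.split(','))
    # dict.fromkeys keeps only the first occurrence of each key, in insertion order.
    return ','.join(dict.fromkeys(t for t in tokens if t))

f['language'] = f['language'].map(unique_tokens)
print(f)
```

Because dict.fromkeys keeps the first occurrence of each token, name2 becomes Mandarin,Cantonese rather than Cantonese,Mandarin, matching the expected result in the question.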