I only want to obtain the unique genres of the movies. Thanks.
CodePudding user response:
Convert the string representations to lists, then use Series.explode with Series.unique:
import ast
L = df['genres'].apply(ast.literal_eval).explode().unique().tolist()
Alternative without ast (split on commas, then strip the leftover quotes from each item):
L = df['genres'].str.strip('[]').str.split(r',\s*').explode().str.strip("'\"").unique().tolist()
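To make the literal_eval / explode / unique chain concrete, here is a minimal sketch on a made-up frame. The `title` column and the three sample rows are assumptions for illustration only; the real `genres` column is assumed to hold strings that look like Python lists.

```python
import ast

import pandas as pd

# Hypothetical sample data standing in for the real data set.
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genres": ["['drama', 'crime']", "['comedy']", "['drama', 'comedy']"],
})

# Parse each string into a real list, put one genre on each row, de-duplicate.
genres = df["genres"].apply(ast.literal_eval).explode().unique().tolist()
print(genres)  # ['drama', 'crime', 'comedy']
```

Series.unique keeps values in order of first appearance, so the result is not sorted; wrap it in sorted() if you need alphabetical order.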
CodePudding user response:
pandas may be overkill if you're not going to do more with this information:
from ast import literal_eval
from itertools import chain
import csv
with open('genres.csv', newline='') as f:
    reader = csv.reader(f)
    header = next(reader)  # skip the header row
    # assumes the genres column is the first field of each row
    values = [literal_eval(x[0]) for x in reader]
unique = set(chain.from_iterable(values))
print(unique)
Output:
{'action',
'animation',
'comedy',
'crime',
'documentation',
'drama',
'european',
'family',
'fantasy',
'history',
'horror',
'music',
'reality',
'romance',
'scifi',
'sport',
'thriller',
'war',
'western'}
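If the genres column is not the first one in the file, a variation that looks it up by header name may be safer. This is only a sketch: the in-memory sample, the `title` column and the column name `genres` are assumptions, and the StringIO object stands in for the real file.

```python
import csv
import io
from ast import literal_eval

# Hypothetical stand-in for the CSV file; replace it with
# open('genres.csv', newline='') to read the real data.
sample = io.StringIO(
    'title,genres\n'
    'A,"[\'drama\', \'crime\']"\n'
    'B,"[\'comedy\']"\n'
    'C,"[]"\n'
)

unique = set()
for row in csv.DictReader(sample):
    # Look the column up by name instead of assuming its position.
    unique.update(literal_eval(row["genres"]))

genres = sorted(unique)
print(genres)  # ['comedy', 'crime', 'drama']
```

Sorting the set at the end gives a stable, alphabetical list, which is easier to compare between runs than a raw set.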