I want Python code to obtain the unique genres

I only want to obtain the unique genres of the movies. Thanks.

CodePudding user response：

Convert the string representations to lists with ast.literal_eval, then use Series.explode followed by Series.unique:

import ast

L = df['genres'].apply(ast.literal_eval).explode().unique().tolist()
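As a quick check of this approach, here is a minimal sketch on a made-up DataFrame (the titles and genre values are hypothetical, not taken from the asker's data). It also drops the NaN that explode produces for rows whose list is empty, which may or may not matter for the real file:

```python
import ast
import pandas as pd

# Hypothetical sample data: each cell holds the string repr of a list.
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genres": ["['drama', 'crime']", "['comedy', 'drama']", "[]"],
})

genres = (
    df["genres"]
    .apply(ast.literal_eval)  # "['drama', 'crime']" -> ['drama', 'crime']
    .explode()                # one row per genre; an empty list becomes NaN
    .dropna()                 # discard the NaN left by empty lists
    .unique()
    .tolist()
)
unique_genres = sorted(genres)
print(unique_genres)  # ['comedy', 'crime', 'drama']
```

Sorting is optional; it only makes the result easier to read and compare.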

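Alternative, using only string methods (strip the brackets, split on commas, then strip the quotes left around each genre):

L = df['genres'].str.strip('[]').str.split(r',\s*').explode().str.strip("'").unique().tolist()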

CodePudding user response：

pandas may be overkill if you're not going to do more with this information:

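from ast import literal_eval
from itertools import chain
import csv

with open('genres.csv', newline='') as f:
    reader = csv.reader(f)
    header = next(reader)  # skip the header row
    # Assumes the genres are in the first column of each row.
    values = [literal_eval(x[0]) for x in reader]

# Flatten the per-movie lists and keep each genre once.
unique = set(chain.from_iterable(values))
print(unique)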

Output:

{'action',
 'animation',
 'comedy',
 'crime',
 'documentation',
 'drama',
 'european',
 'family',
 'fantasy',
 'history',
 'horror',
 'music',
 'reality',
 'romance',
 'scifi',
 'sport',
 'thriller',
 'war',
 'western'}
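The script above reads the first column of each row. If the genres live in a different column, a possible variant is to look the column up by name with csv.DictReader. The sketch below assumes the column is called genres and uses a small in-memory sample (hypothetical data) in place of the real file:

```python
import csv
import io
from ast import literal_eval

# Hypothetical sample standing in for the real file; column names are assumptions.
sample_csv = '''title,genres
A,"['drama', 'crime']"
B,"['comedy', 'drama']"
C,[]
'''

unique = set()
for row in csv.DictReader(io.StringIO(sample_csv)):
    # Parse the list literal in the named column and merge it into the set.
    unique.update(literal_eval(row["genres"]))

print(sorted(unique))  # ['comedy', 'crime', 'drama']
```

For a real file, replace io.StringIO(sample_csv) with the file object returned by open(..., newline='').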