I have a dataframe which has 10k movie names and 40k actor names.
The reason is I'm trying to make a graph from nx but the graphic becomes unreadable because of the names of the actor. So I want to change their names to numbers. Some of these actors played on multiple movies which means they are exists more than once. I want to change all these actors to numbers like 'Leslie Howard' = '1' and so on. I tried some loops and lists but I failed. I want to make a dictionary to be able to check which number was which actor. Can you help me?
CodePudding user response:
You could get all unique names of the column, generate a dictionary and then use map
to change the values to the numbers. At the same time you have the dictionary to check to which actor the number refers.
all_names = df['Actor_Name'].unique()
dic = dict((v,k) for k,v in enumerate(all_names))
df['Actor_Name'] = df['Actor_Name'].map(dic)
CodePudding user response:
You can just do factorize
df['Movie_name'] = df['Movie_name'].factorize()[0]
df['Actor_name'] = df['Actor_name'].factorize()[0]
CodePudding user response:
Convert the column into type category and get their unique values with .cat.codes
:
df['Actor_Name'] = df['Actor_Name'].astype('category').cat.codes