I have an Excel file with a column like this:
Each level is assigned a numerical value: beginner = 1, intermediate = 4, advanced = 10, genius = 20, insane = 50.
Is there a way to associate each level's numerical value with its categorical value in a pandas DataFrame without changing the column?
I know I could just add another column, but I was curious whether there is a way to set up this kind of association so that the DataFrame displays the level names ("Beginner", "Intermediate", ...) while still letting me use the numerical value for data analysis, for example by reading a cell and getting its number back.
User response:
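You can replace the values with the `replace` method. Note that this overwrites the `Level` column:
```python
import pandas as pd

df = pd.read_csv('my.csv')  # for the Excel file in the question, use pd.read_excel instead
# replace the level names with their numbers (this overwrites the Level column)
df['Level'] = df['Level'].replace(['beginner', 'intermediate', 'advanced', 'genius', 'insane'],
                                  [1, 4, 10, 20, 50])
```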
Since you don't want to change the DataFrame, you can map the values instead. `map` returns a new Series holding the numbers, so you can run your analysis on it while `df` keeps the level names.
```python
# mapping values
dict_map = {"beginner": 1,
            "intermediate": 4,
            "advanced": 10,
            "genius": 20,
            "insane": 50}
# copy_df is a new numeric Series; df['Level'] still holds the names
copy_df = df['Level'].map(dict_map)
```
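A self-contained sketch of the `map` approach on a small hypothetical sample (the `Name` column and the rows are made up for illustration). It shows that the mapped Series supports ordinary arithmetic, and that `DataFrame.assign` returns a temporary copy with a numeric column while `df` itself is left unchanged.

```python
import pandas as pd

# Hypothetical sample data; in practice df comes from the file
df = pd.DataFrame({'Name': ['Ann', 'Bo', 'Cy', 'Di'],
                   'Level': ['beginner', 'advanced', 'insane', 'advanced']})

dict_map = {"beginner": 1, "intermediate": 4, "advanced": 10,
            "genius": 20, "insane": 50}

# map returns a new numeric Series; df['Level'] still holds the names
level_values = df['Level'].map(dict_map)
print(level_values.sum())    # 71
print(level_values.mean())   # 17.75

# assign returns a new DataFrame with an extra column; df itself is unchanged
scored = df.assign(LevelValue=level_values)
print(scored['LevelValue'].max())   # 50
print(list(df.columns))             # ['Name', 'Level']
```

Using `assign` keeps the numbers around only for as long as you need them, which matches the goal of not adding a permanent column.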
User response:
Create a dictionary and use the values in your analysis:
```python
import numpy as np
import pandas as pd

num_dict = {'beginner': 1, 'intermediate': 4, 'advanced': 10, 'genius': 20,
            'insane': 50}
# sample data: one row per level, plus two dummy columns
test_data = np.vstack([[key, 1, 3] for key in num_dict.keys()])
test_data
```
```
array([['beginner', '1', '3'],
       ['intermediate', '1', '3'],
       ['advanced', '1', '3'],
       ['genius', '1', '3'],
       ['insane', '1', '3']], dtype='<U21')
```
```python
df = pd.DataFrame(test_data, columns=['wanted', 'e', 'i'], index=range(len(num_dict.keys())))
df
```
|   | wanted | e | i |
|---|---|---|---|
| 0 | beginner | 1 | 3 |
| 1 | intermediate | 1 | 3 |
| 2 | advanced | 1 | 3 |
| 3 | genius | 1 | 3 |
| 4 | insane | 1 | 3 |
```python
num_dict[df['wanted'][0]]
```
```
1
```
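The same single-cell lookup can be written with `.at`, which reads one cell by row label and column name instead of chaining two `[]` lookups. A minimal sketch, assuming a simplified `df` that has only the `wanted` column:

```python
import pandas as pd

num_dict = {'beginner': 1, 'intermediate': 4, 'advanced': 10, 'genius': 20,
            'insane': 50}
# Simplified stand-in for the df above: one row per level
df = pd.DataFrame({'wanted': list(num_dict)})

# .at fetches a single cell by row label and column name
cell_name = df.at[2, 'wanted']
# the dictionary then turns the displayed name into its number
cell_value = num_dict[cell_name]
print(cell_name, cell_value)  # advanced 10
```

If a cell might hold a name that is not in the dictionary, `num_dict.get(cell_name)` returns `None` instead of raising `KeyError`.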
Or use `map()` to apply the dictionary along the whole column:
```python
df['wanted'].map(num_dict)
```
|   | wanted |
|---|---|
| 0 | 1 |
| 1 | 4 |
| 2 | 10 |
| 3 | 20 |
| 4 | 50 |
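One caveat with the `map` approach in both answers: dictionary keys are case-sensitive, and any value not found in the dictionary becomes `NaN`. The question shows capitalized names ("Beginner", "Intermediate") while the dictionaries use lowercase keys, so normalizing the text first may be needed. A minimal sketch on a hypothetical Series:

```python
import pandas as pd

num_dict = {'beginner': 1, 'intermediate': 4, 'advanced': 10, 'genius': 20,
            'insane': 50}

# Hypothetical column whose names are capitalized, as in the question
levels = pd.Series(['Beginner', 'Intermediate', 'Insane'])

# Keys are case-sensitive, so capitalized names find no match and become NaN
raw = levels.map(num_dict)

# Stripping stray spaces and lower-casing first makes the lookup succeed
values = levels.str.strip().str.lower().map(num_dict)
print(values.tolist())  # [1, 4, 50]
```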