Python - Associate Numerical value to Categorical value

Time:03-30

I have an Excel file with a column of levels (image not shown).

Each level is assigned a numerical value: beginner = 1, intermediate = 4, advanced = 10, genius = 20, insane = 50.

Is there a way to associate each level's numerical value with its categorical value in a pandas DataFrame without changing the data?

I know that I can just add another column, but I was curious whether there is a way to make this kind of association, so that the DataFrame displays the level names ("Beginner", "Intermediate", ...) but, when I want to use the numerical value for data analysis, I can access the cell and get its numerical value.

CodePudding user response:

You can replace the values with the replace method.

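import pandas as pd

df = pd.read_csv('my.csv')  # or pd.read_excel(...) for an Excel file

# replacing values: assign the result back, because inplace=True on a
# selected column may not update df in recent pandas versions
df['Level'] = df['Level'].replace(
    ['beginner', 'intermediate', 'advanced', 'genius', 'insane'],
    [1, 4, 10, 20, 50])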

But since you don't want to change the DataFrame, you can instead map the values into a separate Series and perform your analysis on that.


# mapping values (map returns a new Series; df itself is left unchanged)
dict_map = {"beginner":1,
            "intermediate":4,
            "advanced":10,
            "genius":20,
            "insane":50}

copy_df = df['Level'].map(dict_map)
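
As a minimal sketch of how the mapped values can be used for analysis while the original column keeps its labels, the example below builds a small DataFrame in place of the file. The sample rows and the names level_values and level_scores are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data standing in for the column read from the file
df = pd.DataFrame({"Level": ["beginner", "advanced", "insane", "beginner"]})

level_values = {"beginner": 1, "intermediate": 4, "advanced": 10,
                "genius": 20, "insane": 50}

# map() returns a new Series of numbers; df["Level"] still holds the labels
level_scores = df["Level"].map(level_values)

# Run the analysis on the numeric Series
total = level_scores.sum()
average = level_scores.mean()

print(df["Level"].tolist())   # labels are untouched
print(level_scores.tolist())  # numeric values for analysis
```

Keeping the numbers in a separate Series means the DataFrame still displays the level names, and the lookup can be repeated whenever the numeric values are needed.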

CodePudding user response:

Create a dictionary and use the values in your analysis:

import numpy as np
import pandas as pd

num_dict = {'beginner': 1, 'intermediate': 4, 'advanced': 10, 'genius': 20,
            'insane': 50}
test_data = np.vstack([[key, 1, 3] for key in num_dict.keys()])
test_data
# array([['beginner', '1', '3'],
#        ['intermediate', '1', '3'],
#        ['advanced', '1', '3'],
#        ['genius', '1', '3'],
#        ['insane', '1', '3']], dtype='<U21')

df = pd.DataFrame(test_data, columns=['wanted', 'e', 'i'], index=range(len(num_dict.keys())))
df
#          wanted  e  i
# 0      beginner  1  3
# 1  intermediate  1  3
# 2      advanced  1  3
# 3        genius  1  3
# 4        insane  1  3

# look up the numerical value for a single cell
num_dict[df['wanted'][0]]
# 1

Or use map() to apply the lookup along the whole column:

df['wanted'].map(num_dict)
# 0     1
# 1     4
# 2    10
# 3    20
# 4    50
# Name: wanted, dtype: int64
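
A related option, offered only as a sketch and not something either answer proposes, is to store the column as an ordered pandas Categorical. The labels are still what the DataFrame displays, comparisons follow the level order, and the numbers are looked up through the same dictionary when needed. The sample rows and the name level_values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column name "Level" follows the question
df = pd.DataFrame({"Level": ["beginner", "genius", "advanced", "intermediate"]})

level_values = {"beginner": 1, "intermediate": 4, "advanced": 10,
                "genius": 20, "insane": 50}

# Ordered categorical: labels are kept for display, and the category
# order follows the order of the dictionary keys
df["Level"] = pd.Categorical(df["Level"], categories=list(level_values),
                             ordered=True)

# Comparisons use the category order, not alphabetical order
at_least_advanced = df[df["Level"] >= "advanced"]

# Numeric values on demand: convert the labels to plain strings, then map
scores = df["Level"].astype(str).map(level_values)

print(at_least_advanced)
print(scores.tolist())
```

The categorical dtype only records the order of the levels, not the values 1, 4, 10, 20 and 50, so the dictionary lookup is still needed for arithmetic.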