pandas replace command unable to change categorical data to numerical data

Time:10-28

I am working on a toy dataset with 3 columns and 9 rows. Every column holds categorical values, and I am trying to replace those values with numbers.

I am using pandas for the operation.


Instance_data
    Q1                  Q3                      Q25
2   '14 years old'  'Ungraded or other grade'   No
3   '13 years old'  'Ungraded or other grade'   No
4   '14 years old'  'Ungraded or other grade'   No
5   '15 years old'  'Ungraded or other grade'   No
6   '15 years old'  'Ungraded or other grade'   No
7   '14 years old'  'Ungraded or other grade'   No
8   '14 years old'  'Ungraded or other grade'   No
9   '14 years old'  'Ungraded or other grade'   No
10  '15 years old'  'Ungraded or other grade'   No

Instance_data['Q1'].replace({
                       '13 years old': 1,
                       '14 years old': 2,
                       '15 years old'   : 3,
                       }, inplace=True)

The name of the dataset is Instance_data.

The output of the above code is:

     Q1                 Q3                      Q25
2   '14 years old'  'Ungraded or other grade'   No
3   '13 years old'  'Ungraded or other grade'   No
4   '14 years old'  'Ungraded or other grade'   No
5   '15 years old'  'Ungraded or other grade'   No
6   '15 years old'  'Ungraded or other grade'   No
7   '14 years old'  'Ungraded or other grade'   No
8   '14 years old'  'Ungraded or other grade'   No
9   '14 years old'  'Ungraded or other grade'   No
10  '15 years old'  'Ungraded or other grade'   No

Why were the values in Q1 not changed to 1, 2, 3?

CodePudding user response:

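The values in your column literally contain single quotes, so the dictionary keys have to contain them as well. Wrap each key in double quotes so that the single quotes are part of the string being matched: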

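# Assign the result back to the column; inplace=True on a selected column
# may not update the DataFrame in newer pandas versions.
Instance_data['Q1'] = Instance_data['Q1'].replace({
                       "'13 years old'": 1,
                       "'14 years old'": 2,
                       "'15 years old'": 3,
                       })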
print(Instance_data)

# Output:
    Q1                         Q3 Q25
2    2  'Ungraded or other grade'  No
3    1  'Ungraded or other grade'  No
4    2  'Ungraded or other grade'  No
5    3  'Ungraded or other grade'  No
6    3  'Ungraded or other grade'  No
7    2  'Ungraded or other grade'  No
8    2  'Ungraded or other grade'  No
9    2  'Ungraded or other grade'  No
10   3  'Ungraded or other grade'  No
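If you would rather not repeat the quotes in every key, one option is to strip them from the column first and then map the cleaned strings. This is only a sketch: the small DataFrame `df` below is a hypothetical reconstruction of a few rows of Q1, and `age_codes` is just the mapping from the question.

```python
import pandas as pd

# Hypothetical reconstruction of part of Q1, including the literal quotes.
df = pd.DataFrame(
    {"Q1": ["'14 years old'", "'13 years old'", "'15 years old'"]},
    index=[2, 3, 5],
)

# Listing the raw values makes the embedded quotes visible.
print(df["Q1"].tolist())

# Mapping taken from the question, with plain (unquoted) keys.
age_codes = {"13 years old": 1, "14 years old": 2, "15 years old": 3}

# Strip the surrounding single quotes, then look each value up.
# Unlike replace, map turns values missing from the mapping into NaN.
df["Q1"] = df["Q1"].str.strip("'").map(age_codes)
print(df)
```

Because map returns NaN for anything not in the dictionary, unexpected categories show up immediately instead of silently staying as strings.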

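Or you can use pd.factorize. The result is not the same: the codes start at 0 and are assigned in order of first appearance, so '14 years old' becomes 0 here: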

Instance_data['Q1'] = pd.factorize(Instance_data['Q1'])[0]
print(Instance_data)

# Output:
    Q1                         Q3 Q25
2    0  'Ungraded or other grade'  No
3    1  'Ungraded or other grade'  No
4    0  'Ungraded or other grade'  No
5    2  'Ungraded or other grade'  No
6    2  'Ungraded or other grade'  No
7    0  'Ungraded or other grade'  No
8    0  'Ungraded or other grade'  No
9    0  'Ungraded or other grade'  No
10   2  'Ungraded or other grade'  No
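If you want the factorize codes to follow the age order, passing sort=True and adding 1 should reproduce the 1, 2, 3 coding from the question. This is a sketch on a hypothetical Series `q1`. It relies on the labels sorting lexicographically in age order, which holds here only because every age has two digits.

```python
import pandas as pd

# Hypothetical sample of the Q1 values, including the literal quotes.
q1 = pd.Series(
    ["'14 years old'", "'13 years old'", "'15 years old'", "'14 years old'"]
)

# sort=True orders the unique labels before assigning codes 0, 1, 2, ...
codes, uniques = pd.factorize(q1, sort=True)

# Shift the codes so that they start at 1 instead of 0.
codes = codes + 1

print(list(uniques))
print(codes.tolist())
```

Here `uniques` keeps the label belonging to each code, so `uniques[code - 1]` gives back the original string.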