I am working on a toy dataset with 3 columns and 9 rows. Every column holds categorical values, and I am trying to replace those values with numbers.
I am using pandas for this.
Instance_data
Q1 Q3 Q25
2 '14 years old' 'Ungraded or other grade' No
3 '13 years old' 'Ungraded or other grade' No
4 '14 years old' 'Ungraded or other grade' No
5 '15 years old' 'Ungraded or other grade' No
6 '15 years old' 'Ungraded or other grade' No
7 '14 years old' 'Ungraded or other grade' No
8 '14 years old' 'Ungraded or other grade' No
9 '14 years old' 'Ungraded or other grade' No
10 '15 years old' 'Ungraded or other grade' No
Instance_data['Q1'].replace({
'13 years old': 1,
'14 years old': 2,
'15 years old' : 3,
}, inplace=True)
The DataFrame is named Instance_data.
After running the code above, the output is:
Q1 Q3 Q25
2 '14 years old' 'Ungraded or other grade' No
3 '13 years old' 'Ungraded or other grade' No
4 '14 years old' 'Ungraded or other grade' No
5 '15 years old' 'Ungraded or other grade' No
6 '15 years old' 'Ungraded or other grade' No
7 '14 years old' 'Ungraded or other grade' No
8 '14 years old' 'Ungraded or other grade' No
9 '14 years old' 'Ungraded or other grade' No
10 '15 years old' 'Ungraded or other grade' No
Why were the values in Q1 not replaced with 1, 2, 3?
CodePudding user response:
The values in Q1 contain literal single quotes, so the dictionary keys have to include them too. Wrap each key in double quotes:
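# Assign the result back: inplace=True on a selected column may not
# update the DataFrame in recent pandas versions (Copy-on-Write).
Instance_data['Q1'] = Instance_data['Q1'].replace({
    "'13 years old'": 1,
    "'14 years old'": 2,
    "'15 years old'": 3,
})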
print(Instance_data)
# Output:
Q1 Q3 Q25
2 2 'Ungraded or other grade' No
3 1 'Ungraded or other grade' No
4 2 'Ungraded or other grade' No
5 3 'Ungraded or other grade' No
6 3 'Ungraded or other grade' No
7 2 'Ungraded or other grade' No
8 2 'Ungraded or other grade' No
9 2 'Ungraded or other grade' No
10 3 'Ungraded or other grade' No
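If it is not obvious whether the quotes are part of the data, one way to check is to look at the raw values. The sketch below rebuilds a frame that is only assumed to match the question's data (the original loading code is not shown, so the construction and the name age_codes are hypothetical), then strips the quotes before mapping:

```python
import pandas as pd

# Hypothetical reconstruction of the question's data: the single quotes
# are assumed to be part of the string values themselves.
Instance_data = pd.DataFrame(
    {
        "Q1": ["'14 years old'", "'13 years old'", "'14 years old'",
               "'15 years old'", "'15 years old'", "'14 years old'",
               "'14 years old'", "'14 years old'", "'15 years old'"],
        "Q3": ["'Ungraded or other grade'"] * 9,
        "Q25": ["No"] * 9,
    },
    index=range(2, 11),
)

# Printing a list shows each value's repr, so embedded quotes become visible.
print(Instance_data["Q1"].unique().tolist())

# Strip the surrounding quotes, then map the cleaned strings to numbers.
age_codes = {"13 years old": 1, "14 years old": 2, "15 years old": 3}
cleaned = Instance_data["Q1"].str.strip("'")
Instance_data["Q1"] = cleaned.map(age_codes)
print(Instance_data["Q1"].tolist())
```

Stripping the quotes first keeps the mapping keys readable. Any value missing from the dictionary becomes NaN with map, whereas replace leaves unmatched values unchanged.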
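Alternatively, you can use pd.factorize. Note that the result is not the same: the codes start at 0 and are assigned in order of first appearance, not by age: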
Instance_data['Q1'] = pd.factorize(Instance_data['Q1'])[0]
print(Instance_data)
# Output:
Q1 Q3 Q25
2 0 'Ungraded or other grade' No
3 1 'Ungraded or other grade' No
4 0 'Ungraded or other grade' No
5 2 'Ungraded or other grade' No
6 2 'Ungraded or other grade' No
7 0 'Ungraded or other grade' No
8 0 'Ungraded or other grade' No
9 0 'Ungraded or other grade' No
10 2 'Ungraded or other grade' No
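If the codes should follow the order of the values instead of their first appearance, pd.factorize accepts sort=True. A minimal sketch on a small hypothetical sample of Q1 (the series q1 is made up for illustration):

```python
import pandas as pd

# Hypothetical sample of the Q1 column, quotes included as in the question.
q1 = pd.Series(["'14 years old'", "'13 years old'",
                "'15 years old'", "'14 years old'"])

# sort=True sorts the unique values before assigning codes, so the codes
# follow the lexicographic order of the strings, not first appearance.
codes, uniques = pd.factorize(q1, sort=True)

# Shift by one so the codes start at 1 instead of 0.
codes = codes + 1
print(codes.tolist())
print(list(uniques))
```

The sort is lexicographic, which happens to match age order here because every age has two digits. With mixed widths (for example '9 years old'), an explicit mapping would be the safer choice.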