I am currently working on a pandas DataFrame named df. One column contains many distinct labels (more than 100, to be exact).
I know how to replace values when there are only a few of them.
For instance, in the typical Titanic example:
titanic['Sex'] = titanic['Sex'].replace({'male': 0, 'female': 1})
Of course, doing so for 100 values would be extremely time-consuming. I have seen similar questions, but all of the answers involve typing out the mapping by hand. Is there a faster way to do this?
CodePudding user response:
I think you're looking for factorize:
import pandas as pd
df = pd.DataFrame({'col': list('ABCDEBJZACA')})
# factorize() returns (codes, uniques); [0] keeps only the integer codes
df['factor'] = df['col'].factorize()[0]
output:
   col  factor
0    A       0
1    B       1
2    C       2
3    D       3
4    E       4
5    B       1
6    J       5
7    Z       6
8    A       0
9    C       2
10   A       0
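
If you also need the label-to-number mapping itself (for example, to apply the same encoding to another frame or to decode the numbers later), the second value returned by factorize can be used for that. The following is a minimal sketch on the same toy data; the names codes, uniques, mapping and decoded are only illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col': list('ABCDEBJZACA')})

# pd.factorize returns a pair: the integer codes and the unique labels,
# where uniques[i] is the label that was encoded as i.
codes, uniques = pd.factorize(df['col'])
df['factor'] = codes

# Build an explicit label -> code dictionary without typing it by hand.
mapping = {label: code for code, label in enumerate(uniques)}

# The same dictionary can be applied with map(), e.g. on another column.
df['factor_via_map'] = df['col'].map(mapping)

# Decode the integers back into the original labels.
decoded = uniques[codes]
```

By default the codes follow the order in which labels first appear; passing sort=True to factorize assigns them in sorted label order instead. Missing values are encoded as -1 by default.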