Replacing multiple string values in a column with numbers in pandas

Time:11-18

I am currently working on a data frame in pandas named df. One column contains multiple labels (more than 100, to be exact).

I know how to replace values when there are only a few of them.

For instance, in the typical Titanic example:

titanic.Sex.replace({'male': 0,'female': 1}, inplace=True)

Of course, doing so for 100 values would be extremely time-consuming. I have seen similar questions, but all the answers involve typing out the mapping by hand. Is there a faster way to do this?

CodePudding user response:

I think you're looking for factorize:

import pandas as pd
df = pd.DataFrame({'col': list('ABCDEBJZACA')})
# factorize() returns (codes, uniques); [0] keeps only the integer codes
df['factor'] = df['col'].factorize()[0]

output:

   col  factor
0    A       0
1    B       1
2    C       2
3    D       3
4    E       4
5    B       1
6    J       5
7    Z       6
8    A       0
9    C       2
10   A       0
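
If you also need to know which number stands for which label, factorize hands that back too. A minimal sketch, reusing the same example column; the names mapping and restored are just illustrative, not anything from the question:

```python
import pandas as pd

df = pd.DataFrame({'col': list('ABCDEBJZACA')})

# pd.factorize returns a pair: the integer codes and the unique labels.
# Codes are assigned in order of first appearance.
codes, uniques = pd.factorize(df['col'])
df['factor'] = codes

# Label -> number lookup built from the data, so nothing is typed by hand.
# (mapping is a hypothetical helper name.)
mapping = {label: code for code, label in enumerate(uniques)}

# Going the other way: turn the codes back into the original labels.
restored = uniques.take(codes)
```

A dict like mapping can be passed to replace or map in the same way as the hand-written one in the Titanic example. By default, missing values are coded as -1 rather than being given their own number, and passing sort=True to pd.factorize numbers the labels in sorted order instead of order of appearance.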