I asked this question for R, but now I am trying to do the same in Python.
I have a dataframe with 10000 rows.
Author Value
aaa 111
aaa 112
bbb 156
bbb 165
ccc 543
ccc 256
Each author has 4 rows, so I have 2500 authors.
I would like to replace all the author strings with numeric values. Ideally with tidyverse.
Expected output
Author Value
1 111
1 112
2 156
2 165
3 543
3 256
---------
2500 451
2500 234
Thanks!
CodePudding user response:
Use pd.factorize():
df['Author'] = pd.factorize(df['Author'])[0] + 1
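As a quick check, here is a minimal sketch of that approach on the six sample rows from the question. The variable names `codes` and `uniques` are just illustrative; `pd.factorize` returns the integer codes (starting at 0, in order of first appearance) together with the unique values, so adding 1 gives ids that start at 1.

```python
import pandas as pd

# Sample data copied from the question (first six rows only)
df = pd.DataFrame({
    "Author": ["aaa", "aaa", "bbb", "bbb", "ccc", "ccc"],
    "Value": [111, 112, 156, 165, 543, 256],
})

# factorize returns (codes, uniques); codes are 0-based,
# numbered in order of first appearance
codes, uniques = pd.factorize(df["Author"])

# Shift to 1-based ids to match the expected output
df["Author"] = codes + 1

print(df)
```

Keeping `uniques` around lets you map an id back to the original author name later, e.g. `uniques[0]` for id 1.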
CodePudding user response:
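Another way: compare each value with the one in the previous row and take the cumulative sum of the resulting boolean values (this assumes each author's rows are adjacent):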
df['Author'] = (df['Author']!=df['Author'].shift()).cumsum()
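One thing to keep in mind is that the shift/cumsum approach numbers runs of consecutive equal values, not distinct authors. The sketch below uses a small made-up series (not the question's data) in which the same author appears in two separate places, and compares the result with pd.factorize().

```python
import pandas as pd

# Hypothetical example: "aaa" appears twice, but not in adjacent rows
authors = pd.Series(["aaa", "bbb", "aaa"])

# Each change from the previous row starts a new id, so the second
# "aaa" gets a fresh number instead of reusing the first one
run_ids = (authors != authors.shift()).cumsum()

# factorize assigns one id per distinct value, regardless of position
factor_ids = pd.Series(pd.factorize(authors)[0] + 1)

print(run_ids.tolist())
print(factor_ids.tolist())
```

If the dataframe is grouped by author, as in the question, both approaches give the same ids; otherwise sorting by Author first or using pd.factorize() is probably the safer choice.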