Substituting multiple repetitive strings in pandas dataframe with consecutive respective numeric values

Time:02-28

I asked this question for R, but now I am trying to do the same in Python.

I have a dataframe with 10000 rows.

Author  Value
aaa     111
aaa     112
bbb     156
bbb     165
ccc     543
ccc     256

Each author has 4 rows, so I have 2500 authors.

I would like to replace each author string with a consecutive numeric ID (1, 2, 3, ...), ideally with pandas.

Expected output

Author  Value
1       111
1       112
2       156
2       165
3       543
3       256
...
2500    451
2500    234

Thanks!

Answer:

Use pd.factorize(), which encodes each distinct value as an integer in order of first appearance, starting at 0:

import pandas as pd
df['Author'] = pd.factorize(df['Author'])[0] + 1  # codes start at 0, so add 1 for 1-based IDs
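A minimal runnable sketch of the factorize approach, assuming a small frame built from the first six rows shown in the question (the real data has 10000 rows). pd.factorize returns a tuple of (codes, uniques); only the codes are needed here.

```python
import pandas as pd

# Sample data: the first six rows from the question.
df = pd.DataFrame({
    "Author": ["aaa", "aaa", "bbb", "bbb", "ccc", "ccc"],
    "Value": [111, 112, 156, 165, 543, 256],
})

# codes: 0-based integer per row, assigned in order of first appearance.
# uniques: the distinct author strings, in that same order.
codes, uniques = pd.factorize(df["Author"])
df["Author"] = codes + 1  # shift to 1-based IDs as in the expected output

print(df["Author"].tolist())  # [1, 1, 2, 2, 3, 3]
print(list(uniques))          # ['aaa', 'bbb', 'ccc']
```

Keeping `uniques` around is handy if you later need to map the numeric IDs back to the original author names.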

Answer:

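Another way: flag every row whose Author differs from the row above it, then take the cumulative sum of those boolean flags. This assumes all rows for the same author are contiguous.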

df['Author'] = (df['Author']!=df['Author'].shift()).cumsum()
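A minimal sketch of the shift/cumsum approach on the same six sample rows, plus a small hypothetical series `s` that shows where it differs from pd.factorize: an author that reappears in a later, non-adjacent block gets a new ID here, while factorize reuses the first one.

```python
import pandas as pd

df = pd.DataFrame({
    "Author": ["aaa", "aaa", "bbb", "bbb", "ccc", "ccc"],
    "Value": [111, 112, 156, 165, 543, 256],
})

# True wherever Author differs from the previous row. The first row is
# compared with NaN (introduced by shift), so it is True as well.
changed = df["Author"] != df["Author"].shift()

# Summing the booleans cumulatively bumps the counter at each new run.
df["Author"] = changed.cumsum()
print(df["Author"].tolist())  # [1, 1, 2, 2, 3, 3]

# Hypothetical non-contiguous data: "aaa" appears again after "bbb".
s = pd.Series(["aaa", "bbb", "aaa"])
run_ids = (s != s.shift()).cumsum()      # new ID for every run
factor_ids = pd.factorize(s)[0] + 1      # same ID for the same string
print(run_ids.tolist())     # [1, 2, 3]
print(factor_ids.tolist())  # [1, 2, 1]
```

If the data is already grouped by author, as in the question, both answers give the same result; otherwise pd.factorize is the safer choice.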