Substituting multiple repetitive strings in R dataframe with consecutive respective numeric values

Time:02-28

I have a dataframe with 10000 rows.

Author  Value
aaa     111
aaa     112
bbb     156
bbb     165
ccc     543
ccc     256

Each author has 4 rows, so there are 2500 authors (the sample above shows only 2 rows per author).

I would like to replace each author string with a consecutive integer, one per distinct author, ideally with tidyverse.

Expected output

Author  Value
1       111
1       112
2       156
2       165
3       543
3       256
---------
2500    451
2500    234

Thanks!

Answer:

Use match and unique:

match(dat$Author, unique(dat$Author))
# [1] 1 1 2 2 3 3

Reassign that back to the original column or a new one, your call.
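
To make the mechanics explicit: unique() keeps the authors in order of first appearance, and match() returns each row's position within that vector, which is why the IDs come out consecutive. A minimal sketch on the sample data, writing to a new column (the name AuthorID is my own choice, not from the question):

```r
# Sample data from the question
dat <- data.frame(
  Author = c("aaa", "aaa", "bbb", "bbb", "ccc", "ccc"),
  Value  = c(111L, 112L, 156L, 165L, 543L, 256L),
  stringsAsFactors = FALSE
)

# unique() keeps the authors in order of first appearance
authors <- unique(dat$Author)

# match() gives, for each row, the position of its author in that vector
dat$AuthorID <- match(dat$Author, authors)  # AuthorID is a hypothetical column name

print(dat)
```

Writing to a new column keeps the original strings around in case the mapping needs to be checked or reversed later.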

If you want to put this in a dplyr pipe, then just

library(dplyr)

dat %>%
  mutate(Author = match(Author, unique(Author)))

(as akrun posted in their comment at the same time I was finishing this answer :-).
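
As a possible alternative, not part of the answer above, converting to a factor with the levels fixed in order of appearance should give the same IDs. Without the levels argument, factor() sorts the levels, so the numbering follows alphabetical order rather than order of appearance. The sketch below uses a made-up, deliberately unsorted vector to show the difference:

```r
# Made-up input, deliberately not in alphabetical order
authors <- c("bbb", "bbb", "aaa", "aaa", "ccc", "ccc")

# IDs in order of first appearance (the approach from the answer)
id_match <- match(authors, unique(authors))

# Same result via a factor whose levels follow order of appearance
id_factor <- as.integer(factor(authors, levels = unique(authors)))

# Default factor levels are sorted, so the numbering changes
id_sorted <- as.integer(factor(authors))

print(id_match)   # 1 1 2 2 3 3
print(id_factor)  # 1 1 2 2 3 3
print(id_sorted)  # 2 2 1 1 3 3
```

If the rows are already sorted by author, as in the question's sample, all three give the same numbering.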


Data

dat <- structure(list(Author = c("aaa", "aaa", "bbb", "bbb", "ccc", "ccc"), Value = c(111L, 112L, 156L, 165L, 543L, 256L)), class = "data.frame", row.names = c(NA, -6L))