I would like to identify duplicates and then add a sequential number before the first character of each duplicated value. The script below identifies the duplicates, but it appends a numeric suffix rather than adding a prefix.
I have a dataset that looks like this:
col|
X123
X123
X456
X789
X890
X142
X142
X142
df$col <- ifelse(duplicated(df$col) | duplicated(df$col, fromLast = TRUE),
                 make.unique(df$col), df$col)
What my script ends up producing is this:
col|
X123
X123.1
X456
X789
X890
X142
X142.1
X142.2
What I would like it to produce is:
col|
1X123
2X123
X456
X789
X890
1X142
2X142
3X142
CodePudding user response:
1) Define a function which prepends sequence numbers and then use it with ave.
add_seq <- function(x) if (length(x) == 1) x else paste0(seq_along(x), x)
transform(DF, col = ave(col, col, FUN = add_seq))
giving:
col
1 1X123
2 2X123
3 X456
4 X789
5 X890
6 1X142
7 2X142
8 3X142
2) A variation that reuses the duplicated idea from the question is the following. It gives the same result. Note that the native pipe |> requires R 4.1 or later.
transform(DF, col = (duplicated(col) | duplicated(col, fromLast = TRUE)) |>
ifelse(ave(col, col, FUN = seq_along), "") |>
paste0(col))
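The logic behind both variants can be sketched step by step on a plain character vector: compute each value's group size and its position within the group, then add the prefix only where the group size exceeds one. This is a minimal base R sketch, not a replacement for the answers above; the names x, n, i and out are introduced here purely for illustration.

```r
# Illustrative input vector (same values as the question's column)
x <- c("X123", "X123", "X456", "X789", "X890", "X142", "X142", "X142")

# Group size for every element: how many times its value occurs
n <- ave(seq_along(x), x, FUN = length)

# Position of every element within its group of equal values
i <- ave(seq_along(x), x, FUN = seq_along)

# Prefix the position only when the value occurs more than once
out <- ifelse(n > 1, paste0(i, x), x)
print(out)
```

Keeping n and i as separate vectors makes it easy to inspect which rows are treated as duplicates before the prefix is applied.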
Note
Lines <- "col
X123
X123
X456
X789
X890
X142
X142
X142"
DF <- read.table(text = Lines, header = TRUE, strip.white = TRUE)
CodePudding user response:
This uses data.table. We first add two columns by reference: id, which holds the row number within each group, and N, which holds the total number of rows per group. We then use a vectorized if-else (data.table::fifelse) to paste the row number onto the column value when the group has more than one row. We do this by row. The final line drops the temporary id and N columns.
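library(data.table)
library(magrittr)  # provides %>%, which data.table does not export
# id = row number within each group, N = group size
setDT(df)[, `:=`(id=1:.N, N=.N), by=col] %>%
.[,col:=fifelse(N>1,paste0(id,col),col), by=1:nrow(df)] %>%
.[,`:=`(id=NULL, N=NULL)]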
col
<char>
1: 1X123
2: 2X123
3: X456
4: X789
5: X890
6: 1X142
7: 2X142
8: 3X142
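Because fifelse is vectorized, the per-row grouping is probably not required. The following is a hedged sketch of a variant that avoids it, assuming data.table's rowid() helper for the within-group index; the object name dt and the temporary column n are illustrative and not part of the answer above.

```r
library(data.table)

# Illustrative data, same values as in the question
dt <- data.table(col = c("X123", "X123", "X456", "X789",
                         "X890", "X142", "X142", "X142"))

# Temporary column: number of rows sharing each value of col
dt[, n := .N, by = col]

# rowid(col) gives the running index within each group of equal values;
# the prefix is added only where the group has more than one row
dt[, col := fifelse(n > 1L, paste0(rowid(col), col), col)]

# Drop the temporary column
dt[, n := NULL]
print(dt)
```

This needs only data.table, with no pipe, and it updates the column in a single vectorized step.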