Replace duplicates with values differing by n+1 in the last digit [R]

Time:09-07

I would like to check whether a certain column (spec) in my df contains duplicates and, if it does, replace their values so that the last digit of the n+1 occurrence of a duplicate is 1 greater than that of the n-th occurrence.

Here is a dummy example:

sens <- c(1.0000000, 0.9968220, 0.1302966,0.1197034, 0.0000000)
spec <- c(0.0000000, 0.9978812, 0.9978812,0.9978812, 1.0000000)
df <- data.frame(sens, spec)

And here is my desired output:

sens <- c(1.0000000, 0.9968220, 0.1302966,0.1197034, 0.0000000)
spec <- c(0.0000000, 0.9978812, 0.9978813,0.9978814, 1.0000000)
out <- data.frame(sens, spec)

Tried this, but it does not produce the output I want:

df2 <- within(df, spec <- ifelse(duplicated(spec), spec + 0.0000001, spec))

Would appreciate help. Sorry for my English, did my best to explain.

CodePudding user response:

We could use cumsum per group:

library(dplyr)

df |>
  group_by(spec) |>
  mutate(new_spec = spec + cumsum(duplicated(spec))*0.0000001) |>
  ungroup()

Output:

       sens      spec  new_spec
1 1.0000000 0.0000000 0.0000000
2 0.9968220 0.9978812 0.9978812
3 0.1302966 0.9978812 0.9978813
4 0.1197034 0.9978812 0.9978814
5 0.0000000 1.0000000 1.0000000
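
For comparison, here is a minimal base R sketch of the same idea, placed next to the attempt from the question. As far as I can tell, the `ifelse(duplicated(spec), ...)` version adds the same single step to every repeat, so rows 3 and 4 end up equal again, whereas `cumsum(duplicated(...))` computed per group counts 0, 1, 2, ... within each run of equal values. The names `step`, `attempt` and `k` are only for illustration and do not come from the question or the answer.

```r
sens <- c(1.0000000, 0.9968220, 0.1302966, 0.1197034, 0.0000000)
spec <- c(0.0000000, 0.9978812, 0.9978812, 0.9978812, 1.0000000)
df <- data.frame(sens, spec)

step <- 1e-7  # one unit in the 7th decimal place

# The attempt from the question: every repeat gets the same single increment,
# so repeated values are still equal to each other afterwards
attempt <- ifelse(duplicated(df$spec), df$spec + step, df$spec)

# Per-group counter: 0 for the first occurrence of a value, then 1, 2, ...
# ave() applies the function separately to each group of equal spec values
k <- ave(df$spec, df$spec, FUN = function(v) cumsum(duplicated(v)))

# Shift the k-th repeat by k steps
df$new_spec <- df$spec + k * step

print(df)
print(anyDuplicated(attempt) > 0)      # does the attempt still contain duplicates?
print(anyDuplicated(df$new_spec) > 0)  # does the per-group version?
```

`ave()` groups by the values of `spec` themselves, which plays the role of `group_by(spec)` in the dplyr pipeline above.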

CodePudding user response:

Using data.table

library(data.table)
setDT(df)[, spec2 := spec + ((seq_len(.N)-1) * 0.0000001), spec]

-output

> df
        sens      spec     spec2
       <num>     <num>     <num>
1: 1.0000000 0.0000000 0.0000000
2: 0.9968220 0.9978812 0.9978812
3: 0.1302966 0.9978812 0.9978813
4: 0.1197034 0.9978812 0.9978814
5: 0.0000000 1.0000000 1.0000000
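
Two things may be worth checking with either approach. First, `0.9978812 + 0.0000001` is computed in floating point, so the result is not guaranteed to be bit-identical to the literal `0.9978813`; rounding to 7 decimals or comparing with a tolerance is safer than `==`. Second, a shifted value could land on a value that was already in the column. The sketch below wraps the same `seq_len`-style counter in a helper; `bump_duplicates` and its arguments are hypothetical names, not part of data.table or base R.

```r
# Hypothetical helper: shift the k-th repeat of a value by k * step
bump_duplicates <- function(x, step = 1e-7, digits = 7) {
  # 0 for the first occurrence of a value, then 1, 2, ... for its repeats
  k <- ave(seq_along(x), x, FUN = seq_along) - 1L
  # round() strips floating-point noise beyond the chosen number of decimals
  out <- round(x + k * step, digits)
  # A shifted value can coincide with a value that was already present
  if (anyDuplicated(out) > 0) {
    warning("shifted values collide with existing values")
  }
  out
}

spec     <- c(0.0000000, 0.9978812, 0.9978812, 0.9978812, 1.0000000)
expected <- c(0.0000000, 0.9978812, 0.9978813, 0.9978814, 1.0000000)

new_spec <- bump_duplicates(spec)
# Tolerant comparison against the desired output from the question
print(isTRUE(all.equal(new_spec, expected)))

# Illustrative collision case: 0.9978813 is already present in the input
collided <- suppressWarnings(bump_duplicates(c(0.9978812, 0.9978812, 0.9978813)))
print(collided)
```

How to resolve a collision (for example, by repeating the shift until no duplicates remain) depends on the data, so the sketch only warns.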