I would like to check whether a certain column (spec) in my df contains duplicates and, if it does, replace their values so that the last digit of each subsequent occurrence (n + 1) of a duplicated value is greater by 1 than that of the previous occurrence (n).
Here is a dummy example:
sens <- c(1.0000000, 0.9968220, 0.1302966,0.1197034, 0.0000000)
spec <- c(0.0000000, 0.9978812, 0.9978812,0.9978812, 1.0000000)
df <- data.frame(sens, spec)
And here is my desired output:
sens <- c(1.0000000, 0.9968220, 0.1302966,0.1197034, 0.0000000)
spec <- c(0.0000000, 0.9978812, 0.9978813,0.9978814, 1.0000000)
out <- data.frame(sens, spec)
I tried this, but it does not produce the output I want:
df2 <- within(df, spec <- ifelse(duplicated(spec), spec + 0.0000001, spec))
Would appreciate help. Sorry for my English, did my best to explain.
CodePudding user response:
We could use cumsum() per group:
library(dplyr)
df |>
  group_by(spec) |>
  mutate(new_spec = spec + cumsum(duplicated(spec)) * 0.0000001) |>
  ungroup()
Output:
       sens      spec  new_spec
1 1.0000000 0.0000000 0.0000000
2 0.9968220 0.9978812 0.9978812
3 0.1302966 0.9978812 0.9978813
4 0.1197034 0.9978812 0.9978814
5 0.0000000 1.0000000 1.0000000
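To see why the grouping matters, here is a small base R sketch (not part of the pipeline above; the variable names dup, running and per_group are only illustrative). duplicated() alone gives a TRUE/FALSE flag, so every repeat would receive the same increment, and an ungrouped cumsum() keeps counting past the duplicated value. Restarting the count for each spec value gives the offsets 0, 1, 2 that are wanted:

```r
spec <- c(0.0000000, 0.9978812, 0.9978812, 0.9978812, 1.0000000)

# TRUE for every repeat of an earlier value: FALSE FALSE TRUE TRUE FALSE
dup <- duplicated(spec)

# Running count over the whole vector: 0 0 1 2 2
# (the last row would wrongly be shifted by 2)
running <- cumsum(dup)

# Running count restarted for each distinct spec value: 0 0 1 2 0
per_group <- ave(as.numeric(dup), spec, FUN = cumsum)

print(dup)
print(running)
print(per_group)
```

Multiplying per_group by 0.0000001 and adding it to spec gives the same new_spec column as the dplyr version.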
CodePudding user response:
Using data.table
library(data.table)
setDT(df)[, spec2 := spec + ((seq_len(.N) - 1) * 0.0000001), spec]
Output:
> df
        sens      spec     spec2
       <num>     <num>     <num>
1: 1.0000000 0.0000000 0.0000000
2: 0.9968220 0.9978812 0.9978812
3: 0.1302966 0.9978812 0.9978813
4: 0.1197034 0.9978812 0.9978814
5: 0.0000000 1.0000000 1.0000000
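If you would rather avoid extra packages, the same idea can be sketched in base R with ave(). This is only an illustration under the same assumptions as the answers above (a fixed step of 0.0000001, exact equality between the duplicated values); the column name spec3 and the helper vector occurrence are made up for this example:

```r
# Rebuild the example data from the question.
sens <- c(1.0000000, 0.9968220, 0.1302966, 0.1197034, 0.0000000)
spec <- c(0.0000000, 0.9978812, 0.9978812, 0.9978812, 1.0000000)
df <- data.frame(sens, spec)

# Number the rows 0, 1, 2, ... within each group of equal spec values.
# ave() applies FUN per group and returns the result in the original row order.
occurrence <- ave(df$spec, df$spec, FUN = function(x) seq_along(x) - 1)

# Shift each repeat by its occurrence index in the 7th decimal place.
df$spec3 <- df$spec + occurrence * 1e-7

print(df, digits = 7)
```

Note that all three approaches rely on the duplicated values being exactly equal as floating-point numbers. If the values come from a computation and differ slightly in the last bits, rounding spec first (for example with round(spec, 7)) may be needed before grouping.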