Makes duplicate elements of a character vector unique, but not like make.unique()

Time:02-25

When multiple references have an identical author (or authors) and publication year, it is common practice to include a lowercase letter after the year. I am looking for an elegant function for that:

# what I have
have <- c("Dawkins (2008)",
          "Dawkins (2008)",
          "Stephenson (2008)")

# what I want
want <- c("Dawkins (2008a)",
          "Dawkins (2008b)",
          "Stephenson (2008)")

# this would do the job, but is not really what I want
make.unique(have)
#> [1] "Dawkins (2008)"    "Dawkins (2008).1"  "Stephenson (2008)"

Created on 2022-02-24 by the reprex package (v2.0.1)

edit: solution based on @akrun's answer below

library(dplyr)
library(stringr)

have <- c("Dawkins (2008)",
          "Dawkins (2008)",
          "Stephenson (2008)")

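# Append a, b, c, ... to the year of references that occur more than once
f <- function(x){
  # within each group of identical strings: letters if duplicated, "" otherwise
  v1 <- ave(x, x, FUN = function(g) if(length(g) > 1) letters[seq_along(g)] else "")
  # insert the letter before the closing parenthesis
  stringr::str_replace(x, "\\)", stringr::str_c(v1, ")"))
}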

data.frame(ha = have) %>% 
  mutate(want = f(ha))
#>                  ha              want
#> 1    Dawkins (2008)   Dawkins (2008a)
#> 2    Dawkins (2008)   Dawkins (2008b)
#> 3 Stephenson (2008) Stephenson (2008)


Answer:

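We can use ave() to give each element a letter according to its position within its group of duplicates (assuming no string occurs more than 26 times, since letters has only 26 entries), leaving strings that occur once with an empty suffix. Then str_replace() inserts that letter before the closing parenthesis.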

library(stringr)
v1 <- ave(have, have, FUN = function(x) 
  if(length(x) > 1) letters[seq_along(x)] else "")
str_replace(have, "\\)", str_c(v1, ")"))
#> [1] "Dawkins (2008a)"   "Dawkins (2008b)"   "Stephenson (2008)"
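
If you would rather avoid the stringr dependency, the same idea can be sketched in base R. This is only a sketch under a few assumptions: the function name add_year_suffix is made up for illustration, every reference is assumed to end in a closing parenthesis (anything else is left untouched), and it stops with an error instead of silently producing NA when a string occurs more than 26 times.

```r
# Hypothetical helper, not part of the original answer.
add_year_suffix <- function(x) {
  # position of each element within its group of identical strings
  idx <- ave(seq_along(x), x, FUN = seq_along)
  # size of the group each element belongs to
  n <- ave(seq_along(x), x, FUN = length)
  if (any(n > length(letters))) {
    stop("a reference occurs more than 26 times; single letters are not enough")
  }
  # only references that have duplicates get a letter
  suffix <- ifelse(n > 1, letters[idx], "")
  # assumption: the year is closed by a ")" at the very end of the string
  ends_in_paren <- grepl("\\)$", x)
  x[ends_in_paren] <- paste0(sub("\\)$", "", x[ends_in_paren]),
                             suffix[ends_in_paren], ")")
  x
}

have <- c("Dawkins (2008)",
          "Dawkins (2008)",
          "Stephenson (2008)")
add_year_suffix(have)
#> [1] "Dawkins (2008a)"   "Dawkins (2008b)"   "Stephenson (2008)"
```

Because the letters come from the position within each group rather than from adjacency, duplicates do not need to sit next to each other in the vector.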