I would like to format a string with authors' names.
Daenerys Targaryen
to TARGARYEN, D.
George R. R. Martin
to MARTIN, G. R. R.
Luís Inácio Lula da Silva
to SILVA, L. I. L. da
The pattern is LAST, 1st. 2nd. 3rd. ...
It would be awesome if it's possible to format multiple names in one string, like
Daenerys Targaryen, Luís Inácio Lula da Silva
to TARGARYEN, D.; SILVA, L. I. L. da
CodePudding user response:
Solution using gsub() with capture groups, plus the "\\U...\\E" case-conversion escapes (available with perl = TRUE) to capitalize last names.
library(magrittr)
x <- c("Daenerys Targaryen, George R. R. Martin, Luís Inácio Lula da Silva")
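x %>%
  strsplit(", ") %>%
  unlist() %>%
  # move the last word to the front and upper-case it
  gsub("(.*?) (\\w+$)", "\\U\\2\\E, \\1", ., perl = TRUE) %>%
  # reduce each capitalized given name to its initial
  gsub(" ([A-Z])\\w*\\.?", " \\1.", .) %>%
  paste(collapse = "; ")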
# [1] "TARGARYEN, D.; MARTIN, G. R. R.; SILVA, L. I. L. da"
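To see what each gsub() pass contributes, here is a minimal sketch that runs the two substitutions one at a time on a single name. It assumes the first pattern is "(.*?) (\\w+$)"; the variable names step1 and step2 are only for illustration.

```r
# One name, so the effect of each pass is easy to see.
x <- "George R. R. Martin"

# Pass 1: "(.*?)" lazily captures everything before the final word,
# "(\\w+$)" captures the final word, and "\\U...\\E" upper-cases it.
step1 <- gsub("(.*?) (\\w+$)", "\\U\\2\\E, \\1", x, perl = TRUE)
step1
# [1] "MARTIN, George R. R."

# Pass 2: every word that follows a space and starts with a capital
# letter is reduced to that letter plus a period. An existing period is
# absorbed by "\\.?", so "R." does not become "R..".
step2 <- gsub(" ([A-Z])\\w*\\.?", " \\1.", step1)
step2
# [1] "MARTIN, G. R. R."
```

Lower-case particles such as "da" are left alone because the second pattern only matches words that begin with [A-Z]. Under perl = TRUE, \\w may be limited to ASCII letters unless the pattern enables Unicode properties, so last names with accented letters are worth testing separately.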
CodePudding user response:
Here is a function, written in base R, that processes a character vector of names and returns the expected result.
fun <- function(x) {
  y <- strsplit(x, " ")
  sapply(y, \(s) {
    # drop empty tokens produced by repeated spaces
    if(any(nchar(s) == 0L))
      s <- s[nchar(s) > 0L]
    if(all(nchar(s))) {
      n <- length(s)
      out <- character(n)
      # the last name goes first, in upper case
      out[1L] <- toupper(s[n])
      if(n > 1L)
        out[1L] <- paste0(out[1L], ",")
      given <- s[-n]
      first <- substr(given, 1L, 1L)
      # capitalized given names become initials; particles like "da" are kept
      i <- first == toupper(first)
      out[-1L][i] <- paste0(first[i], ".")
      out[-1L][!i] <- given[!i]
      paste(out, collapse = " ")
    } else ""
  })
}
x <- c("Daenerys Targaryen",
       "George R. R. Martin",
       "Luís Inácio Lula da Silva")
fun(x)
#> [1] "TARGARYEN, D." "MARTIN, G. R. R." "SILVA, L. I. L. da"
Created on 2022-11-19 with reprex v2.0.2
Edit
To process one string that contains several names and return a single string, do it in two steps: format each name, then paste the results together.
y <- c("Daenerys Targaryen, Luís Inácio Lula da Silva")
ll <- lapply(strsplit(y, ", "), fun)
do.call(\(x) paste(x, collapse = "; "), ll)
#> [1] "TARGARYEN, D.; SILVA, L. I. L. da"
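The two-step call above is written for a single input string. As a hedged sketch of how the same idea could be extended to a character vector where each element holds several names, the code below defines two hypothetical helpers, format_one() and format_author_list(). Neither is part of the answer above, and the capitalization test (initial differs from its lower-case form) is an assumption of this sketch.

```r
# Hypothetical helper (not from the answer above): format a single name.
format_one <- function(name) {
  parts <- strsplit(trimws(name), " +")[[1]]
  n <- length(parts)
  if (n == 0L) return("")
  last <- toupper(parts[n])
  if (n == 1L) return(last)
  given <- parts[-n]
  initial <- substr(given, 1L, 1L)
  # assumption: a given name is abbreviated only if it starts with a capital
  is_cap <- initial != tolower(initial)
  given[is_cap] <- paste0(initial[is_cap], ".")
  paste0(last, ", ", paste(given, collapse = " "))
}

# Hypothetical wrapper: one output string per input string.
format_author_list <- function(x, sep_in = ", ", sep_out = "; ") {
  vapply(
    strsplit(x, sep_in, fixed = TRUE),
    function(authors) {
      paste(vapply(authors, format_one, character(1)), collapse = sep_out)
    },
    character(1)
  )
}

y <- c("Daenerys Targaryen, Luís Inácio Lula da Silva",
       "George R. R. Martin")
format_author_list(y)
#> [1] "TARGARYEN, D.; SILVA, L. I. L. da" "MARTIN, G. R. R."
```

Using vapply() with character(1) keeps the result a plain character vector of the same length as the input, which is easier to assign back into a data frame column than a list.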