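This follows on from a previous question (Format author's name with stringr). I would like to apply the same formatting to a whole column, where each row holds a different string of names.
The previous solution doesn't work here: it combines the names from every row into one string and repeats that string in each row.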
library(dplyr)
x <- data.frame(
names = c("Daenerys Targaryen, George R. R. Martin, Luís Inácio Lula da Silva",
"Hadley Alexander Wickham, Joseph J. Allaire",
"Stack Overflow"
)
)
format_names <- function(variable) {
  variable %>%
    strsplit(", ") %>%
    unlist() %>%
    gsub("(.*?) (\\w+$)", "\\U\\2\\E, \\1", ., perl = TRUE) %>%
    gsub(" ([A-Z])\\w*\\.?", " \\1.", .) %>%
    paste(collapse = "; ")
}
x %>%
mutate(new_names = format_names(names))
#> names
#> 1 Daenerys Targaryen, George R. R. Martin, Luís Inácio Lula da Silva
#> 2 Hadley Alexander Wickham, Joseph J. Allaire
#> 3 Stack Overflow
#> new_names
#> 1 TARGARYEN, D.; MARTIN, G. R. R.; SILVA, L. I. L. da; WICKHAM, H. A.; ALLAIRE, J. J.; OVERFLOW, S.
#> 2 TARGARYEN, D.; MARTIN, G. R. R.; SILVA, L. I. L. da; WICKHAM, H. A.; ALLAIRE, J. J.; OVERFLOW, S.
#> 3 TARGARYEN, D.; MARTIN, G. R. R.; SILVA, L. I. L. da; WICKHAM, H. A.; ALLAIRE, J. J.; OVERFLOW, S.
Created on 2022-11-21 with reprex v2.0.2
CodePudding user response:
You'll want to replace the unlist() call with something that preserves the per-row groups. sapply() can help here:
format_names <- function(variable) {
  variable %>%
    strsplit(", ") %>%
    sapply(. %>%
      gsub("(.*?) (\\w+$)", "\\U\\2\\E, \\1", ., perl = TRUE) %>%
      gsub(" ([A-Z])\\w*\\.?", " \\1.", .) %>%
      paste(collapse = "; "))
}
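To spell out why this works: strsplit() returns a list with one element per row, and unlist() flattens that list into a single vector, so paste(collapse = "; ") sees every name at once. Below is a minimal base-R sketch of the same idea using vapply(), which also checks that each row produces exactly one string. The helper name format_one() is hypothetical, and the as.character() call is an assumption in case the column is stored as a factor; the regexes are the ones from the question.

```r
# Hypothetical helper: formats the names belonging to ONE row.
format_one <- function(authors) {
  # Move the last word to the front and upper-case it: "A B" -> "B, A"
  authors <- gsub("(.*?) (\\w+$)", "\\U\\2\\E, \\1", authors, perl = TRUE)
  # Abbreviate each remaining capitalised word to its initial
  authors <- gsub(" ([A-Z])\\w*\\.?", " \\1.", authors)
  paste(authors, collapse = "; ")
}

format_names <- function(variable) {
  # strsplit() yields a list with one character vector per row;
  # keeping it as a list is what preserves the row boundaries.
  parts <- strsplit(as.character(variable), ", ", fixed = TRUE)
  # vapply() enforces that every row returns a single string.
  vapply(parts, format_one, character(1), USE.NAMES = FALSE)
}

x <- data.frame(
  names = c("Hadley Alexander Wickham, Joseph J. Allaire",
            "Stack Overflow")
)
x$new_names <- format_names(x$names)
x$new_names
# [1] "WICKHAM, H. A.; ALLAIRE, J. J." "OVERFLOW, S."
```

Because this version is vectorised over rows, it can be used directly inside mutate() without any grouping.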
CodePudding user response:
One workaround is to make sure you are working row by row. You can use either rowwise() from dplyr or group_by(names). rowwise() essentially groups by row, so the two give the same result as long as the values in names are unique.
Solution with rowwise() from dplyr:
library(dplyr)
x %>%
  rowwise() %>%
  mutate(new_names = format_names(names))
Output
# A tibble: 3 × 2
# Rowwise:
names new_names
<chr> <chr>
1 Daenerys Targaryen, George R. R. Martin, Luís Inácio Lula da Silva TARGARYEN, D.; MARTIN, G. …
2 Hadley Alexander Wickham, Joseph J. Allaire WICKHAM, H. A.; ALLAIRE, J…
3 Stack Overflow OVERFLOW, S.
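As a hedged illustration of how the two approaches can differ, the sketch below adds a duplicated row (an assumption, not part of the original data) and keeps the question's non-vectorised format_names(). With rowwise() the function is called once per row; with group_by(names) the duplicated rows land in the same group and are pasted together. The ungroup() call at the end is optional and only drops the rowwise/grouped state so later verbs behave normally.

```r
library(dplyr)

# Non-vectorised helper from the question, written without pipes:
# it collapses everything it receives into one string.
format_names <- function(variable) {
  parts <- unlist(strsplit(variable, ", "))
  parts <- gsub("(.*?) (\\w+$)", "\\U\\2\\E, \\1", parts, perl = TRUE)
  parts <- gsub(" ([A-Z])\\w*\\.?", " \\1.", parts)
  paste(parts, collapse = "; ")
}

# Example data with a duplicated value (assumed for illustration)
x <- data.frame(
  names = c("Hadley Alexander Wickham, Joseph J. Allaire",
            "Stack Overflow",
            "Stack Overflow"),
  stringsAsFactors = FALSE
)

# One call per row
by_row <- x %>%
  rowwise() %>%
  mutate(new_names = format_names(names)) %>%
  ungroup()

# One call per distinct value of names
by_group <- x %>%
  group_by(names) %>%
  mutate(new_names = format_names(names)) %>%
  ungroup()

by_row$new_names
# [1] "WICKHAM, H. A.; ALLAIRE, J. J." "OVERFLOW, S."  "OVERFLOW, S."
by_group$new_names[2]
# [1] "OVERFLOW, S.; OVERFLOW, S."
```

If the column can contain repeated values, rowwise() is the safer of the two.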