I have the following sample df
df <- data.frame(old=c("412oldname1","32oldname132","oldname141","132oldname2","oldname3","oldname4","oldname5"))
old
1 412oldname1
2 32oldname132
3 oldname141
4 132oldname2
5 oldname3
6 oldname4
7 oldname5
I would like to extract rows using grepl and replace.
keypairs <- data.frame(old=c("oldname1","oldname2"),new=c("newname1","newname2"))
old new
1 oldname1 newname1
2 oldname2 newname2
The desired dataframe would be:
old
1 newname1
2 newname1
3 newname1
4 newname2
Ideally, I could replace the output of grepl with the patters, then use left_join
and select the new
column.
So far, I have:
df %>% filter(grepl(keypairs$old,collapse="|"))
# still trying to figure out what to do at this step
%>% left_join(keypairs) %>% select(new)
CodePudding user response:
There might be a more efficient way to do it, but I have used something like this before.
library(dplyr)
library(purrr)
library(stringr)
str_grab <- function(string, pattern) pattern[str_which(string, coll(pattern))[1]]
Use this to extract out the standardized names.
df %>%
mutate(clean = map_chr(old, str_grab, keypairs$old)) %>%
left_join(keypairs, by = c(clean = "old"))
# old clean new
# 1 412oldname1 oldname1 newname1
# 2 32oldname132 oldname1 newname1
# 3 oldname141 oldname1 newname1
# 4 132oldname2 oldname2 newname2
# 5 oldname3 <NA> <NA>
# 6 oldname4 <NA> <NA>
# 7 oldname5 <NA> <NA>
If you only want to keep ones that match, use inner_join()
instead of left_join()
. Then you can use basic dplyr
commands to rename and select the column you want.
df %>%
mutate(clean = map_chr(old, str_grab, keypairs$old)) %>%
inner_join(keypairs, by = c(clean = "old")) %>%
select(old = new)
# old
# 1 newname1
# 2 newname1
# 3 newname1
# 4 newname2
CodePudding user response:
A similiar approach like Adam's:
library(dplyr)
library(stringr)
library(tibble)
# or simply
# library(tidyverse)
df %>%
mutate(new = deframe(keypairs)[str_extract(old, paste(keypairs$old, collapse = "|"))])
This returns
old new
1 412oldname1 newname1
2 32oldname132 newname1
3 oldname141 newname1
4 132oldname2 newname2
5 oldname3 <NA>
6 oldname4 <NA>
7 oldname5 <NA>
What happens here?
deframe(keypairs)
creates a "lookup" vector. For example: if we put "oldname1" into it, "newname1" is returned.str_extract
extracts the input for the lookup. We search for the right "oldname" in columnold
.