Home > OS >  change the values in a column of a dataframe using another dataframe that contains key pairs
change the values in a column of a dataframe using another dataframe that contains key pairs

Time:10-24

I have the following sample df

df <- data.frame(old=c("412oldname1","32oldname132","oldname141","132oldname2","oldname3","oldname4","oldname5"))

        old
1   412oldname1
2   32oldname132
3   oldname141
4   132oldname2
5   oldname3
6   oldname4
7   oldname5

I would like to extract rows using grepl and replace.

keypairs <- data.frame(old=c("oldname1","oldname2"),new=c("newname1","newname2"))

       old      new
1 oldname1 newname1
2 oldname2 newname2

The desired dataframe would be:

        old
1   newname1
2   newname1
3   newname1
4   newname2

Ideally, I could replace the output of grepl with the patters, then use left_join and select the new column.

So far, I have:

df %>% filter(grepl(keypairs$old,collapse="|"))
# still trying to figure out what to do at this step
%>% left_join(keypairs) %>% select(new)

CodePudding user response:

There might be a more efficient way to do it, but I have used something like this before.

library(dplyr)
library(purrr)
library(stringr)

str_grab <- function(string, pattern) pattern[str_which(string, coll(pattern))[1]]

Use this to extract out the standardized names.

df %>% 
  mutate(clean = map_chr(old, str_grab, keypairs$old)) %>% 
  left_join(keypairs, by = c(clean = "old"))
#            old    clean      new
# 1  412oldname1 oldname1 newname1
# 2 32oldname132 oldname1 newname1
# 3   oldname141 oldname1 newname1
# 4  132oldname2 oldname2 newname2
# 5     oldname3     <NA>     <NA>
# 6     oldname4     <NA>     <NA>
# 7     oldname5     <NA>     <NA>

If you only want to keep ones that match, use inner_join() instead of left_join(). Then you can use basic dplyr commands to rename and select the column you want.

df %>% 
  mutate(clean = map_chr(old, str_grab, keypairs$old)) %>% 
  inner_join(keypairs, by = c(clean = "old")) %>% 
  select(old = new)
#        old
# 1 newname1
# 2 newname1
# 3 newname1
# 4 newname2

CodePudding user response:

A similiar approach like Adam's:

library(dplyr)
library(stringr)
library(tibble)
# or simply
# library(tidyverse) 

df %>% 
  mutate(new = deframe(keypairs)[str_extract(old, paste(keypairs$old, collapse = "|"))])

This returns

           old      new
1  412oldname1 newname1
2 32oldname132 newname1
3   oldname141 newname1
4  132oldname2 newname2
5     oldname3     <NA>
6     oldname4     <NA>
7     oldname5     <NA>

What happens here?

  • deframe(keypairs) creates a "lookup" vector. For example: if we put "oldname1" into it, "newname1" is returned.
  • str_extract extracts the input for the lookup. We search for the right "oldname" in column old.
  • Related