Home > Software engineering >  extract valus of another dataframe if value of one column is partially match in R
extract valus of another dataframe if value of one column is partially match in R

Time:10-01

Sorry I didn't clarify my question, my aim is if dt$id %in% df$id , extract df$score add to new column at dt,

I have a dataframe like this :

df <- tibble(
score = c(2587,002,885,901,2587,3371,3372,002),
id = c("AR01.0","AR01.1","AR01.12","ERS02.00","ERS02.01","ERS02.02","QR01","QR01.03"))

And I have another dataframe like

dt <- tibble(
id = c("AR01","QR01","KVC"),
city = c("AM", "Bis","CHB"))

I want to mutate a new column "score" I want to got output like below :

id city score
AR01 AM 2587/2/885
ERS02 Bis 901/3371
KVC CHB NA

or

id city score score2 score3
AR01 AM 2587 2 885
ERS02 Bis 901 3371 NA
KVC CHB NA NA NA

I tried to use ifelse to achieve but always got error, do any one can provide ideas? Thank you.

CodePudding user response:

A simple left_join (after mutateing id values in df) is required:

library(dplyr)
library(stringr)
left_join(df %>% mutate(id = str_extract(id, "[\\w] ")), dt, by = "id") %>%
  group_by(id) %>%
  summarise(across(city,first),
            score = paste(score, collapse = "/"))
# A tibble: 3 × 3
  id    city  score        
  <chr> <chr> <chr>        
1 AR01  AM    2587/2/885   
2 ERS02 NA    901/2587/3371
3 QR01  Bis   3372/2 

For the second solution you can use separate:

library(dyplr)
library(stringr)
library(tidyr)
left_join(df %>% mutate(id = str_extract(id, "[\\w] ")), dt, by = "id") %>%
  group_by(id) %>%
  summarise(across(city,first),
            score = paste(score, collapse = "/")) %>%
  separate(score,
          into = paste("score", 1:3),
          sep = "/" )
# A tibble: 3 × 5
  id    city  `score 1` `score 2` `score 3`
  <chr> <chr> <chr>     <chr>     <chr>    
1 AR01  AM    2587      2         885      
2 ERS02 NA    901       2587      3371     
3 QR01  Bis   3372      2         NA  

CodePudding user response:

You could create groups by extracting everything before the . using sub to group_by on and merge the rows with paste separated with / and right_join them by id like this:

library(tibble)
df <- tibble(
  score = c(2587,002,885,901,2587,3371,3372,002),
  id = c("AR01.0","AR01.1","AR01.12","ERS02.00","ERS02.01","ERS02.02","QR01","QR01.03"))

dt <- tibble(
  id = c("AR01","QR01","KVC"),
  city = c("AM", "Bis","CHB"))

library(dplyr)
df %>%
  mutate(id = sub('\\..*', "", id)) %>%
  group_by(id) %>%
  mutate(score = paste(score, collapse = '/')) %>%
  distinct(id, .keep_all = TRUE) %>%
  ungroup() %>%
  right_join(., dt, by = 'id')
#> # A tibble: 3 × 3
#>   score      id    city 
#>   <chr>      <chr> <chr>
#> 1 2587/2/885 AR01  AM   
#> 2 3372/2     QR01  Bis  
#> 3 <NA>       KVC   CHB

Created on 2022-10-01 with reprex v2.0.2

  •  Tags:  
  • r
  • Related