Home > Software engineering >  Replacing column values if they match one of two columns in another dataset
Replacing column values if they match one of two columns in another dataset

Time:11-26

I have example data as follows:

library(data.table)
dat1 <- fread("code1 code2 code3
              A3     B2   C1
              A4     B3   C2")

dat2 <- fread("codes
              A3  
              A4
              B2
              B3")

I would simply like to replace the codes in dat2 with code3 from dat1.

Desired output:

dat_out <- fread("codes
              C1  
              C2
              C1
              C2")

How should I do this?

CodePudding user response:

We can use melt along with join by on = ... if you are working with data.table, e.g,

dat2[
  melt(dat1, id.var = "code3"), 
  .(codes = code3),
  on = c(codes = "value")
]

which gives


> dat2[melt(dat1, id.var = "code3"), .(codes = code3), on = c(codes = "value")]
   codes
1:    C1
2:    C2
3:    C1
4:    C2

where melt(dat1, id.var = "code3") looks like

> melt(dat1, id.var = "code3")
   code3 variable value
1:    C1    code1    A3
2:    C2    code1    A4
3:    C1    code2    B2
4:    C2    code2    B3

CodePudding user response:

You can use match:

`%r%`<- function(x, y) replace(x %% y, x %% y == 0, y)
dat2[, codes := dat1$code3[match(dat2$codes, unlist(dat1)) %r% nrow(dat1)]]

#   codes
#1:    C1
#2:    C2
#3:    C1
#4:    C2

Explanation:

  • This uses match on the unlisted value of dat1 (you can also only focus on selected columns).
  • You can then use %% (modulo) to get the rest of the integer division by nrow(dat1). Basically, it restitutes the size of the columns (relist).
  • The problem is that it creates value of 0 where we need 2 (or nrow), hence the new operator.

CodePudding user response:

This could be one way to do it:

library(tidyverse)

dat2 %>% 
  inner_join(dat1 %>% pivot_longer(!code3), by = c('codes'='value')) %>%
  select(!name) %>%
  mutate(codes = coalesce(!!!rev(.))) %>%
  select(codes)

   codes
1:    C1
2:    C2
3:    C1
4:    C2

CodePudding user response:

An option with named vector

dat2[, codes := setNames(rep(dat1$code3, 2), unlist(dat1[, 1:2]))[codes]]

-output

> dat2
    codes
   <char>
1:     C1
2:     C2
3:     C1
4:     C2
  • Related