Home > front end >  Trying to create a new table column using reference table value under certain conditions
Trying to create a new table column using reference table value under certain conditions

Time:04-20

Right now I have two data tables in the form:

library(dplyr)

> lengths 
chrom    p      q
1        2560   78934
2        1346   9087
3        678    7809
4        5467   1253
...

> my_data
mut      pos       chr    arm
A        567       2      p
B        6890      2      q
C        978       3      q
D        4689      4      p
...

I want to add another column my_data$chrom.pos, that is conditionally adding values from lengths$q to the my_data$pos values IF my_data$arm is 'p', but not 'q'.

What I have thus far:

my_data %>% mutate(chrom.pos =
                       case_when (arm == "q" ~ pos,
                                  arm == "p" ~ pos   lengths[["q"]][chr])
  )

While this handles the 'q' condition just fine, any case that is 'p' comes out as NA because I'm not getting the linking quite right.

> my_data
mut      pos       chr    arm    chrom.pos
A        567       2      p      NA        # Looking for 9087   567 = 9654
B        6890      2      q      6890
C        978       3      q      978
D        4689      4      p      NA        # Looking for 1253   4689 = 5942

CodePudding user response:

When I am running your code, I get the following results:

lengths <- data.frame(chrom = c(1,2,3,4),
                      p = c(2560,1346,678,5467),
                      q = c(78934,9087,7809,1253))

my_data <- data.frame(mut = c("A", "B", "C", "D"),
                      pos = c(567,6890,978,4689),
                      chr = c(2,2,3,4),
                      arm = c("p", "q", "q", "p"))

library(tidyverse)    

my_data %>% mutate(chrom.pos =
                     case_when (arm == "q" ~ pos,
                                arm == "p" ~ pos   lengths[["q"]][chr])
)

Output:

  mut  pos chr arm chrom.pos
1   A  567   2   p      9654
2   B 6890   2   q      6890
3   C  978   3   q       978
4   D 4689   4   p      5942

As you can see, it gives the desired output.

CodePudding user response:

One reason this couldn't work could be because of leading/lagging spaces in the 'arm' column values. We can remove those spaces with trimws and it should work

library(dplyr)
my_data %>% 
     mutate(pos = trimws(pos), 
         chrom.pos =
                       case_when (arm == "q" ~ pos,
                                  arm == "p" ~ pos   lengths[["q"]][chr])
  )

CodePudding user response:

You can merge two data with chr and chrom as keys.

library(dplyr)

my_data %>%
  left_join(select(lengths, chrom, q), by = c("chr" = "chrom")) %>%
  mutate(chrom.pos = if_else(arm == "p", pos   q, pos)) %>%
  select(-q)

#   mut  pos chr arm chrom.pos
# 1   A  567   2   p      9654
# 2   B 6890   2   q      6890
# 3   C  978   3   q       978
# 4   D 4689   4   p      5942
  • Related