R looking up values within a range and assigning annotation

Time:11-19

I have mass spec data that I need help annotating. I have two files loaded. File1 has two columns (mz, intensity) and File2 has two as well (mz, name). In both files, all of the columns are numeric, except for name, which is character. I need to take each mz value in File1 and match it against the mz values in File2 within +/- 0.001. If a File1 value falls within that range of a File2 value, I need to annotate it with the corresponding 'name' value from File2. Below is an example:

File1

mz intensity
100.1234 1234
134.5678 7653
150.1234 23463
176.5678 12354

File2

mz name
100.1225 name1
112.5678 name2
150.1239 name3
176.5665 name4

The idea is to get an output like this:

mz intensity name
100.1234 1234 name1
134.5678 7653
150.1234 23463
176.5678 12354 name4

I tried using mutate and merge, but I'm not sure how to add the number range and use a conditional statement to make it work. I also tried data.table, but again, not sure how to adjust for a range.

CodePudding user response:

library(dplyr)

## The example data.
file1 <- tibble::tribble(
  ~mz, ~intensity,
  100.1234, 1234,
  134.5678, 7653,
  150.1234, 23463,
  176.5678, 12354,
)
file2 <- tibble::tribble(
  ~mz, ~name,
  100.1225, "name1",
  112.5678, "name2",
  150.1239, "name3",
  176.5665, "name4",
)

out <-
  file1 %>%
  mutate(approx = round(mz, 3)) %>%
  left_join(file2 %>% mutate(approx = round(mz, 3)) %>% select(-mz),
            by = "approx") %>%
  select(-approx)

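## Note that the output differs from the expected one: matching on values
## rounded to 3 decimals only approximates a +/- 0.001 window, and round()
## follows IEC 60559 / IEEE 754 (see ?round), so a value ending in 5, such as
## 100.1225, may round either way depending on its binary representation.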
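
To make the note above concrete, here is a minimal sketch (not part of the original answer) comparing "equal after rounding" with "within a tolerance", using two pairs of values from the example data. The variable names are only illustrative.

```r
## Sketch: "same value after rounding" is not the same test as
## "within a tolerance of each other".
tol <- 0.001

## Pair 1: 0.0009 apart, so within the tolerance.
a <- 100.1234
b <- 100.1225
a_b_within_tol <- abs(a - b) < tol
## Whether the rounded values agree depends on how round() treats 100.1225.
a_b_same_bin <- round(a, 3) == round(b, 3)

## Pair 2: 0.0005 apart, yet the values land in different rounding bins
## (150.123 vs 150.124), so a join on the rounded value misses the match.
x <- 150.1234
y <- 150.1239
x_y_within_tol <- abs(x - y) < tol
x_y_same_bin <- round(x, 3) == round(y, 3)

print(c(within_tol = x_y_within_tol, same_bin = x_y_same_bin))
```

The second pair is the clearer case: the values are well inside the 0.001 window, but a join on the rounded column would not pair them.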

CodePudding user response:

Suppose you have two data frames, file1 and file2.

With dplyr you can do:

library(dplyr)
## For each row of file1, take the first file2 name whose mz is within 0.001.
file1 %>%
  rowwise() %>%
  mutate(name = if (any(abs(file2$mz - mz) < 0.001))
    file2$name[min(which(abs(file2$mz - mz) < 0.001))] else "") %>%
  ungroup()
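
As a possible alternative without dplyr, the same tolerance check can be sketched in base R with outer(), which builds the full matrix of pairwise differences. This is only a sketch under a few assumptions: it picks the nearest file2 entry rather than the first one within range (unlike the snippet above), it uses NA instead of "" for unmatched rows, and the tol variable is introduced here for illustration.

```r
## Example data from the question, as plain data frames.
file1 <- data.frame(mz = c(100.1234, 134.5678, 150.1234, 176.5678),
                    intensity = c(1234, 7653, 23463, 12354))
file2 <- data.frame(mz = c(100.1225, 112.5678, 150.1239, 176.5665),
                    name = c("name1", "name2", "name3", "name4"),
                    stringsAsFactors = FALSE)
tol <- 0.001  # tolerance taken from the question

## d[i, j] is the absolute difference between file1$mz[i] and file2$mz[j].
d <- abs(outer(file1$mz, file2$mz, "-"))

## Index of the closest file2 entry for each file1 row.
nearest <- apply(d, 1, which.min)

## Keep the name only when the closest entry is within the tolerance.
within_tol <- d[cbind(seq_len(nrow(d)), nearest)] < tol
file1$name <- ifelse(within_tol, file2$name[nearest], NA_character_)
print(file1)
```

With the example data and a strict 0.001 tolerance, this annotates the 100.1234 row with name1 and the 150.1234 row with name3, while 176.5678 stays unmatched because it is 0.0013 away from 176.5665. That is not the same as the expected output shown in the question, so the tolerance or the example may need adjusting. The difference matrix has nrow(file1) * nrow(file2) entries, so this approach suits small to medium tables.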