Sort table rows by column values in R

Time:03-31

I have standard output from the BLAST tool that looks like the table below. To make the table easier to read, I reduced the number of columns.

query subject startinsubject endinsubject
1 SRR 50 100
1 SRR 500 450

I need to create another column called "strand". When the query is forward, as in the first row, where startinsubject is less than endinsubject, the new column should contain F. When the query is reversed, as in the second row, where startinsubject is greater than endinsubject, the new column should contain R.

I would like to get a new table like the one below. Could anyone help? Many thanks.

query subject startinsubject endinsubject strand
1 SRR 50 100 F
1 SRR 500 450 R

CodePudding user response:

One option is ifelse. You can use the following code:

df <- data.frame(query = c(1,1),
                 subject = c("SRR", "SRR"),
                 startinsubject = c(50, 500),
                 endinsubject = c(100, 450))

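library(dplyr)

# Label rows whose start exceeds their end as reverse ("R"), all others as forward ("F").
# This only prints the result; use df <- df %>% mutate(...) to keep the new column.
df %>%
  mutate(strand = ifelse(startinsubject > endinsubject, "R", "F"))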

Output:

  query subject startinsubject endinsubject strand
1     1     SRR             50          100      F
2     1     SRR            500          450      R
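
For readers who would rather not load dplyr, the same comparison can be sketched in base R. This is only an illustration of the ifelse approach from the answer above, reusing its example data; it is not part of the original answer.

```r
# Same example data as in the answer above
df <- data.frame(query = c(1, 1),
                 subject = c("SRR", "SRR"),
                 startinsubject = c(50, 500),
                 endinsubject = c(100, 450))

# Base R: assign the new column directly, no extra packages needed.
# Rows where start > end are labelled "R", all others "F".
df$strand <- ifelse(df$startinsubject > df$endinsubject, "R", "F")

print(df)
```

Unlike the piped version, this modifies df in place, so the strand column is kept for later steps.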

CodePudding user response:

We can either use ifelse/case_when or convert the logical to a numeric index for replacement:

library(dplyr)
df1 <- df1 %>% 
   mutate(strand =  c("R", "F")[1 + (startinsubject < endinsubject)])

-output

df1
  query subject startinsubject endinsubject strand
1     1     SRR             50          100      F
2     1     SRR            500          450      R

data

df1 <- structure(list(query = c(1L, 1L), subject = c("SRR", "SRR"), 
    startinsubject = c(50L, 500L), endinsubject = c(100L, 450L
    )), class = "data.frame", row.names = c(NA, -2L))
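
The logical-to-index trick in this answer can be unpacked step by step. The sketch below uses plain vectors instead of a data frame, and the third pair of values (a tie where start equals end) is a hypothetical addition that does not appear in the original data. It is included only to show that the two answers treat ties differently.

```r
# First two pairs match the question; the third (a tie) is hypothetical
start <- c(50L, 500L, 200L)
end   <- c(100L, 450L, 200L)

# Step 1: the comparison yields a logical vector
is_forward <- start < end

# Step 2: adding 1 coerces FALSE/TRUE to 0/1, giving index 1 or 2
idx <- 1 + is_forward

# Step 3: index into c("R", "F"), so 1 picks "R" and 2 picks "F"
strand_index <- c("R", "F")[idx]

# For comparison, the ifelse test from the first answer
strand_ifelse <- ifelse(start > end, "R", "F")

print(strand_index)
print(strand_ifelse)
```

With a strict `<`, a tie falls on the "R" side, whereas the `>` test in the first answer puts it on the "F" side. If start can ever equal end in real BLAST output, it is worth deciding explicitly how such rows should be labelled.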