Sort table rows by column values in R

I have standard tabular output from the BLAST tool that looks like the table below. To make the table easier to read, I have reduced the number of columns.

query	subject	startinsubject	endinsubject
1	SRR	50	100
1	SRR	500	450
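To reproduce the input in R, the tab-separated table above can be read with read.delim(). This is a minimal sketch, assuming the data are tab-delimited with a header line as shown; the variable name blast_txt is only for illustration. Note that the standard BLAST tabular format (-outfmt 6) has no header line, so for a real file you would pass the file path instead of text = and use header = FALSE, then set the column names yourself.

```r
# Reduced, tab-separated BLAST table from the question (with a header line).
# Assumption: for a real file, replace `text = blast_txt` with the file path.
blast_txt <- "query\tsubject\tstartinsubject\tendinsubject
1\tSRR\t50\t100
1\tSRR\t500\t450"

# read.delim() defaults to sep = "\t" and header = TRUE
df <- read.delim(text = blast_txt, stringsAsFactors = FALSE)
str(df)
```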

I need to create another column, called "strand". When the query is forward, as in the first row (startinsubject is less than endinsubject), the new column should contain F. When the query is reverse, as in the second row (startinsubject is greater than endinsubject), it should contain R.

I would like to get a table like the one below. Could anyone help me? Many thanks.

query	subject	startinsubject	endinsubject	strand
1	SRR	50	100	F
1	SRR	500	450	R

Answer:

One option is ifelse. You can use the following code:

df <- data.frame(query = c(1,1),
                 subject = c("SRR", "SRR"),
                 startinsubject = c(50, 500),
                 endinsubject = c(100, 450))

library(dplyr)

df %>%
  # "R" when the hit runs backwards on the subject (start > end), otherwise "F"
  mutate(strand = ifelse(startinsubject > endinsubject, "R", "F"))

Output:

  query subject startinsubject endinsubject strand
1     1     SRR             50          100      F
2     1     SRR            500          450      R
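If you prefer not to load dplyr, the same ifelse() logic works in base R by assigning the new column directly. This is a sketch using the example data frame from this answer; it is equivalent for these two rows.

```r
# Same example data as above
df <- data.frame(query = c(1, 1),
                 subject = c("SRR", "SRR"),
                 startinsubject = c(50, 500),
                 endinsubject = c(100, 450))

# ifelse() is vectorised: each row gets "R" if start > end, otherwise "F"
df$strand <- ifelse(df$startinsubject > df$endinsubject, "R", "F")
df
```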

Answer:

We can use either ifelse/case_when, or simply convert the logical comparison to a numeric index and use it to pick the label:

library(dplyr)
df1 <- df1 %>% 
   # FALSE/TRUE become 0/1, so the index is 1 ("R") or 2 ("F")
   mutate(strand = c("R", "F")[1 + (startinsubject < endinsubject)])

Output:

df1
 query subject startinsubject endinsubject strand
1     1     SRR             50          100      F
2     1     SRR            500          450      R
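The indexing trick relies on R coercing TRUE/FALSE to 1/0 in arithmetic. A small worked example on plain vectors (the names strand_labels, is_forward and idx are just for illustration):

```r
strand_labels <- c("R", "F")
is_forward <- c(50, 500) < c(100, 450)   # TRUE, FALSE
idx <- 1 + is_forward                    # 2, 1 (logical coerced to numeric)
strand <- strand_labels[idx]             # position 2 is "F", position 1 is "R"
strand
```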

Data:

df1 <- structure(list(query = c(1L, 1L), subject = c("SRR", "SRR"),
    startinsubject = c(50L, 500L), endinsubject = c(100L, 450L)),
    class = "data.frame", row.names = c(NA, -2L))
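The two answers disagree when startinsubject equals endinsubject: the ifelse version with ">" returns "F", while the index version with "<" returns "R". The case_when option mentioned above can make that case explicit. This is a sketch; the extra third row and the choice to mark it NA are assumptions for illustration, not something from the question.

```r
library(dplyr)

# Hypothetical third row where start == end
df2 <- data.frame(query = c(1L, 1L, 1L),
                  subject = c("SRR", "SRR", "SRR"),
                  startinsubject = c(50L, 500L, 300L),
                  endinsubject = c(100L, 450L, 300L))

df2 <- df2 %>%
  mutate(strand = case_when(
    startinsubject < endinsubject ~ "F",
    startinsubject > endinsubject ~ "R",
    TRUE ~ NA_character_   # assumption: flag start == end as NA
  ))
df2
```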