How to retrieve ranges for each row of an R data frame

Time:06-02

Hello, I have a data frame in the following format:

library(tibble)

set.seed(42)
# data_frame() is deprecated; tibble() is its replacement
df <- tibble(contigs = sprintf("k141_%s", floor(runif(100, min = 20, max = 200))),
             start = floor(runif(100, min = 100, max = 115)),
             end = floor(runif(100, min = 800, max = 830)))

df

[![image_1][1]][1]

*Sorry, I don't know how to format the df output correctly.

The issue is that I want to combine the start and end values of each row into a single column named "ranges".

This is the desired output:

[![enter image description here][2]][2]

where the Rle values in this case are the contigs column of my example data frame df.

I think that working with dplyr may do the trick, but I'm not sure how.

[1]: https://i.stack.imgur.com/h0Ga6.png
[2]: https://i.stack.imgur.com/gf75k.png

CodePudding user response:

If I understand your question correctly, dplyr would do the trick.

library(dplyr)

df %>% 
  rowwise() %>% 
  mutate(ranges = end-start)
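
Note that `end-start` gives a single number (the distance between the two endpoints), not a range. If a plain text label is enough for the `ranges` column, something like the following base R sketch may be closer to what was asked. It assumes a "start-end" string is an acceptable representation, uses the two k141_57 rows shown further down this thread as sample data, and the `span` column name is only illustrative.

```r
# Two rows copied from the k141_57 example in this thread
df <- data.frame(
  contigs = c("k141_57", "k141_57"),
  start   = c(101, 101),
  end     = c(826, 809)
)

# paste0() is vectorised, so no rowwise() step is needed
df$ranges <- paste0(df$start, "-", df$end)

# end - start is the distance between the endpoints, which is one less
# than the inclusive width that IRanges reports for the same interval
df$span <- df$end - df$start

print(df)
```

Here `ranges` holds "101-826" and "101-809", while `span` holds 725 and 708. A character column like this is only for display; it cannot be used for overlap or width calculations the way an IRanges object can.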

CodePudding user response:

You can use IRanges::IRanges()

library(dplyr)

df %>%
  group_by(contigs) %>%
  summarize(range = list(IRanges::IRanges(start, end)))

Output:

# A tibble: 79 × 2
   contigs  range    
   <chr>    <list>   
 1 k141_100 <IRanges>
 2 k141_102 <IRanges>
 3 k141_103 <IRanges>
 4 k141_105 <IRanges>
 5 k141_106 <IRanges>
 6 k141_112 <IRanges>
 7 k141_113 <IRanges>
 8 k141_120 <IRanges>
 9 k141_121 <IRanges>
10 k141_124 <IRanges>
# … with 69 more rows

Notice that there is an IRanges object for each contig. If a contig in the original frame had X rows, then the IRanges object for that contig would contain X ranges. For example, contig "k141_57" has the following rows in the original frame:

  contigs start   end
  <chr>   <dbl> <dbl>
1 k141_57   101   826
2 k141_57   101   809

The range column in the summarized frame has the following value for the row contig="k141_57":

IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]       101       826       726
  [2]       101       809       709
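
If the desired output in the question's screenshot is a GRanges-style object (an assumption based on the mention of Rle values), one range per original row can be kept instead of one list entry per contig. The sketch below assumes the Bioconductor packages IRanges and GenomicRanges are installed and reuses the two k141_57 rows shown above as sample data.

```r
library(IRanges)
library(GenomicRanges)

# Two rows copied from the k141_57 example above
df <- data.frame(
  contigs = c("k141_57", "k141_57"),
  start   = c(101, 101),
  end     = c(826, 809)
)

# seqnames is stored as an Rle; ranges is a single IRanges column
# holding the start and end of every row
gr <- GRanges(
  seqnames = df$contigs,
  ranges   = IRanges(start = df$start, end = df$end)
)

print(gr)

# Inclusive widths (end - start + 1), matching the IRanges output above
print(width(gr))
```

The widths come out as 726 and 709, the same values as in the IRanges printout for this contig.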