Hello, I have a data frame in the following format:
library(dplyr)
set.seed(42)
# tibble() replaces the deprecated data_frame()
df = tibble(contigs = sprintf("k141_%s", floor(runif(100, min = 20, max = 200))),
start = floor(runif(100, min = 100, max = 115)),
end = floor(runif(100, min = 800, max = 830)))
df
[![image_1][1]][1]
*Sorry, I don't know how to paste the df output correctly.
The issue is that I want to combine the start and end values of each row into a single column named "ranges".
This is the desired output:
[![enter image description here][2]][2]
where the Rle values in this case are the contigs column of my example data frame df.
I think that working with dplyr may do the trick, but I'm not sure how.

[1]: https://i.stack.imgur.com/h0Ga6.png
[2]: https://i.stack.imgur.com/gf75k.png
CodePudding user response:
If I understand your question correctly, dplyr would do the trick.
library(dplyr)
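df %>%
# end - start is vectorised, so rowwise() is not needed;
# note that this gives the span of each row, not a range object
mutate(ranges = end - start)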
CodePudding user response:
You can use IRanges::IRanges():
library(dplyr)
df %>%
group_by(contigs) %>%
# one IRanges object per contig, stored in a list column
summarize(range = list(IRanges::IRanges(start, end)))
Output:
# A tibble: 79 × 2
contigs range
<chr> <list>
1 k141_100 <IRanges>
2 k141_102 <IRanges>
3 k141_103 <IRanges>
4 k141_105 <IRanges>
5 k141_106 <IRanges>
6 k141_112 <IRanges>
7 k141_113 <IRanges>
8 k141_120 <IRanges>
9 k141_121 <IRanges>
10 k141_124 <IRanges>
# … with 69 more rows
Notice that there is an IRanges object for each contig. If a contig in the original frame had X rows, then the IRanges object for that contig would contain X ranges. For example, contig "k141_57" has the following rows in the original frame:
contigs start end
<chr> <dbl> <dbl>
1 k141_57 101 826
2 k141_57 101 809
The range column in the summarized frame has the following value for the row contig="k141_57":
IRanges object with 2 ranges and 0 metadata columns:
start end width
<integer> <integer> <integer>
[1] 101 826 726
[2] 101 809 709
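The screenshot of the desired output is not reproduced here, but the mention of Rle values suggests it may be a GenomicRanges::GRanges object, which stores the sequence names as an Rle and the coordinates in a single "ranges" column. If that assumption holds, a minimal sketch that keeps one range per original row (instead of one list entry per contig) could look like this. It assumes the Bioconductor packages IRanges and GenomicRanges are installed, and it rebuilds df with base data.frame() only to stay self-contained.

```r
# Assumption: the desired output is a GRanges object (Bioconductor)
library(GenomicRanges)  # also attaches IRanges

set.seed(42)
df <- data.frame(
  contigs = sprintf("k141_%s", floor(runif(100, min = 20, max = 200))),
  start   = floor(runif(100, min = 100, max = 115)),
  end     = floor(runif(100, min = 800, max = 830))
)

# One range per row; seqnames is stored internally as an Rle
gr <- GRanges(
  seqnames = df$contigs,
  ranges   = IRanges(start = df$start, end = df$end)
)

gr  # prints the seqnames, ranges and strand columns

# Ranges belonging to a single contig
ranges(gr[seqnames(gr) == "k141_57"])
```

Unlike the grouped list column, this keeps the original row order, and ranges(gr) returns all start/end pairs as one IRanges object.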