I want to write a script in R that reorganizes my data based on a condition. I want to count the number of contig entries with num_reads greater than 0 and to sum num_reads, both on a weekly (week) basis. Please see the expected output below.
Here is the input data:
contig num_reads year week
h1 67 2012 7
h2 75 2012 7
h3 0 2012 7
h1 3 2012 8
h2 0 2012 8
h3 55 2012 8
h1 32 2012 9
h2 7 2012 9
h3 4 2013 9
h1 67 2013 7
h2 75 2013 7
h3 83 2013 7
h1 3 2013 8
h2 0 2013 8
h3 30 2013 8
h1 32 2013 9
h2 7 2013 9
h3 0 2013 9
h1 67 2014 7
h2 75 2014 7
h3 43 2014 7
h1 3 2014 8
h2 0 2014 8
h3 55 2014 8
h1 32 2014 9
h2 7 2014 9
h3 0 2014 9
Expected output data:
year week count_contig sum_num_reads
2012 7 2 142
2012 8 2 58
2012 9 3 43
and so on
CodePudding user response:
library(tidyverse)

# Recreate the input data from the question
df <- data.frame(
  stringsAsFactors = FALSE,
  contig = c("h1","h2","h3",
             "h1","h2","h3","h1","h2","h3","h1","h2","h3",
             "h1","h2","h3","h1","h2","h3","h1","h2",
             "h3","h1","h2","h3","h1","h2","h3"),
  num_reads = c(67L,75L,0L,3L,
                0L,55L,32L,7L,4L,67L,75L,83L,3L,0L,30L,
                32L,7L,0L,67L,75L,43L,3L,0L,55L,32L,7L,0L),
  year = c(2012L,2012L,2012L,
           2012L,2012L,2012L,2012L,2012L,2013L,2013L,
           2013L,2013L,2013L,2013L,2013L,2013L,2013L,
           2013L,2014L,2014L,2014L,2014L,2014L,2014L,
           2014L,2014L,2014L),
  week = c(7L,7L,7L,8L,8L,
           8L,9L,9L,9L,7L,7L,7L,8L,8L,8L,9L,9L,9L,
           7L,7L,7L,8L,8L,8L,9L,9L,9L)
)

# Per year and week: count the rows with num_reads > 0 and total the reads
df %>%
  group_by(year, week) %>%
  summarise(count_contig = sum(num_reads > 0),
            sum_num_reads = sum(num_reads), .groups = "drop")
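#> # A tibble: 9 × 4
#>    year  week count_contig sum_num_reads
#>   <int> <int>        <int>         <int>
#> 1  2012     7            2           142
#> 2  2012     8            2            58
#> 3  2012     9            2            39
#> 4  2013     7            3           225
#> 5  2013     8            2            33
#> 6  2013     9            3            43
#> 7  2014     7            3           185
#> 8  2014     8            2            58
#> 9  2014     9            2            39

# Note: 2012 week 9 gives 2 and 39 rather than the expected 3 and 43
# because the input lists the "h3 4" row under year 2013, not 2012.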
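
If you would rather not load the tidyverse, the same grouping can probably be done in base R with aggregate(). The following is only a sketch under that assumption, not part of the answer above: the helper column has_reads and the result name res are made up for illustration, and the data frame is rebuilt with rep() from the values in the question.

```r
# Rebuild the question's input (27 rows; values copied from the post).
df <- data.frame(
  contig    = rep(c("h1", "h2", "h3"), times = 9),
  num_reads = c(67L, 75L, 0L, 3L, 0L, 55L, 32L, 7L, 4L,
                67L, 75L, 83L, 3L, 0L, 30L, 32L, 7L, 0L,
                67L, 75L, 43L, 3L, 0L, 55L, 32L, 7L, 0L),
  year      = rep(c(2012L, 2013L, 2014L), times = c(8, 10, 9)),
  week      = rep(rep(c(7L, 8L, 9L), each = 3), times = 3)
)

# Hypothetical helper column: TRUE where the contig has at least one read.
df$has_reads <- df$num_reads > 0

# Summing a logical counts its TRUE values, so one sum() per group
# yields both the count of non-zero contigs and the total reads.
res <- aggregate(cbind(has_reads, num_reads) ~ year + week,
                 data = df, FUN = sum)
names(res)[3:4] <- c("count_contig", "sum_num_reads")

# aggregate() does not sort by year first, so reorder to year, then week.
res <- res[order(res$year, res$week), ]
print(res, row.names = FALSE)
```

The result should contain the same nine year/week rows as the dplyr version, as a plain data frame rather than a tibble.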