I want to combine two dplyr pipelines into one, for simplicity.
This is a hypothetical dataset:
V5 | V15 | sum | length | density |
---|---|---|---|---|
upstream | g1 | 1234 | 17645 | 0.1 |
upstream | g2 | 3456 | 17645 | 0.3 |
downstream | g1 | 2345 | 17645 | 0.2 |
downstream | g2 | 1456 | 17645 | 0.1 |
First, I get the total size of each region (`V5`) by summing the `sum` column:
df %>%
  dplyr::group_by(V5) %>%
  dplyr::summarize(sum(sum)) %>%
  ungroup()
Then I manually add these totals as a new column and compute the percentage:
df <- df %>%
  mutate(region = case_when(
    str_detect(V5, "upstream") ~ "4690",
    str_detect(V5, "downstream") ~ "3801"
  ))
df$Gsize <- (as.numeric(df$region)/14675549)*100
The function `ungroup()` doesn't do what I expected: I want the summed value to be added to every row. How can I combine the first and second steps so that each region's size is calculated automatically and added to a new column, from which I can then compute its percentage? Doing this manually is tedious for many regions and many tables.
Expected result:
V5 | V15 | sum | length | density | region |
---|---|---|---|---|---|
upstream | g1 | 1234 | 17645 | 0.1 | 4690 |
upstream | g2 | 3456 | 17645 | 0.3 | 4690 |
downstream | g1 | 2345 | 17645 | 0.2 | 3801 |
downstream | g2 | 1456 | 17645 | 0.1 | 3801 |
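Why `ungroup()` doesn't help here: `summarize()` collapses each group to a single row, and `ungroup()` only removes the grouping; it never copies the totals back onto the original rows. A grouped `mutate()` keeps every row instead. A minimal check on the hypothetical data above (the `data.frame` below is just that table re-typed, and `totals` / `with_region` are illustrative names):

```r
library(dplyr)

# Hypothetical data re-created from the table in the question
df <- data.frame(
  V5      = c("upstream", "upstream", "downstream", "downstream"),
  V15     = c("g1", "g2", "g1", "g2"),
  sum     = c(1234, 3456, 2345, 1456),
  length  = c(17645, 17645, 17645, 17645),
  density = c(0.1, 0.3, 0.2, 0.1)
)

# summarize() collapses each group to a single row, so `totals` has
# one row per region; ungroup() only removes the grouping and does not
# bring the other rows back.
totals <- df %>%
  group_by(V5) %>%
  summarize(region = sum(sum)) %>%
  ungroup()

# A grouped mutate() keeps every row and repeats each group's total
# on all rows of that group (4690 for upstream, 3801 for downstream).
with_region <- df %>%
  group_by(V5) %>%
  mutate(region = sum(sum)) %>%
  ungroup()

print(with_region)
```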
Answer:
After computing the totals, join them back onto the original dataset with `left_join()`. Then you can proceed with your percentage calculation.
library(dplyr)
df %>%
  group_by(V5) %>%
  summarize(total = sum(sum)) %>%  # one row per region with its total
  left_join(df, by = "V5")         # copy each total onto the matching rows of df
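To avoid repeating this for many tables, the join and the percentage step can be wrapped in one function. This is a minimal sketch, assuming the same denominator (`14675549`, taken from the question) applies to every table; `add_region_pct` and the list `tables` are hypothetical names. It joins the totals onto `data` (rather than `data` onto the totals), which keeps the original row and column order, so `region` ends up last as in the expected result:

```r
library(dplyr)

# Hypothetical data re-created from the table in the question
df <- data.frame(
  V5      = c("upstream", "upstream", "downstream", "downstream"),
  V15     = c("g1", "g2", "g1", "g2"),
  sum     = c(1234, 3456, 2345, 1456),
  length  = c(17645, 17645, 17645, 17645),
  density = c(0.1, 0.3, 0.2, 0.1)
)

# Hypothetical helper: adds each region's total (`region`) and that
# total as a percentage of `total_size` (`Gsize`) to every row.
add_region_pct <- function(data, total_size) {
  totals <- data %>%
    group_by(V5) %>%
    summarize(region = sum(sum), .groups = "drop")
  data %>%
    left_join(totals, by = "V5") %>%          # keeps data's row order
    mutate(Gsize = region / total_size * 100)
}

result <- add_region_pct(df, total_size = 14675549)
print(result)

# Applying the same helper to several tables held in a list
tables <- list(a = df, b = df)
results <- lapply(tables, add_region_pct, total_size = 14675549)
```

Because the table is the first argument, the helper also works in a pipe, e.g. `df %>% add_region_pct(14675549)`.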