I have this dataframe
df = data.frame(name = c('a','b','c',NA,NA,'d','e',NA,NA,'f','g','h'),
Freq = c(10,20,70,NA,NA,40,60,NA,NA,80,10,10) )
The output (I am looking for) would be something like that (I created the per column by hand in the following outcome to make my idea clear)
name Freq per
1 a 10 10%
2 b 20 20%
3 c 70 70%
4 <NA> NA NA
5 <NA> NA NA
6 d 40 40%
7 e 60 60%
8 <NA> NA NA
9 <NA> NA NA
10 f 80 80%
11 g 10 10%
12 h 10 10%
so the percentage of a is a/(a b c). and the percentage of b is b/(a b c). and c is c/(a b c)
However : the percentage of d is d/(d e) and e is e/(d e)
The NAs are seperating between each reference group of percentages. I hope it's clear and thanks.
CodePudding user response:
We could create a grouping column by taking the cumulative sum on a logical vector based on the NA values, then replace
the 'Freq' where the value is not NA with the proportions
, ungroup
and remove the temporary 'grp' column
library(dplyr)
df %>%
group_by(grp = cumsum(is.na(name))) %>%
mutate(per = replace(Freq, !is.na(Freq),
paste0(100*proportions(Freq[!is.na(Freq)]), "%"))) %>%
ungroup %>%
select(-grp)
-output
# A tibble: 12 × 3
name Freq per
<chr> <dbl> <chr>
1 a 10 10%
2 b 20 20%
3 c 70 70%
4 <NA> NA <NA>
5 <NA> NA <NA>
6 d 40 40%
7 e 60 60%
8 <NA> NA <NA>
9 <NA> NA <NA>
10 f 80 80%
11 g 10 10%
12 h 10 10%