Home > front end >  How to add a percentage column to a dataframe that includes entier rows of NA?
How to add a percentage column to a dataframe that includes entier rows of NA?

Time:11-10

I have this dataframe

df = data.frame(name = c('a','b','c',NA,NA,'d','e',NA,NA,'f','g','h'),
                Freq = c(10,20,70,NA,NA,40,60,NA,NA,80,10,10) )

The output (I am looking for) would be something like that (I created the per column by hand in the following outcome to make my idea clear)

    name Freq  per
1     a   10   10%
2     b   20   20%
3     c   70   70%
4  <NA>   NA   NA  
5  <NA>   NA   NA
6     d   40   40%
7     e   60   60%
8  <NA>   NA   NA
9  <NA>   NA   NA
10    f   80   80%
11    g   10   10%
12    h   10   10%

so the percentage of a is a/(a b c). and the percentage of b is b/(a b c). and c is c/(a b c)

However : the percentage of d is d/(d e) and e is e/(d e)

The NAs are seperating between each reference group of percentages. I hope it's clear and thanks.

CodePudding user response:

We could create a grouping column by taking the cumulative sum on a logical vector based on the NA values, then replace the 'Freq' where the value is not NA with the proportions, ungroup and remove the temporary 'grp' column

library(dplyr)
df %>% 
  group_by(grp = cumsum(is.na(name))) %>%
  mutate(per = replace(Freq, !is.na(Freq), 
    paste0(100*proportions(Freq[!is.na(Freq)]), "%"))) %>%
  ungroup %>%
  select(-grp)

-output

# A tibble: 12 × 3
   name   Freq per  
   <chr> <dbl> <chr>
 1 a        10 10%  
 2 b        20 20%  
 3 c        70 70%  
 4 <NA>     NA <NA> 
 5 <NA>     NA <NA> 
 6 d        40 40%  
 7 e        60 60%  
 8 <NA>     NA <NA> 
 9 <NA>     NA <NA> 
10 f        80 80%  
11 g        10 10%  
12 h        10 10%  
  • Related