Home > OS >  Calculate percentage within a subgroup in R
Calculate percentage within a subgroup in R

Time:04-09

I am new to R and I am struggling with calculating the percentage of certain observations in a data frame. My data frame is coming from an excel table with many rows and columns. Therefore, I first need to create a query of the information I need. I use the following code to do so:

am2 %>% 
  group_by(country) %>%
  count(motif) 

The output looks like this:

country     motif                        number
------------------------------------------------
Portugal    architectural elements        26        
Portugal    blossom                        3        
Portugal    crescent moon                 28        
Portugal    crosses                        5        
Portugal    floral                         3        
Spain       four-legged animal            14        
Spain       herringbone                    2        
Spain       horseman                       2        
Sweden      human                          1        
Sweden      inscription                  147
Sweden      spiral                        44

I also calculated the total number of observations per country from the original data frame like this:

am2 %>%
  group_by(country) %>%
  summarize("Number of Observations" = n())

This is the output:

country            total number motifs
----------------------------------------
Portugal                  536
Spain                     2110
Sweden                    300

Now I would like to calculate the frequency of each motif per country. If I would do it by hand it would be e.g. the number of the motif "architectural element" occuring in Portugal:

26/536*100

So, whenever the country changes, the total number for the motifs and thus for the percentage calculation changes. I don't know how to automate this process. Can anyone help me with this? Thanks!

CodePudding user response:

You first group by country to get the sum for each country. Then you group by country and motiv and use the sum for each country to calculate your frequency.

 am2 %>%
      group_by(country) %>%
      mutate(sum_country = sum(number)) %>% 
      group_by(country, motif) %>% 
      mutate(freq = number/sum_country,
             freq_perc = freq*100 %>% round(2))
  • Related