I am new to R and I am struggling with calculating the percentage of certain observations in a data frame. My data frame is coming from an excel table with many rows and columns. Therefore, I first need to create a query of the information I need. I use the following code to do so:
am2 %>%
group_by(country) %>%
count(motif)
The output looks like this:
country motif number
------------------------------------------------
Portugal architectural elements 26
Portugal blossom 3
Portugal crescent moon 28
Portugal crosses 5
Portugal floral 3
Spain four-legged animal 14
Spain herringbone 2
Spain horseman 2
Sweden human 1
Sweden inscription 147
Sweden spiral 44
I also calculated the total number of observations per country from the original data frame like this:
am2 %>%
group_by(country) %>%
summarize("Number of Observations" = n())
This is the output:
country total number motifs
----------------------------------------
Portugal 536
Spain 2110
Sweden 300
Now I would like to calculate the frequency of each motif per country. If I would do it by hand it would be e.g. the number of the motif "architectural element" occuring in Portugal:
26/536*100
So, whenever the country changes, the total number for the motifs and thus for the percentage calculation changes. I don't know how to automate this process. Can anyone help me with this? Thanks!
CodePudding user response:
You first group by country to get the sum for each country. Then you group by country and motiv and use the sum for each country to calculate your frequency.
am2 %>%
group_by(country) %>%
mutate(sum_country = sum(number)) %>%
group_by(country, motif) %>%
mutate(freq = number/sum_country,
freq_perc = freq*100 %>% round(2))