Is there an R function to calculate percentages?

I have a data frame called crash_stats_TA that looks like this:

TA_code	TA_name	Crashes
061	Grey	126
062	Buller	345
063	Westland	24
064	Timaru	112

I am trying to create a new column called crashes_perc using mutate. This is what I have tried so far:

library(dplyr) # data manipulation

crash_stats_TA <- crash_stats_TA %>%
  group_by(TA_code, TA_name) %>%
  mutate(crashes_perc = round(Crashes/sum(Crashes, na.rm = T)*100,2))

However, this returns a new column crashes_perc with a value of 100 for every TA_code and TA_name, so every area shows 100.

What is the reason for this? I'm not sure how to carry out this step properly.

CodePudding user response：

Using scales::percent, which formats a proportion as a percentage string:

library(dplyr)

crash_stats_TA %>%
  mutate(crashes_perc = scales::percent(Crashes/sum(Crashes, na.rm = TRUE)))

  TA_code TA_name  Crashes crashes_perc
    <int> <chr>      <int> <chr>       
1      61 Grey         126 20.8%       
2      62 Buller       345 56.8%       
3      63 Westland      24 4.0%        
4      64 Timaru       112 18.5%

Add group_by only if each group contains more than one row; with a single row per group, every percentage comes out as 100%.
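As a rough sketch of when grouping does make sense, suppose the data had a coarser column shared by several rows. The region column and the crashes_by_region name below are invented for illustration and are not part of the question's data; the percentages are then shares within each region rather than of the overall total.

```r
library(dplyr)

# Hypothetical data: `region` is an invented column, not from the question
crashes_by_region <- data.frame(
  region  = c("West Coast", "West Coast", "West Coast", "Canterbury"),
  TA_name = c("Grey", "Buller", "Westland", "Timaru"),
  Crashes = c(126, 345, 24, 112)
)

crashes_by_region <- crashes_by_region %>%
  group_by(region) %>%
  # sum(Crashes) is now the total of the current region only
  mutate(region_perc = round(Crashes / sum(Crashes, na.rm = TRUE) * 100, 2)) %>%
  # drop the grouping so later steps work on the whole table again
  ungroup()
```

The single-row Canterbury group still gets 100, which is the same effect seen in the question.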

CodePudding user response：

The reason for the 100% is that there is only one 'Crashes' value per group, so sum returns that same value and each row is divided by itself. Instead, compute it without any grouping:

library(dplyr)
crash_stats_TA %>% 
  mutate(crashes_perc = round(Crashes/sum(Crashes, na.rm = TRUE)*100,2))

-output

TA_code  TA_name Crashes crashes_perc
1      61     Grey     126        20.76
2      62   Buller     345        56.84
3      63 Westland      24         3.95
4      64   Timaru     112        18.45
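To make the difference concrete, here is a small sketch that runs the same mutate on a grouped and an ungrouped copy of the data. The object names grouped and ungrouped are only illustrative; the data frame is rebuilt inline so the snippet runs on its own.

```r
library(dplyr)

crash_stats_TA <- data.frame(
  TA_code = c("061", "062", "063", "064"),
  TA_name = c("Grey", "Buller", "Westland", "Timaru"),
  Crashes = c(126, 345, 24, 112)
)

# Grouped: each group holds one row, so sum(Crashes) equals that row's Crashes
grouped <- crash_stats_TA %>%
  group_by(TA_code, TA_name) %>%
  mutate(crashes_perc = round(Crashes / sum(Crashes, na.rm = TRUE) * 100, 2))

# After ungroup(), sum(Crashes) runs over the whole column
ungrouped <- grouped %>%
  ungroup() %>%
  mutate(crashes_perc = round(Crashes / sum(Crashes, na.rm = TRUE) * 100, 2))
```

If a data frame arrives already grouped from an earlier step, calling ungroup() before mutate has the same effect as never grouping it.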

In base R, use proportions

crash_stats_TA$crashes_perc <- with(crash_stats_TA, round(100 * 
         proportions(Crashes), 2))
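On R installations that predate proportions() (it was introduced in the 4.0 series), prop.table() should give the same result for a plain numeric vector. A minimal sketch, with the data frame rebuilt inline and manual as an illustrative name for the spelled-out arithmetic:

```r
crash_stats_TA <- data.frame(
  TA_code = c("061", "062", "063", "064"),
  TA_name = c("Grey", "Buller", "Westland", "Timaru"),
  Crashes = c(126, 345, 24, 112)
)

# prop.table() divides each element by the sum of the vector
crash_stats_TA$crashes_perc <- round(100 * prop.table(crash_stats_TA$Crashes), 2)

# The same calculation written out explicitly
manual <- round(100 * crash_stats_TA$Crashes / sum(crash_stats_TA$Crashes), 2)
```

Neither function has an na.rm argument, so if Crashes can contain NA the explicit form with sum(..., na.rm = TRUE) is likely the safer choice.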

data

crash_stats_TA <- structure(list(TA_code = 61:64, TA_name = c("Grey", "Buller", 
"Westland", "Timaru"), Crashes = c(126L, 345L, 24L, 112L)), 
class = "data.frame", row.names = c(NA, 
-4L))

CodePudding user response：

Computing the sum over the column referenced with the dollar-sign syntax is probably the simplest solution, because crash_stats_TA$Crashes always refers to the whole column regardless of any grouping:

library(magrittr)
library(dplyr)

crash_stats_TA <- data.frame(
  TA_code = c("061", "062", "063", "064"),
  TA_name = c("Grey", "Buller", "Westland", "Timaru"),
  Crashes = c(126, 345, 24, 112))

# %<>% pipes crash_stats_TA into mutate() and assigns the result back to it
crash_stats_TA %<>%
  mutate(crashes_perc = round(Crashes / sum(crash_stats_TA$Crashes, na.rm = TRUE) * 100, 2))