I have a data frame called crash_stats_TA that looks like this:
TA_code | TA_name | Crashes |
---|---|---|
061 | Grey | 126 |
062 | Buller | 345 |
063 | Westland | 24 |
064 | Timaru | 112 |
I am trying to create a new column called crashes_perc using mutate. This is what I have tried so far:
library(dplyr) # data manipulation
crash_stats_TA <- crash_stats_TA %>%
  group_by(TA_code, TA_name) %>%
  mutate(crashes_perc = round(Crashes / sum(Crashes, na.rm = TRUE) * 100, 2))
However, this returns a new column crashes_perc with a value of 100 for every TA_code and TA_name, so every area shows 100.
What is the reason for this? I am not sure how to execute this step properly.
CodePudding user response:
Using scales::percent:
crash_stats_TA %>%
  mutate(crashes_perc = scales::percent(Crashes / sum(Crashes, na.rm = TRUE)))
TA_code TA_name Crashes crashes_perc
<int> <chr> <int> <chr>
1 61 Grey 126 20.8%
2 62 Buller 345 56.8%
3 63 Westland 24 4.0%
4 64 Timaru 112 18.5%
Add group_by only if each group contains more than one row; with a single row per group, every row is 100% of its own group.
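To illustrate when grouping does make sense, here is a minimal sketch. The `region` column and its values are hypothetical (they are not part of the original data); they simply place two areas in each group so that each percentage is a share of its region's total.

```r
library(dplyr)

# Hypothetical data: `region` is an assumed column, added only for illustration
crash_by_region <- data.frame(
  region  = c("A", "A", "B", "B"),
  TA_name = c("Grey", "Buller", "Westland", "Timaru"),
  Crashes = c(126, 345, 24, 112)
)

crash_by_region <- crash_by_region %>%
  group_by(region) %>%
  # sum(Crashes) is now the total within each region, not the overall total
  mutate(crashes_perc = round(Crashes / sum(Crashes, na.rm = TRUE) * 100, 2)) %>%
  ungroup()  # drop the grouping so later steps are not affected by it
```

With this layout the percentages add up to 100 within each region rather than across the whole table.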
CodePudding user response:
You get 100% because each group contains only one Crashes value, so sum(Crashes) returns that same value and the ratio is always 1.
Compute it without any grouping instead:
library(dplyr)
crash_stats_TA %>%
  mutate(crashes_perc = round(Crashes / sum(Crashes, na.rm = TRUE) * 100, 2))
-output
TA_code TA_name Crashes crashes_perc
1 61 Grey 126 20.76
2 62 Buller 345 56.84
3 63 Westland 24 3.95
4 64 Timaru 112 18.45
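To make the cause visible, the following sketch compares the denominator with and without grouping, using the same data as in this answer. The column names `grouped_total` and `overall_total` are introduced here purely for illustration.

```r
library(dplyr)

crash_stats_TA <- data.frame(
  TA_code = 61:64,
  TA_name = c("Grey", "Buller", "Westland", "Timaru"),
  Crashes = c(126L, 345L, 24L, 112L)
)

check <- crash_stats_TA %>%
  group_by(TA_code, TA_name) %>%
  # one row per group, so the grouped sum is just that row's own value
  mutate(grouped_total = sum(Crashes, na.rm = TRUE)) %>%
  ungroup() %>%
  # after ungroup(), the sum runs over all rows
  mutate(overall_total = sum(Crashes, na.rm = TRUE))
```

If the data frame arrives already grouped from an earlier step, calling ungroup() before mutate() should give the same result as the ungrouped code above.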
In base R, use proportions:
crash_stats_TA$crashes_perc <- with(crash_stats_TA, round(100 *
proportions(Crashes), 2))
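As far as I know, proportions() was only added in R 4.0.1 as a new name for prop.table(), so on older installations it may not be available. A small sketch of two equivalents that should work on any version (the variable names `perc_manual` and `perc_prop` are just for illustration):

```r
Crashes <- c(126L, 345L, 24L, 112L)

# Plain division: each value as a share of the total, in percent
perc_manual <- round(100 * Crashes / sum(Crashes), 2)

# prop.table() computes x / sum(x) for a vector, like proportions()
perc_prop <- round(100 * prop.table(Crashes), 2)
```

Note that neither variant drops missing values; if Crashes can contain NA, the manual form with sum(Crashes, na.rm = TRUE) is the one to use.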
data
crash_stats_TA <- structure(list(TA_code = 61:64,
  TA_name = c("Grey", "Buller", "Westland", "Timaru"),
  Crashes = c(126L, 345L, 24L, 112L)),
  class = "data.frame", row.names = c(NA, -4L))
CodePudding user response:
Doing the sum by referring to the column with the dollar sign syntax is probably the simplest solution:
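library(magrittr) # provides the %<>% assignment pipe
library(dplyr)
crash_stats_TA <- data.frame(
  TA_code = c("061", "062", "063", "064"),
  TA_name = c("Grey", "Buller", "Westland", "Timaru"),
  Crashes = c(126, 345, 24, 112))
# crash_stats_TA$Crashes refers to the whole column, so grouping does not affect the sum.
# The result is a proportion between 0 and 1; multiply by 100 for a percentage.
crash_stats_TA %<>%
  mutate(crashes_perc = Crashes/sum(crash_stats_TA$Crashes, na.rm = TRUE))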
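As a sketch of why this approach sidesteps the original problem: the `$` reference always points at the full column of the named data frame, so the denominator stays the overall total even when the pipeline is grouped as in the question. The name `result` is only used here for illustration, and the `* 100` and round() are added to match the percentages the question asks for.

```r
library(dplyr)

crash_stats_TA <- data.frame(
  TA_code = c("061", "062", "063", "064"),
  TA_name = c("Grey", "Buller", "Westland", "Timaru"),
  Crashes = c(126, 345, 24, 112)
)

result <- crash_stats_TA %>%
  group_by(TA_code, TA_name) %>%
  # Crashes is the per-group value; crash_stats_TA$Crashes is the whole column
  mutate(crashes_perc = round(Crashes / sum(crash_stats_TA$Crashes, na.rm = TRUE) * 100, 2)) %>%
  ungroup()
```

One caveat with this design: the data frame name is hard-coded, so if rows are filtered earlier in the same pipe, the denominator still comes from the unfiltered data.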