I have a dataframe of crash statistics called crashes_TA. The datafame looks like the following but on a much larger scale with each row representing a crash:
the dataframe is called crashes_TA
TA_name | TA_code | fatal_count | serious_injury_count | minor_injury_count | ID |
---|---|---|---|---|---|
Grey | 061 | 2 | 0 | 1 | 1 |
Buller | 062 | 1 | 1 | 1 | 2 |
Grey | 061 | 1 | 1 | 1 | 3 |
Clutha | 063 | 0 | 1 | 1 | 4 |
Clutha | 063 | 1 | 1 | 2 | 5 |
Otago | 064 | 1 | 1 | 0 | 6 |
I would like to summarise fatal, serious, and minor by TA_name by creating a new collum called casualties. I would also like to summarise ID which represents the number of crashes per region as this value would be different to casualties as not all crashes have casualties. This new collum would be called crashes
My new dataframe would then look like this:
TA_name | TA_code | fatal_count | serious_injury_count | minor_injury_count | casualties | crashes |
---|---|---|---|---|---|---|
Grey | 061 | 3 | 1 | 2 | 6 | 2 |
Buller | 062 | 1 | 1 | 1 | 3 | 1 |
Clutha | 063 | 1 | 2 | 3 | 6 | 2 |
Otago | 064 | 1 | 1 | 0 | 2 | 1 |
This is my code I have tried so far
crashes_stats_TA <- crashes_TA %>%
group_by(TA_code, TA_name) %>%
summarise(across(contains("count"), ~sum(., na.rm = T)),
across(Population, ~mean(., na.rm = T),
across(contains("perc"), ~mean(., na.rm = T), .names = "{.col}_mean"))) %>%
mutate(casualties = round(fatal_count serious_injury_count minor_injury_count),
crashes = round(ID = sum(ID, na.rm = T)))
However, when I do this I get this error:
Error: Problem with `mutate()` column `Crashes`.
i `Crashes = round(ID = sum(ID, na.rm = T))`.
x object 'ID' not found
CodePudding user response:
You may use -
library(dplyr)
df %>%
group_by(TA_name, TA_code) %>%
summarise(across(fatal_count:minor_injury_count, sum, na.rm = TRUE),
crashes = n(), .groups = 'drop') %>%
mutate(casualties = rowSums(select(., fatal_count:minor_injury_count)))
# TA_name TA_code fatal_count serious_injury_count minor_injury_count crashes casualties
# <chr> <int> <int> <int> <int> <int> <dbl>
#1 Buller 62 1 1 1 1 3
#2 Clutha 63 1 2 3 2 6
#3 Grey 61 3 1 2 2 6
#4 Otago 64 1 1 0 1 2
data
It is easier to help if you provide data in a reproducible format
df <- structure(list(TA_name = c("Grey", "Buller", "Grey", "Clutha",
"Clutha", "Otago"), TA_code = c(61L, 62L, 61L, 63L, 63L, 64L),
fatal_count = c(2L, 1L, 1L, 0L, 1L, 1L), serious_injury_count = c(0L,
1L, 1L, 1L, 1L, 1L), minor_injury_count = c(1L, 1L, 1L, 1L,
2L, 0L), ID = 1:6), row.names = c(NA, -6L), class = "data.frame")
CodePudding user response:
We could do it this way:
library(dplyr)
df %>%
group_by(TA_name, TA_code) %>%
add_count(name="crashes") %>%
summarise(across(contains("count"), sum),
causalities = sum(fatal_count, serious_injury_count, minor_injury_count),
crashes= unique(crashes))
TA_name TA_code fatal_count serious_injury_count minor_injury_count causalities crashes
<chr> <int> <int> <int> <int> <int> <int>
1 Buller 62 1 1 1 3 1
2 Clutha 63 1 2 3 6 2
3 Grey 61 3 1 2 6 2
4 Otago 64 1 1 0 2 1