How to sum ID within a DF by area in R?-CodePudding

I have a dataframe of crash statistics called crashes_TA. The datafame looks like the following but on a much larger scale with each row representing a crash:

the dataframe is called crashes_TA

TA_name	TA_code	fatal_count	serious_injury_count	minor_injury_count	ID
Grey	061	2	0	1	1
Buller	062	1	1	1	2
Grey	061	1	1	1	3
Clutha	063	0	1	1	4
Clutha	063	1	1	2	5
Otago	064	1	1	0	6

I would like to summarise fatal, serious, and minor by TA_name by creating a new collum called casualties. I would also like to summarise ID which represents the number of crashes per region as this value would be different to casualties as not all crashes have casualties. This new collum would be called crashes

My new dataframe would then look like this:

TA_name	TA_code	fatal_count	serious_injury_count	minor_injury_count	casualties	crashes
Grey	061	3	1	2	6	2
Buller	062	1	1	1	3	1
Clutha	063	1	2	3	6	2
Otago	064	1	1	0	2	1

This is my code I have tried so far

crashes_stats_TA <- crashes_TA %>% 
  group_by(TA_code, TA_name) %>%
  summarise(across(contains("count"), ~sum(., na.rm = T)),
            across(Population, ~mean(., na.rm = T),
            across(contains("perc"), ~mean(., na.rm = T), .names = "{.col}_mean"))) %>%
  mutate(casualties = round(fatal_count   serious_injury_count   minor_injury_count), 
         crashes = round(ID = sum(ID, na.rm = T)))

However, when I do this I get this error:

Error: Problem with `mutate()` column `Crashes`.
i `Crashes = round(ID = sum(ID, na.rm = T))`.
x object 'ID' not found

CodePudding user response：

You may use -

library(dplyr)

df %>%
  group_by(TA_name, TA_code) %>%
  summarise(across(fatal_count:minor_injury_count, sum, na.rm = TRUE),
            crashes = n(), .groups = 'drop') %>%
  mutate(casualties = rowSums(select(., fatal_count:minor_injury_count)))

#  TA_name TA_code fatal_count serious_injury_count minor_injury_count crashes casualties
#  <chr>     <int>       <int>                <int>              <int>   <int>      <dbl>
#1 Buller       62           1                    1                  1       1          3
#2 Clutha       63           1                    2                  3       2          6
#3 Grey         61           3                    1                  2       2          6
#4 Otago        64           1                    1                  0       1          2

data

It is easier to help if you provide data in a reproducible format

df <- structure(list(TA_name = c("Grey", "Buller", "Grey", "Clutha", 
"Clutha", "Otago"), TA_code = c(61L, 62L, 61L, 63L, 63L, 64L), 
    fatal_count = c(2L, 1L, 1L, 0L, 1L, 1L), serious_injury_count = c(0L, 
    1L, 1L, 1L, 1L, 1L), minor_injury_count = c(1L, 1L, 1L, 1L, 
    2L, 0L), ID = 1:6), row.names = c(NA, -6L), class = "data.frame")

CodePudding user response：

We could do it this way:

library(dplyr)

df %>% 
  group_by(TA_name, TA_code) %>%
  add_count(name="crashes") %>% 
  summarise(across(contains("count"), sum),
            causalities = sum(fatal_count, serious_injury_count, minor_injury_count),
            crashes= unique(crashes))

  TA_name TA_code fatal_count serious_injury_count minor_injury_count causalities crashes
  <chr>     <int>       <int>                <int>              <int>       <int>   <int>
1 Buller       62           1                    1                  1           3       1
2 Clutha       63           1                    2                  3           6       2
3 Grey         61           3                    1                  2           6       2
4 Otago        64           1                    1                  0           2       1