Home > Software engineering >  How to sum ID within a DF by area in R?
How to sum ID within a DF by area in R?

Time:10-10

I have a dataframe of crash statistics called crashes_TA. The datafame looks like the following but on a much larger scale with each row representing a crash:

the dataframe is called crashes_TA

TA_name TA_code fatal_count serious_injury_count minor_injury_count ID
Grey 061 2 0 1 1
Buller 062 1 1 1 2
Grey 061 1 1 1 3
Clutha 063 0 1 1 4
Clutha 063 1 1 2 5
Otago 064 1 1 0 6

I would like to summarise fatal, serious, and minor by TA_name by creating a new collum called casualties. I would also like to summarise ID which represents the number of crashes per region as this value would be different to casualties as not all crashes have casualties. This new collum would be called crashes

My new dataframe would then look like this:

TA_name TA_code fatal_count serious_injury_count minor_injury_count casualties crashes
Grey 061 3 1 2 6 2
Buller 062 1 1 1 3 1
Clutha 063 1 2 3 6 2
Otago 064 1 1 0 2 1

This is my code I have tried so far

crashes_stats_TA <- crashes_TA %>% 
  group_by(TA_code, TA_name) %>%
  summarise(across(contains("count"), ~sum(., na.rm = T)),
            across(Population, ~mean(., na.rm = T),
            across(contains("perc"), ~mean(., na.rm = T), .names = "{.col}_mean"))) %>%
  mutate(casualties = round(fatal_count   serious_injury_count   minor_injury_count), 
         crashes = round(ID = sum(ID, na.rm = T)))

However, when I do this I get this error:

Error: Problem with `mutate()` column `Crashes`.
i `Crashes = round(ID = sum(ID, na.rm = T))`.
x object 'ID' not found

dataframe

CodePudding user response:

You may use -

library(dplyr)

df %>%
  group_by(TA_name, TA_code) %>%
  summarise(across(fatal_count:minor_injury_count, sum, na.rm = TRUE),
            crashes = n(), .groups = 'drop') %>%
  mutate(casualties = rowSums(select(., fatal_count:minor_injury_count)))

#  TA_name TA_code fatal_count serious_injury_count minor_injury_count crashes casualties
#  <chr>     <int>       <int>                <int>              <int>   <int>      <dbl>
#1 Buller       62           1                    1                  1       1          3
#2 Clutha       63           1                    2                  3       2          6
#3 Grey         61           3                    1                  2       2          6
#4 Otago        64           1                    1                  0       1          2

data

It is easier to help if you provide data in a reproducible format

df <- structure(list(TA_name = c("Grey", "Buller", "Grey", "Clutha", 
"Clutha", "Otago"), TA_code = c(61L, 62L, 61L, 63L, 63L, 64L), 
    fatal_count = c(2L, 1L, 1L, 0L, 1L, 1L), serious_injury_count = c(0L, 
    1L, 1L, 1L, 1L, 1L), minor_injury_count = c(1L, 1L, 1L, 1L, 
    2L, 0L), ID = 1:6), row.names = c(NA, -6L), class = "data.frame")

CodePudding user response:

We could do it this way:

library(dplyr)

df %>% 
  group_by(TA_name, TA_code) %>%
  add_count(name="crashes") %>% 
  summarise(across(contains("count"), sum),
            causalities = sum(fatal_count, serious_injury_count, minor_injury_count),
            crashes= unique(crashes))
  TA_name TA_code fatal_count serious_injury_count minor_injury_count causalities crashes
  <chr>     <int>       <int>                <int>              <int>       <int>   <int>
1 Buller       62           1                    1                  1           3       1
2 Clutha       63           1                    2                  3           6       2
3 Grey         61           3                    1                  2           6       2
4 Otago        64           1                    1                  0           2       1
  • Related