Group and divide multiple values

Time:11-11

I want to group my dataset by a few variables, take the sum of the numeric variable within each group, then divide each individual value by its group's sum to get a proportion, and add that proportion as a new column with mutate().

For example, say I have a dataset like this:

year         disastertype area(km^2)     country
2001           earthquake   1907.098 Afghanistan
2001           earthquake   3635.378 Afghanistan
2001           earthquake   5889.177 Afghanistan
2001 extreme temperature    8042.396 Afghanistan
2001 extreme temperature   11263.485 Afghanistan
2001 extreme temperature   11802.311 Afghanistan

I can get the sum of area for each disaster type and country using:

test_two <- test_one %>% group_by(disastertype, country, `area(km^2)`, year) %>% count() %>% aggregate(. ~ disastertype + country + year, data = ., sum)

But when I try to divide area by this sum using:

test_one$`area(km^2)` %>% map_dbl(~ .x / test_two$`area(km^2)`)

Error: Result 1 must be a single double, not a double vector of length 2
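The error comes from map_dbl() dividing each single area by the whole `area(km^2)` column of the summed table (two group totals here), so every iteration returns two numbers instead of one. Below is a sketch of the two-step idea that does work, assuming dplyr 1.0.0 or later (for the .groups argument): compute one total per group, join it back to each row by the grouping columns, then divide. The names totals, total and result are illustrative only.

```r
library(dplyr)

# Sample data from the question (dput() output).
test_one <- structure(list(year = c(2001, 2001, 2001, 2001, 2001, 2001),
    disastertype = c("earthquake", "earthquake", "earthquake",
    "extreme temperature ", "extreme temperature ", "extreme temperature "),
    `area(km^2)` = c(1907.09808242381, 3635.37825411105, 5889.17746880181,
    8042.39623016696, 11263.4848508564, 11802.3111500339),
    country = rep("Afghanistan", 6)),
    row.names = c(1L, 10L, 65L, 109L, 135L, 146L), class = "data.frame")

# Step 1: one row per group, holding that group's total area.
totals <- test_one %>%
  group_by(year, country, disastertype) %>%
  summarise(total = sum(`area(km^2)`), .groups = "drop")

# Step 2: attach each row's group total by key, divide,
# then drop the helper column.
result <- test_one %>%
  left_join(totals, by = c("year", "country", "disastertype")) %>%
  mutate(proportion = `area(km^2)` / total) %>%
  select(-total)

print(result)
```

The answer reaches the same result in one step by calling mutate() on the grouped data, which avoids the separate summary table and the join.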

Expected result:

    year         disastertype area(km^2)     country  proportion
1   2001           earthquake   1907.098 Afghanistan  0.1668261   
10  2001           earthquake   3635.378 Afghanistan  0.3180099
65  2001           earthquake   5889.177 Afghanistan  0.5151642
109 2001 extreme temperature    8042.396 Afghanistan  0.2585299
135 2001 extreme temperature   11263.485 Afghanistan  0.3620746
146 2001 extreme temperature   11802.311 Afghanistan  0.3793956

Reproducible data (dput() output):

structure(list(year = c(2001, 2001, 2001, 2001, 2001, 2001), 
    disastertype = c("earthquake", "earthquake", "earthquake", 
    "extreme temperature ", "extreme temperature ", "extreme temperature "
    ), `area(km^2)` = c(1907.09808242381, 3635.37825411105, 5889.17746880181, 
    8042.39623016696, 11263.4848508564, 11802.3111500339), country = c("Afghanistan", 
    "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", 
    "Afghanistan")), row.names = c(1L, 10L, 65L, 109L, 135L, 
146L), class = "data.frame")

Answer:

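You shouldn't group by `area(km^2)`. Group only by the variables that define each set of rows, and use mutate() instead of count()/aggregate() so that every row is kept: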

library(dplyr)

# df is the data frame from the dput() output in the question
df %>%
  group_by(year, country, disastertype) %>%
  mutate(proportion = `area(km^2)` / sum(`area(km^2)`)) %>%
  ungroup()
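For comparison, here is a base R sketch of the same grouped proportion. It uses ave() in place of dplyr, which is a swapped-in technique and not part of the answer above. ave() applies FUN within each combination of the grouping vectors and returns a vector in the original row order, so the result can be assigned straight back as a column.

```r
# Sample data from the question (dput() output).
test_one <- structure(list(year = c(2001, 2001, 2001, 2001, 2001, 2001),
    disastertype = c("earthquake", "earthquake", "earthquake",
    "extreme temperature ", "extreme temperature ", "extreme temperature "),
    `area(km^2)` = c(1907.09808242381, 3635.37825411105, 5889.17746880181,
    8042.39623016696, 11263.4848508564, 11802.3111500339),
    country = rep("Afghanistan", 6)),
    row.names = c(1L, 10L, 65L, 109L, 135L, 146L), class = "data.frame")

# Within each year/country/disastertype group, divide each area by the
# group's total; ave() keeps the result aligned with the original rows.
test_one$proportion <- ave(
  test_one$`area(km^2)`,
  test_one$year, test_one$country, test_one$disastertype,
  FUN = function(a) a / sum(a)
)

print(test_one)
```

Because ave() never collapses rows, it plays the same role here as mutate() on grouped data.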