Summarize for each group by counting the number of events, the total percentage and the cumulative p-CodePudding

I have the following dataset with three variables. The first variable "Adversity.category" differentiates between 4 types of adversity. The second category "Adversity.subcategory" provides more fine-grained descriptions of the adversity whereas the "Adversity.event" refers to the specific event. I would like to the provide a summary table for each adversity category. Thus, I should have one summary table for "economic crisis", one for "natural disaster", one for "anthropogenic disaster", and one for "health" with two five columns in total: subcategory, specific event, n, total % and cumulative% starting with the highest count at the top.

structure(list(Adversity.category = c("Economic crisis", "Economic crisis", 
"Natural disaster", "Anthropogenic disaster", "Economic crisis", 
"Economic crisis", "Economic crisis", "Health", "Natural disaster", 
"Economic crisis", "Natural disaster", "Economic crisis", "Economic crisis", 
"Economic crisis", "Economic crisis", "Anthropogenic disaster", 
"Anthropogenic disaster", "Natural disaster", "Economic crisis", 
"Economic crisis", "Economic crisis", "Economic crisis", "Economic crisis", 
"Persistent adversity", "Anthropogenic disaster", "Natural disaster", 
"Natural disaster", "Economic crisis", "Economic crisis", "Economic crisis"
), Adversity.subcategory = c("", "", "Biological", "Accident", 
"", "", "", "", "Biological", "", "Biological", "", "", "", "", 
"Revolution", "Terrorism", "Hydrological", "", "", "", "", "", 
"", "Terrorism", "Biological", "Geophysical", "", "", ""), Adversity.event = c("Systematic banking crisis", 
"GFC 2008", "Corona", "Deepwater Horizon Oil Spill", "GFC 2008", 
"GFC 2008", "GFC 2008", "CEO Hospitalization", "Corona", "GFC 2008", 
"Corona", "GFC 2008", "GFC 2008", "GFC 2008", "GFC 2008", "Arab Spring 2011", 
"September 11 attacks", "2010–2011 Queensland floods", "GFC 2008", 
"GFC 2008", "AFC 1997", "GFC 2008", "GFC 2008", "Economic policy uncertainty", 
"General terrorism experience", "Corona", "Tsunami", "GFC 2008", 
"GFC 2008", "Unclear economic recession")), row.names = c(NA, 
-30L), class = c("tbl_df", "tbl", "data.frame"))```

CodePudding user response：

This creates a list of your tables:

tables = df %>%
  count(across(everything())) %>%
  group_by(Adversity.category) %>%
  arrange(desc(n)) %>%
  mutate(total_pct = n / sum(n), cum_pct = cumsum(n) / sum(n)) %>%
  group_split(.keep = FALSE)

tables[[1]]
# # A tibble: 4 × 5
#   Adversity.subcategory Adversity.event                  n total_pct cum_pct
#   <chr>                 <chr>                        <int>     <dbl>   <dbl>
# 1 Accident              Deepwater Horizon Oil Spill      1      0.25    0.25
# 2 Revolution            Arab Spring 2011                 1      0.25    0.5 
# 3 Terrorism             General terrorism experience     1      0.25    0.75
# 4 Terrorism             September 11 attacks             1      0.25    1   

tables[[4]]
# # A tibble: 3 × 5
#   Adversity.subcategory Adversity.event                 n total_pct cum_pct
#   <chr>                 <chr>                       <int>     <dbl>   <dbl>
# 1 Biological            Corona                          4     0.667   0.667
# 2 Geophysical           Tsunami                         1     0.167   0.833
# 3 Hydrological          2010–2011 Queensland floods     1     0.167   1