My data look like this:
df<-structure(list(team_3_F = c("browingal ", "browingal ", "browingal ",
"browingal ", "browingal ", "browingal ", "browingal ", "browingal ",
"browingal ", "browingal ", "browingal ", "browingal ", "newyorkish",
"newyorkish", "newyorkish", "newyorkish", "site", "site", "site",
"site", "site", "site", "team ", "team ", "team ", "team ", "team ",
"team ", "team ", "team ", "team ", "team ", "team ", "team ",
"team ", "team ", "team ", "team ", "team ", "team ", "team ",
"team ", "team ", "team "), AAA_US = c(0L, 1L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 88L, 5L, 11L, 1L, 0L, 0L, 0L, 45L, 0L,
0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 19L), BBB_US = c(0L, 2L, 3L, 2L, 1L,
0L, 1L, 0L, 0L, 2L, 1L, 0L, 0L, 3L, 0L, 0L, 8L, 0L, 0L, 0L, 0L,
0L, 0L, 4L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 45L, 0L, 0L, 0L, 18L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 19L), CCC_US = c(0L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 88L, 5L, 2L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 19L)), class = "data.frame", row.names = c(NA,
-44L))
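For each category in team_3_F, I want the percentage of rows in which a combination of columns is non-zero. AAA_BBB_US counts the rows where both AAA_US and BBB_US are non-zero, and AAA_CCC_US counts the rows where both AAA_US and CCC_US are non-zero. For instance, the counts are (rows = number of rows in the category):

team_3_F     AAA_BBB_US  AAA_CCC_US  rows
browingal             2           1    12
newyorkish            1           4     4
site                  0           0     6
team                  4           2    22

which gives the following percentages:

team_3_F     AAA_BBB_US  AAA_CCC_US
browingal      2/12*100    1/12*100
newyorkish      1/4*100     4/4*100
site            0/6*100     0/6*100
team           4/22*100    2/22*100

so the output should look like this:

team_3_F     AAA_BBB_US  AAA_CCC_US
browingal         16.7%        8.3%
newyorkish        25.0%      100.0%
site               0.0%        0.0%
team              18.2%        9.1%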
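As a sanity check of the expected table, here is a minimal base-R sketch that derives the counts and the percentage strings directly from the data instead of by hand. It assumes a hypothetical `df_small` that holds only the 12 browingal and 4 newyorkish rows copied from the dput above, and it treats a combination as present when every column in it is non-zero.

```r
# Hypothetical subset of the question's data: the 12 "browingal " rows and the
# 4 "newyorkish" rows, copied from the dput (note the trailing space in "browingal ").
df_small <- data.frame(
  team_3_F = rep(c("browingal ", "newyorkish"), times = c(12, 4)),
  AAA_US = c(0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 88, 5, 11, 1),
  BBB_US = c(0, 2, 3, 2, 1, 0, 1, 0, 0, 2, 1, 0, 0, 3, 0, 0),
  CCC_US = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 88, 5, 2, 1)
)

# A row counts towards a combination when every column in it is non-zero.
both_ab <- df_small$AAA_US != 0 & df_small$BBB_US != 0
both_ac <- df_small$AAA_US != 0 & df_small$CCC_US != 0

# Per category: number of qualifying rows for each combination, plus group size.
grp  <- df_small$team_3_F
n_ab <- tapply(both_ab, grp, sum)
counts <- data.frame(
  team_3_F   = names(n_ab),
  AAA_BBB_US = as.vector(n_ab),
  AAA_CCC_US = as.vector(tapply(both_ac, grp, sum)),
  rows       = as.vector(table(grp))
)
print(counts)

# Percentage = count / rows * 100, formatted with one decimal place.
pct <- data.frame(
  team_3_F   = counts$team_3_F,
  AAA_BBB_US = sprintf("%.1f%%", counts$AAA_BBB_US / counts$rows * 100),
  AAA_CCC_US = sprintf("%.1f%%", counts$AAA_CCC_US / counts$rows * 100)
)
print(pct)
```

Replacing `df_small` with the full `df` applies the same arithmetic to the site and team groups as well.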
Answer:
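You can create the AAA_BBB_US, AAA_CCC_US and AAA_BBB_CCC_US columns as below (each is TRUE when the product of the relevant columns is non-zero, i.e. when all of them are non-zero). Then, by team, sum these values and divide by the number of rows in each group (n()). The result is a proportion; multiply it by 100 to get a percentage.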
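library(dplyr)

df %>%
  # TRUE when every column in the combination is non-zero
  mutate(AAA_BBB_US     = AAA_US * BBB_US != 0,
         AAA_CCC_US     = AAA_US * CCC_US != 0,
         AAA_BBB_CCC_US = AAA_US * BBB_US * CCC_US != 0) %>%
  group_by(team_3_F) %>%
  # share of rows in each team where the combination is present
  summarize(across(AAA_BBB_US:AAA_BBB_CCC_US, ~ sum(.x) / n()))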
Output:
# A tibble: 4 x 4
team_3_F AAA_BBB_US AAA_CCC_US AAA_BBB_CCC_US
<chr> <dbl> <dbl> <dbl>
1 "browingal " 0.167 0.0833 0.0833
2 "newyorkish" 0.25 1 0.25
3 "site" 0 0 0
4 "team " 0.182 0.0909 0.0909
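If you prefer base R, or want percentages rather than proportions, the same calculation can be sketched with `aggregate()`. This is an alternative to the dplyr answer, not part of it: `all_nonzero()` and `combos` are hypothetical names, and `df_small` holds only the browingal and newyorkish rows copied from the dput in the question. It compares each column with 0 and combines the tests with `&` instead of multiplying. That gives the same result, but it cannot hit integer overflow: the columns are integers, and in R an integer product larger than `.Machine$integer.max` becomes NA with a warning.

```r
# Hypothetical subset of the question's data: the "browingal " and "newyorkish"
# rows, copied from the dput (note the trailing space in "browingal ").
df_small <- data.frame(
  team_3_F = rep(c("browingal ", "newyorkish"), times = c(12, 4)),
  AAA_US = c(0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 88, 5, 11, 1),
  BBB_US = c(0, 2, 3, 2, 1, 0, 1, 0, 0, 2, 1, 0, 0, 3, 0, 0),
  CCC_US = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 88, 5, 2, 1)
)

# Hypothetical helper: TRUE for rows where every listed column is non-zero.
all_nonzero <- function(data, cols) {
  Reduce(`&`, lapply(cols, function(col) data[[col]] != 0))
}

# Combinations to report, named like the answer's output columns.
combos <- list(
  AAA_BBB_US     = c("AAA_US", "BBB_US"),
  AAA_CCC_US     = c("AAA_US", "CCC_US"),
  AAA_BBB_CCC_US = c("AAA_US", "BBB_US", "CCC_US")
)

# One logical indicator column per combination, plus the grouping column.
ind <- data.frame(team_3_F = df_small$team_3_F,
                  lapply(combos, function(cols) all_nonzero(df_small, cols)))

# mean() of a logical vector is the share of TRUE values (sum / n); x 100 for percent.
res <- aggregate(. ~ team_3_F, data = ind, FUN = function(x) 100 * mean(x))
print(res)
```

Dividing these values by 100 gives the proportions printed in the answer's output (for example 16.7 versus 0.167 for browingal's AAA_BBB_US).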