Obtaining the percentage of repeated non-zero values

Time:03-12

My data look like this:

df<-structure(list(team_3_F = c("browingal ", "browingal ", "browingal ", 
"browingal ", "browingal ", "browingal ", "browingal ", "browingal ", 
"browingal ", "browingal ", "browingal ", "browingal ", "newyorkish", 
"newyorkish", "newyorkish", "newyorkish", "site", "site", "site", 
"site", "site", "site", "team ", "team ", "team ", "team ", "team ", 
"team ", "team ", "team ", "team ", "team ", "team ", "team ", 
"team ", "team ", "team ", "team ", "team ", "team ", "team ", 
"team ", "team ", "team "), AAA_US = c(0L, 1L, 0L, 0L, 0L, 0L, 
1L, 0L, 0L, 0L, 0L, 0L, 88L, 5L, 11L, 1L, 0L, 0L, 0L, 45L, 0L, 
0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 19L), BBB_US = c(0L, 2L, 3L, 2L, 1L, 
0L, 1L, 0L, 0L, 2L, 1L, 0L, 0L, 3L, 0L, 0L, 8L, 0L, 0L, 0L, 0L, 
0L, 0L, 4L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 45L, 0L, 0L, 0L, 18L, 
0L, 0L, 0L, 1L, 0L, 0L, 0L, 19L), CCC_US = c(0L, 0L, 0L, 0L, 
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 88L, 5L, 2L, 1L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 19L)), class = "data.frame", row.names = c(NA, 
-44L))

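For each category in team_3_F, I want the percentage of rows in which each combination of columns is non-zero (for example, AAA_BBB_US counts rows where both AAA_US and BBB_US are non-zero). For instance, the counts and the number of rows per category are: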

team_3_F     AAA_BBB_US   AAA_CCC_US   rows
browingal         2            1        12
newyorkish        2            2         4
site              0            0         6
team              4            2        22

Dividing each count by the number of rows in its category and multiplying by 100 gives:

team_3_F     AAA_BBB_US   AAA_CCC_US
browingal    2/12*100     1/12*100
newyorkish   2/4*100      2/4*100
site         0/6*100      0/6*100
team         4/22*100     2/22*100

So the desired output is:

team_3_F     AAA_BBB_US   AAA_CCC_US
browingal    16.7%        8.3%
newyorkish   50%          50%
site         0%           0%
team         18.2%        9.1%
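As a quick check of the arithmetic above (rows where both columns are non-zero, divided by the number of rows in the category, times 100), here is a minimal base R sketch. The small data frame `toy` and its values are hypothetical, chosen only to keep the example self-contained; replacing `toy` with `df` should apply the same steps to the real data.

```r
# Hypothetical toy data with the same column layout as `df` (not the question's data)
toy <- data.frame(
  team_3_F = c("a", "a", "a", "a", "b", "b"),
  AAA_US   = c(1L, 0L, 2L, 3L, 0L, 5L),
  BBB_US   = c(4L, 1L, 0L, 2L, 0L, 1L),
  CCC_US   = c(0L, 0L, 1L, 0L, 7L, 0L)
)

# TRUE for rows where both columns of the pair are non-zero
both_ab <- toy$AAA_US != 0 & toy$BBB_US != 0
both_ac <- toy$AAA_US != 0 & toy$CCC_US != 0

# Numerator: matching rows per category; denominator: rows per category
n_rows <- table(toy$team_3_F)
ab_cnt <- tapply(both_ab, toy$team_3_F, sum)
ac_cnt <- tapply(both_ac, toy$team_3_F, sum)

# count / rows * 100, as in the hand-worked table
pct <- data.frame(
  AAA_BBB_US = as.vector(100 * ab_cnt / n_rows),
  AAA_CCC_US = as.vector(100 * ac_cnt / n_rows),
  row.names  = names(n_rows)
)
print(pct)
```

Here category "a" has 2 of 4 rows with AAA_US and BBB_US both non-zero (50%) and 1 of 4 with AAA_US and CCC_US both non-zero (25%).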

Answer:

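You can create the AAA_BBB_US, AAA_CCC_US and AAA_BBB_CCC_US columns as logical flags that are TRUE when the product of the relevant columns is non-zero (a product is non-zero only if every factor is non-zero). Then, within each team, sum the flags and divide by the number of rows in the group (n()). The result is a proportion; multiply by 100 to get a percentage.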

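library(dplyr)

df %>%
  # TRUE when every column in the product is non-zero
  mutate(AAA_BBB_US     = AAA_US * BBB_US != 0,
         AAA_CCC_US     = AAA_US * CCC_US != 0,
         AAA_BBB_CCC_US = AAA_US * BBB_US * CCC_US != 0) %>%
  group_by(team_3_F) %>%
  # share of rows in each team where the combination is non-zero
  summarize(across(AAA_BBB_US:AAA_BBB_CCC_US, ~ sum(.x) / n()))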

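Output (proportions between 0 and 1). For newyorkish, the data contain 1 of 4 rows where AAA_US and BBB_US are both non-zero and 4 of 4 rows where AAA_US and CCC_US are both non-zero, so these values differ from the 2/4 counted by hand in the question: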

# A tibble: 4 x 4
  team_3_F     AAA_BBB_US AAA_CCC_US AAA_BBB_CCC_US
  <chr>             <dbl>      <dbl>          <dbl>
1 "browingal "      0.167     0.0833         0.0833
2 "newyorkish"      0.25      1              0.25  
3 "site"            0         0              0     
4 "team "           0.182     0.0909         0.0909
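If you want percentages formatted like the desired output rather than proportions, one possible sketch is below. It assumes dplyr is installed and uses a hypothetical `toy` data frame so it runs on its own. It also swaps `sum(.x) / n()` for `mean(.x)`, which gives the same value for a logical vector without NA values.

```r
library(dplyr)

# Hypothetical toy data with the same column layout as `df` (not the question's data)
toy <- data.frame(
  team_3_F = c("a", "a", "a", "b", "b", "b"),
  AAA_US   = c(1L, 0L, 2L, 0L, 5L, 3L),
  BBB_US   = c(4L, 1L, 0L, 0L, 1L, 0L),
  CCC_US   = c(2L, 0L, 1L, 7L, 0L, 0L)
)

# Proportions: mean() of a logical vector is the share of TRUE values
prop <- toy %>%
  mutate(AAA_BBB_US = AAA_US * BBB_US != 0,
         AAA_CCC_US = AAA_US * CCC_US != 0) %>%
  group_by(team_3_F) %>%
  summarize(across(AAA_BBB_US:AAA_CCC_US, mean))

# Display-only copy: proportion * 100, rounded to one decimal, with a % sign
pct <- prop %>%
  mutate(across(AAA_BBB_US:AAA_CCC_US, ~ sprintf("%.1f%%", 100 * .x)))
print(pct)
```

Keeping `prop` numeric and formatting only a separate copy for display avoids doing further arithmetic on character strings.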