How to summarize and spread data in R

I want to summarize my data so that the result has only three columns: col_1 = name of the country, col_2 = percentage of 0s, col_3 = percentage of 1s.

Here is the data:

country = rep(c("USA", "UK", "AUS", "ARM", "BEL", "BRA", "CHN", "EGY", "FIN", "FRA"),
              times = c(10, 5, 15, 10, 10, 10, 5, 15, 10, 10))
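# sample() without a size returns a permutation of c(0, 1); data.frame() recycles it to the length of country
score = sample(c(0, 1), replace = FALSE)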
dat = data.frame(country, score)

Thanks very much.

CodePudding user response：

Using reshape2

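library(reshape2)
# Count rows per country and score value (fun.aggregate = length makes dcast's default explicit)
dat2 = dcast(dat, country ~ score, value.var = "score", fun.aggregate = length)
# Convert the counts to row-wise proportions
dat2[, c("0", "1")] = dat2[, c("0", "1")] / rowSums(dat2[, c("0", "1")])
dat2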

   country         0         1
1      ARM 0.5000000 0.5000000
2      AUS 0.5333333 0.4666667
3      BEL 0.5000000 0.5000000
4      BRA 0.5000000 0.5000000
5      CHN 0.4000000 0.6000000
6      EGY 0.5333333 0.4666667
7      FIN 0.5000000 0.5000000
8      FRA 0.5000000 0.5000000
9       UK 0.4000000 0.6000000
10     USA 0.5000000 0.5000000
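
For comparison, the same count-then-divide idea can probably be done without any package. This is only a minimal sketch in base R: the `set.seed()` call, the column names `perc0s`/`perc1s`, and the object names `props` and `res` are my own assumptions and not part of the question.

```r
# Data as in the question; set.seed() is an addition so the sketch is reproducible
set.seed(42)
country <- rep(c("USA", "UK", "AUS", "ARM", "BEL", "BRA", "CHN", "EGY", "FIN", "FRA"),
               times = c(10, 5, 15, 10, 10, 10, 5, 15, 10, 10))
score <- sample(c(0, 1), replace = FALSE)  # length 2, recycled by data.frame()
dat <- data.frame(country, score)

# Cross-tabulate country by score; fixing the levels guarantees both columns exist
tab <- table(dat$country, factor(dat$score, levels = c(0, 1)))

# margin = 1 divides each row by its row total, giving per-country proportions
props <- prop.table(tab, margin = 1)

# Keep the wide layout: one row per country, one column per score value
res <- data.frame(country = rownames(props),
                  as.data.frame.matrix(props),
                  row.names = NULL)
names(res) <- c("country", "perc0s", "perc1s")
res
```

Multiply the two proportion columns by 100 if you need actual percentages rather than fractions.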

CodePudding user response：

Another possible solution, based on tidyverse:

library(tidyverse)

country = rep(c("USA", "UK", "AUS", "ARM", "BEL", "BRA", "CHN", "EGY", "FIN", "FRA"),
              times = c(10, 5, 15, 10, 10, 10, 5, 15, 10, 10))
score = sample(c(0, 1), replace = FALSE)
dat = data.frame(country, score)

# sum(score)/n() is the share of 1s per country, so 1 minus it is the share of 0s
dat %>%
  group_by(country) %>%
  summarise(perc0s = 1 - sum(score) / n(), perc1s = 1 - perc0s, .groups = "drop")

#> # A tibble: 10 × 3
#>    country perc0s perc1s
#>    <chr>    <dbl>  <dbl>
#>  1 ARM      0.5    0.5  
#>  2 AUS      0.467  0.533
#>  3 BEL      0.5    0.5  
#>  4 BRA      0.5    0.5  
#>  5 CHN      0.6    0.4  
#>  6 EGY      0.467  0.533
#>  7 FIN      0.5    0.5  
#>  8 FRA      0.5    0.5  
#>  9 UK       0.6    0.4  
#> 10 USA      0.5    0.5
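
Since the title mentions spreading, here is a rough sketch of a count-and-spread variant that does not rely on the score being coded as 0/1. It assumes dplyr and tidyr are installed; the `set.seed()` call, the `perc_` column prefix, and the name `wide` are illustrative choices of mine, not something from the question.

```r
library(dplyr)
library(tidyr)

# Data as in the question; set.seed() is an addition so the sketch is reproducible
set.seed(42)
country <- rep(c("USA", "UK", "AUS", "ARM", "BEL", "BRA", "CHN", "EGY", "FIN", "FRA"),
               times = c(10, 5, 15, 10, 10, 10, 5, 15, 10, 10))
score <- sample(c(0, 1), replace = FALSE)  # length 2, recycled by data.frame()
dat <- data.frame(country, score)

wide <- dat %>%
  count(country, score) %>%            # one row per country/score pair, with count n
  group_by(country) %>%
  mutate(prop = n / sum(n)) %>%        # share of each score value within the country
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = score,      # spread score values into columns
              values_from = prop,
              names_prefix = "perc_",
              values_fill = 0)         # a score value missing for a country becomes 0

wide
```

The upside of this form is that it should extend to more than two score values without changes, at the cost of being longer than the `summarise()` one-liner.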