I want to summarize my data so that the result has only three columns:
col_1 = name of the country,
col_2 = percentage of 0s,
col_3 = percentage of 1s.
Here is the data:
country = rep(c("USA", "UK", "AUS", "ARM", "BEL", "BRA", "CHN", "EGY", "FIN", "FRA"),
              times = c(10, 5, 15, 10, 10, 10, 5, 15, 10, 10))
score = sample(c(0, 1), replace = FALSE)
dat = data.frame(country, score)
Thanks very much.
CodePudding user response:
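Using reshape2:
library(reshape2)
# Count the rows for each country/score pair (length is also dcast's default aggregator)
dat2 = dcast(dat, country ~ score, value.var = "score", fun.aggregate = length)
# Divide each count by its row total to turn counts into proportions
dat2[, c("0", "1")] = dat2[, c("0", "1")] / rowSums(dat2[, c("0", "1")])
dat2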
   country         0         1
1      ARM 0.5000000 0.5000000
2      AUS 0.5333333 0.4666667
3      BEL 0.5000000 0.5000000
4      BRA 0.5000000 0.5000000
5      CHN 0.4000000 0.6000000
6      EGY 0.5333333 0.4666667
7      FIN 0.5000000 0.5000000
8      FRA 0.5000000 0.5000000
9       UK 0.4000000 0.6000000
10     USA 0.5000000 0.5000000
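If you would rather not load a package, the same count-then-divide idea can be sketched in base R with table() and prop.table(). This is only a minimal sketch: the three-country data set, the alternating scores, and the column names perc_0 and perc_1 are assumptions made for illustration, not part of the question.

```r
# Small deterministic stand-in for the question's data (assumed for illustration)
country <- rep(c("USA", "UK", "AUS"), times = c(10, 5, 15))
score <- rep(c(0, 1), length.out = length(country))
dat <- data.frame(country, score)

# Cross-tabulate country by score, then divide each row by its row total
counts <- table(dat$country, dat$score)
props <- prop.table(counts, margin = 1)

# Collect the proportions in a plain data frame, one row per country
res <- data.frame(country = rownames(props),
                  perc_0 = props[, "0"],
                  perc_1 = props[, "1"],
                  row.names = NULL)
print(res)
```

Multiply the two proportion columns by 100 if you need actual percentages.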
CodePudding user response:
Another possible solution, based on tidyverse:
library(tidyverse)
country = rep(c("USA", "UK", "AUS", "ARM", "BEL", "BRA", "CHN", "EGY", "FIN", "FRA"),
              times = c(10, 5, 15, 10, 10, 10, 5, 15, 10, 10))
score = sample(c(0, 1), replace = FALSE)
dat = data.frame(country, score)
dat %>%
  group_by(country) %>%
  summarise(perc0s = 1 - sum(score) / n(), perc1s = 1 - perc0s, .groups = "drop")
#> # A tibble: 10 × 3
#>    country perc0s perc1s
#>    <chr>    <dbl>  <dbl>
#>  1 ARM      0.5    0.5
#>  2 AUS      0.467  0.533
#>  3 BEL      0.5    0.5
#>  4 BRA      0.5    0.5
#>  5 CHN      0.6    0.4
#>  6 EGY      0.467  0.533
#>  7 FIN      0.5    0.5
#>  8 FRA      0.5    0.5
#>  9 UK       0.6    0.4
#> 10 USA      0.5    0.5
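Note that the values above are proportions between 0 and 1. If you want percentages, as the question asks, one possible variant is to take the mean of a logical comparison and scale it by 100. This is a sketch that assumes score only ever contains 0 and 1; the small deterministic data set is made up so the result is reproducible.

```r
library(dplyr)

# Small deterministic stand-in for the question's data (assumed for illustration)
country <- rep(c("USA", "UK", "AUS"), times = c(10, 5, 15))
score <- rep(c(0, 1), length.out = length(country))
dat <- data.frame(country, score)

# The mean of a logical vector is the share of TRUE values,
# so mean(score == 0) is the proportion of 0s within each country
res <- dat %>%
  group_by(country) %>%
  summarise(perc0s = 100 * mean(score == 0),
            perc1s = 100 * mean(score == 1),
            .groups = "drop")
print(res)
```

Computing each column from its own comparison also keeps the result sensible if other values or NAs ever show up in score, although in that case the two columns will no longer sum to 100.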