How to sort these data based on some criteria in R

Here is a sample of my data:

M <- read.table(text = "group    value   blue
    B   12  Y
    C   14  Y
    A   12  Y
    B   12  N
    C   10  Y
    A   7   Y
    B   6   Y
", header = TRUE)

I want to sum value for each group, using group_by(group) or aggregate. Next, I look at blue: within each group, I sum only the values where blue is "Y". For example, both rows of A are "Y", so that sum is 19 and P = 19/19*100 = 100. For B, the "Y" values sum to 12 + 6 = 18 out of a total of 30, so P = 18/30*100 = 60. Here is the outcome I am after:

 group  value   P
    A   19  100
    B   30  60
    C   24  100

CodePudding user response:

You could do:

library(tidyverse)

M %>%
  group_by(group) %>%
  summarize(P = 100 * sum(value[blue == "Y"])/sum(value),
            value = sum(value)) %>%
  select(1, 3, 2)
#> # A tibble: 3 x 3
#>   group value     P
#>   <chr> <int> <dbl>
#> 1 A        19   100
#> 2 B        30    60
#> 3 C        24   100

Created on 2023-01-01 with reprex v2.0.2
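The question also mentions aggregate, so here is a minimal base R sketch of the same calculation, with no packages needed. The helper column value_y and the result name res are illustrative names that do not appear in the original post, and the sample data is rebuilt so the snippet runs on its own.

```r
# Rebuild the sample data from the question
M <- read.table(text = "group value blue
B 12 Y
C 14 Y
A 12 Y
B 12 N
C 10 Y
A 7 Y
B 6 Y", header = TRUE)

# Helper column (hypothetical name): value where blue is "Y", otherwise 0
M$value_y <- ifelse(M$blue == "Y", M$value, 0L)

# Sum both columns per group in one aggregate() call
res <- aggregate(cbind(value, value_y) ~ group, data = M, FUN = sum)

# Share of the group total that comes from the "Y" rows, in percent
res$P <- 100 * res$value_y / res$value

# Keep only the columns shown in the desired outcome
res <- res[, c("group", "value", "P")]
print(res)
```

Zeroing out the "N" rows rather than filtering them keeps every group in the result, so a group with no "Y" rows would get P = 0 instead of disappearing.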

CodePudding user response:

A dplyr solution:

library(dplyr)

M %>%
  count(group, blue, wt = value) %>%
  group_by(group) %>%
  # sum() keeps P defined (as 0) even when a group has no 'Y' rows
  summarise(N = sum(n), P = sum(n[blue == 'Y']) / N * 100)

# A tibble: 3 × 3
  group     N     P
  <chr> <int> <dbl>
1 A        19   100
2 B        30    60
3 C        24   100

CodePudding user response:

A data.table solution, assuming there are no NAs in value. If there are, add na.rm = TRUE to the sum() calls.

library(data.table)
setDT(M)[, .(value = sum(value), P = 100 * sum(value[blue == "Y"]) / sum(value) ), keyby = .(group)]
#    group value   P
# 1:     A    19 100
# 2:     B    30  60
# 3:     C    24 100
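If value did contain missing entries, the na.rm = TRUE variant mentioned above might look like the sketch below. The table M2 is hypothetical: it is the question's sample with the one "N" row of group B set to NA, purely to illustrate the effect.

```r
library(data.table)

# Hypothetical data: the question's sample with one value replaced by NA
M2 <- data.table(
  group = c("B", "C", "A", "B", "C", "A", "B"),
  value = c(12L, 14L, 12L, NA, 10L, 7L, 6L),
  blue  = c("Y", "Y", "Y", "N", "Y", "Y", "Y")
)

# Same summary as above, but every sum() skips missing values
res <- M2[, .(value = sum(value, na.rm = TRUE),
              P = 100 * sum(value[blue == "Y"], na.rm = TRUE) /
                        sum(value, na.rm = TRUE)),
          keyby = .(group)]
print(res)
```

Note that dropping NAs changes what P means: the missing row no longer counts toward the group total, so B's percentage is computed over the 18 that remain rather than over 30.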