Home > Back-end >  R group_by summarise omit data when in-group filter not working
R group_by summarise omit data when in-group filter not working

Time:06-04

I have a data frame like this, for which not all Inst have the Count for all Group.

test=data_frame(Inst=c('A','A','A','B','B','B','C','C'),
        Var=c('Count','Count','Total','Count','Count','Total','Count','Total'),
        Group=c('F','M','T','F','M','T','F','T'),
        Count=c(1,2,3,3,4,7,2,4)
        )

# A tibble: 8 × 4
  Inst  Var   Group Count
  <chr> <chr> <chr> <dbl>
1 A     Count F         1
2 A     Count M         2
3 A     Total T         3
4 B     Count F         3
5 B     Count M         4
6 B     Total T         7
7 C     Count F         2
8 C     Total T         4

I want to calculate the Count of each Group (F,M) divided by Count of T

> test %>% group_by(Inst) %>% summarise(F_R=Count[Group=='F']/Count[Group=='T'],
                                        M_R=Count[Group=='M']/Count[Group=='T'],
                                        )
`summarise()` has grouped output by 'Inst'. You can override using the `.groups` argument.
# A tibble: 2 × 3
# Groups:   Inst [2]
  Inst    F_R   M_R
  <chr> <dbl> <dbl>
1 A     0.333 0.667
2 B     0.429 0.571

As you can see C was omitted because it doesn't have M's Count. But I would still want to have its F_R and have M_R as NA. How could I do this to have a table like below?

  Inst    F_R   M_R
  <chr> <dbl> <dbl>
1 A     0.333 0.667
2 B     0.429 0.571
3 C     0.5     NA

CodePudding user response:

You could do:

library(dplyr)
library(tidyr)

test %>% 
  group_by(Inst) %>% 
  summarise(Group, R = Count / Count[Group == "T"], .groups = "drop") |>
  filter(!Group == "T") |> 
 pivot_wider(names_from = Group, values_from = R, names_glue = "{Group}_{.value}")
#> # A tibble: 3 × 3
#>   Inst    F_R    M_R
#>   <chr> <dbl>  <dbl>
#> 1 A     0.333  0.667
#> 2 B     0.429  0.571
#> 3 C     0.5   NA

CodePudding user response:

How about this:

library(tidyverse)
test=data_frame(Inst=c('A','A','A','B','B','B','C','C'),
                Var=c('Count','Count','Total','Count','Count','Total','Count','Total'),
                Group=c('F','M','T','F','M','T','F','T'),
                Count=c(1,2,3,3,4,7,2,4))

test %>%   
  complete(Inst, Group, fill = list(Var="Count")) %>% 
  group_by(Inst) %>% 
  summarise(F_R=Count[Group=='F']/Count[Group=='T'],
            M_R=Count[Group=='M']/Count[Group=='T'])
#> # A tibble: 3 × 3
#>   Inst    F_R    M_R
#>   <chr> <dbl>  <dbl>
#> 1 A     0.333  0.667
#> 2 B     0.429  0.571
#> 3 C     0.5   NA

Created on 2022-06-03 by the reprex package (v2.0.1)

  • Related