Mutate in dplyr the proportions of TRUE or FALSE in 2 new columns

Time:02-24

I have the following table in R:

CAT CONDITION
A TRUE
A TRUE
A FALSE
A TRUE
B TRUE
B TRUE
B TRUE
B FALSE
cat = c(rep("A",4),rep("B",4));cat
cond = c("TRUE","TRUE","FALSE","TRUE","FALSE","TRUE","TRUE","FALSE")
data3 = cbind(cat,cond);data3

and I want to reduce it with dplyr, creating two new columns: the proportion of TRUE in the first new column and the proportion of FALSE in the second. Like this:

CAT TRUE FALSE
A 0.75 0.25
B 0.5 0.5

Answer:

Like this?

library(tidyverse)
cat = c(rep("A",4),rep("B",4))
cond = c("TRUE","TRUE","FALSE","TRUE","FALSE","TRUE","TRUE","FALSE")
data3 = data.frame(cat,cond)

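data3 %>%
  group_by(cat) %>%
  # cond holds the strings "TRUE"/"FALSE", so compare against strings;
  # the per-group count divided by n() is the proportion
  summarise("TRUE" = sum(cond == "TRUE") / n(),
            "FALSE" = sum(cond == "FALSE") / n())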
#> # A tibble: 2 × 3
#>   cat   `TRUE` `FALSE`
#>   <chr>  <dbl>   <dbl>
#> 1 A       0.75    0.25
#> 2 B       0.5     0.5

Created on 2022-02-24 by the reprex package (v2.0.1)
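The question mentions mutate(), but it is summarise() that collapses each cat to a single row. For comparison, here is a minimal sketch of the mutate() version, which keeps all eight rows and repeats each group's proportions on every row. It rebuilds the same data so it runs on its own; the column names prop_true and prop_false are my own choice, not anything from the question.

```r
library(dplyr)

# Same data as above, rebuilt so this block is self-contained
data3 <- data.frame(
  cat  = c(rep("A", 4), rep("B", 4)),
  cond = c("TRUE", "TRUE", "FALSE", "TRUE", "FALSE", "TRUE", "TRUE", "FALSE")
)

# mutate() keeps every row; within each cat group, mean() of the
# comparison gives the share of rows where cond is "TRUE" / "FALSE"
with_props <- data3 %>%
  group_by(cat) %>%
  mutate(prop_true  = mean(cond == "TRUE"),
         prop_false = mean(cond == "FALSE")) %>%
  ungroup()

with_props
```

Use this form only if you really want the proportions attached to the original rows; for the one-row-per-cat table in the question, summarise() is the right verb.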

Answer:

You could also simply use mean with logical variables:

library(dplyr)
data3 %>% 
  as_tibble() %>%                      # this converts your matrix into a tibble
  mutate(cond = as.logical(cond)) %>%  # convert character to logical
  group_by(cat) %>% 
  summarise("TRUE" = mean(cond),
            "FALSE" = mean(!cond))

Output:

# A tibble: 2 x 3
  cat   `TRUE` `FALSE`
  <chr>  <dbl>   <dbl>
1 A       0.75    0.25
2 B       0.5     0.5 
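Why this works: arithmetic on a logical vector treats TRUE as 1 and FALSE as 0, so mean() returns the share of TRUE values and mean(!x) the share of FALSE values. A small base-R check, where the vector x is only an illustration:

```r
# Illustrative vector: 3 TRUE out of 4
x <- c(TRUE, TRUE, FALSE, TRUE)

share_true  <- mean(x)   # (1 + 1 + 0 + 1) / 4 = 0.75
share_false <- mean(!x)  # !x flips each value: 1 / 4 = 0.25

# A character vector like c("TRUE", "FALSE") is not logical, and mean()
# on it returns NA with a warning; converting with as.logical() first works
share_chr <- mean(as.logical(c("TRUE", "FALSE")))  # 0.5

c(share_true, share_false, share_chr)
```

That last case is why the as.logical() step is needed when the data comes in as character strings.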

Answer:

Outside dplyr, this is easy with table() and prop.table(); the margin argument 1 makes each row (each cat) sum to 1:

with(as.data.frame(data3), prop.table(table(cat, cond), 1))

   cond
cat FALSE TRUE
  A  0.25 0.75
  B  0.50 0.50

Or, even simpler (credit to @G. Grothendieck), using xtabs(), where ~. cross-tabulates all columns of data3:

prop.table(xtabs(~., data3), 1)
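prop.table() returns a table object rather than a data frame. If you need the same layout as the desired output (one row per cat, one column per value), one option, sketched below, is as.data.frame.matrix(), which keeps the two-way layout; plain as.data.frame() on a table would instead give a long format with a Freq column. The data is rebuilt here as a data frame so the block runs on its own.

```r
# Same data as in the question, built directly as a data frame
data3 <- data.frame(
  cat  = c(rep("A", 4), rep("B", 4)),
  cond = c("TRUE", "TRUE", "FALSE", "TRUE", "FALSE", "TRUE", "TRUE", "FALSE")
)

# Row proportions: each count is divided by its row (cat) total
props <- prop.table(table(data3$cat, data3$cond), 1)

# as.data.frame.matrix() keeps the wide layout: rows A and B,
# columns "FALSE" and "TRUE"
wide <- as.data.frame.matrix(props)

# Move the row names into a regular cat column; check.names = FALSE keeps
# the column names "FALSE" and "TRUE" from being mangled by make.names()
wide <- data.frame(cat = rownames(wide), wide,
                   row.names = NULL, check.names = FALSE)

wide
```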

Answer:

Here is another way. If you want the proportions of TRUE and FALSE for each CAT, this should work:

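library(dplyr)
library(tidyr)

# The table from the question
df <- data.frame(
  CAT = c(rep("A", 4), rep("B", 4)),
  CONDITION = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)
)

df %>% 
  group_by(CAT, CONDITION) %>% 
  tally() %>%                  # count rows per CAT/CONDITION; result stays grouped by CAT
  mutate(n = n / sum(n)) %>%   # turn counts into proportions within each CAT
  pivot_wider(
    id_cols = CAT,
    names_from = CONDITION,
    values_from = n
  )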
  CAT   `FALSE` `TRUE`
  <chr>   <dbl>  <dbl>
1 A        0.25   0.75
2 B        0.25   0.75
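One caveat with the pivot_wider() approach: if some CAT has no FALSE (or no TRUE) rows, that cell comes out as NA rather than 0. A sketch using count() and the values_fill argument of pivot_wider() (present in recent tidyr versions) to fill those gaps; the extra category C is hypothetical, added only to trigger the empty case.

```r
library(dplyr)
library(tidyr)

# Question's table plus a hypothetical category "C" that has no FALSE rows
df <- data.frame(
  CAT = c(rep("A", 4), rep("B", 4), "C", "C"),
  CONDITION = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)
)

props <- df %>%
  count(CAT, CONDITION) %>%      # one row per CAT/CONDITION pair, count in n
  group_by(CAT) %>%
  mutate(prop = n / sum(n)) %>%  # share of each CONDITION within its CAT
  ungroup() %>%
  pivot_wider(
    id_cols = CAT,
    names_from = CONDITION,
    values_from = prop,
    values_fill = 0              # C has no FALSE row: 0 instead of NA
  )

props
```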