Creating a column based on another column in a data frame in R

Time:12-10

In my data, some studies report both subscale and composite results.

I want to add a new column called include. In studies that report both subscale and composite, include should be TRUE for the subscale rows and FALSE for the composite rows. In every other study, include should be TRUE for all rows.

In other words, include can be FALSE only where reporting == "composite", and only in studies that have reported both subscale and composite. Everywhere else include must be TRUE.

My desired output is below. Is this achievable in R?

library(tidyverse)
m="
study  reporting
1      subscale
1      composite
2      subscale
2      composite
3      composite
3      composite
4      composite
5      subscale"

data <- read.table(text = m, header = TRUE)

desired =
"study  reporting  include
 1       subscale    TRUE
 1      composite   FALSE
 2       subscale    TRUE
 2      composite   FALSE
 3      composite    TRUE
 3      composite    TRUE
 4      composite    TRUE
 5      subscale     TRUE"

Answer:

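library(dplyr)

data %>%
  group_by(study) %>%   # evaluate the condition within each study
  mutate(
    # FALSE only for composite rows in studies that report both types
    include = !(
      "subscale" %in% reporting &
      "composite" %in% reporting &
      reporting == "composite"
    )
  )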
# # A tibble: 8 × 3
# # Groups:   study [5]
#   study reporting include
#   <int> <chr>     <lgl>
# 1     1 subscale  TRUE   
# 2     1 composite FALSE  
# 3     2 subscale  TRUE   
# 4     2 composite FALSE  
# 5     3 composite TRUE   
# 6     3 composite TRUE   
# 7     4 composite TRUE   
# 8     5 subscale  TRUE  
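
For comparison, the same rule can be written in base R without dplyr. This is only a sketch of one possible alternative, not part of the answer above: it uses ave() to compute, for each study, whether any row is subscale and whether any row is composite, and then negates the combined condition. The helper names has_subscale and has_composite are illustrative.

```r
# Rebuild the example data from the question
data <- read.table(text = "
study  reporting
1      subscale
1      composite
2      subscale
2      composite
3      composite
3      composite
4      composite
5      subscale", header = TRUE)

# ave() applies FUN within each study and returns a vector aligned with the rows,
# so each row learns whether its study contains at least one row of each type
has_subscale  <- ave(data$reporting == "subscale",  data$study, FUN = any)
has_composite <- ave(data$reporting == "composite", data$study, FUN = any)

# include is FALSE only for composite rows in studies that report both types
data$include <- !(has_subscale & has_composite & data$reporting == "composite")

print(data)
```

Because the per-study flags are ordinary logical vectors, they can also be inspected on their own when checking which studies report both types.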