Home > database >  For loop in R, condition over several columns
For loop in R, condition over several columns

Time:10-27

I have a dataset that contains 8 columns "A1", "A2" etc, each with a value of 0-4, including NAs. These columns are repeated for different time points, so "3A1" would be 3 month, "6A1" is 6 months etc. I want to create a new column called "Status" that:

  • If all columns are NA -> "incomplete"
  • If any of the 8 columns contain NA -> "partial completion"
  • If all columns !=NA -> "fully completed"

Can someone help with the code?

October_data_UK$"Status" <- ifelse(October_data_UK$`A1`!=0 & October_data_UK$`A2`!==0 & October_data_UK$`A3`!==0 & October_data_UK$`A4`!==0 & October_data_UK$`A5`!==0 & October_data_UK$`A6`!==0 & October_data_UK$`A7`!==0 & October_data_UK$`A8`!==0 & October_data_UK$`A9`!==0 & October_data_UK$`A10`!==0, 2,
                                 ifelse(October_data_UK$`A1`==0 | October_data_UK$`A2`==0| October_data_UK$`A3`==0 | October_data_UK$`A4`==0 | October_data_UK$`A5`==0 | October_data_UK$`A6`==0 | October_data_UK$`A7`==0 | October_data_UK$`A8`==0 | October_data_UK$`A9`==0 | October_data_UK$`A10`==0, 1), 
                                 ifelse(October_data_UK$`A1`==0 & October_data_UK$`A2`==0 & October_data_UK$`A3`==0 & October_data_UK$`A4`==0 & October_data_UK$`A5`==0 & October_data_UK$`A6`==0 & October_data_UK$`A7`==0 & October_data_UK$`A8`==0 & October_data_UK$`A9`==0 & October_data_UK$`A10`==0, 0, NA))

CodePudding user response:

Here's a way with base R:

set.seed(207)
mat <- matrix(sample(c(1:4, NA), 400, replace=TRUE), ncol=10)
colnames(mat) <- paste("A", 1:10, sep="")
October_data_US <- as.data.frame(mat)
apply(October_data_US[,grep("^A\\d", names(October_data_US))], 1, function(x){
  if(any(is.na(x)))z <- "partial completion"
  if(all(is.na(x)))z <- "incomplete"
  if(all(!is.na(x)))z <- "fully complete"
  z
})
#>  [1] "fully complete"     "partial completion" "partial completion"
#>  [4] "fully complete"     "partial completion" "partial completion"
#>  [7] "partial completion" "partial completion" "partial completion"
#> [10] "partial completion" "partial completion" "partial completion"
#> [13] "partial completion" "partial completion" "partial completion"
#> [16] "partial completion" "partial completion" "fully complete"    
#> [19] "partial completion" "partial completion" "fully complete"    
#> [22] "partial completion" "partial completion" "partial completion"
#> [25] "partial completion" "partial completion" "partial completion"
#> [28] "partial completion" "partial completion" "partial completion"
#> [31] "partial completion" "fully complete"     "partial completion"
#> [34] "partial completion" "partial completion" "partial completion"
#> [37] "partial completion" "partial completion" "partial completion"
#> [40] "partial completion"

Created on 2022-10-26 by the reprex package (v2.0.1)

In the example above, the grep() function is searching for a string that starts with A (case sensitive) and has a digit following it. So every variable that matches that pattern will be included.

CodePudding user response:

library(dplyr)
df <- data.frame(id = 1:10,
  A1 = sample(c(0:4, NA_real_), 10, T),
  A2 = sample(c(0:4, NA_real_), 10, T),
  A1.2 = sample(c(0:4, NA_real_), 10, T),
  A2.2 = sample(c(0:4, NA_real_), 10, T))

df %>% rowwise() %>% mutate(status = case_when(
  all(is.na(c_across(A1:A2.2))) ~ "incomplete",
  any(is.na(c_across(A1:A2.2))) ~ "partial completion",
  !any(is.na(c_across(A1:A2.2))) ~ "fully completed"
  )
  )

# A tibble: 10 × 6
# Rowwise: 
      id    A1    A2  A1.2  A2.2 status            
   <int> <dbl> <dbl> <dbl> <dbl> <chr>             
 1     1     2    NA     0     0 partial completion
 2     2    NA     0     4     2 partial completion
 3     3     2     1     2     4 fully completed  

You could also use mutate_all if there are no other variables in your dataset.

  • Related