I have a dataset that contains 8 columns "A1", "A2" etc, each with a value of 0-4, including NAs. These columns are repeated for different time points, so "3A1" would be 3 month, "6A1" is 6 months etc. I want to create a new column called "Status" that:
- If all columns are NA -> "incomplete"
- If any of the 8 columns contain NA -> "partial completion"
- If all columns !=NA -> "fully completed"
Can someone help with the code?
October_data_UK$"Status" <- ifelse(October_data_UK$`A1`!=0 & October_data_UK$`A2`!==0 & October_data_UK$`A3`!==0 & October_data_UK$`A4`!==0 & October_data_UK$`A5`!==0 & October_data_UK$`A6`!==0 & October_data_UK$`A7`!==0 & October_data_UK$`A8`!==0 & October_data_UK$`A9`!==0 & October_data_UK$`A10`!==0, 2,
ifelse(October_data_UK$`A1`==0 | October_data_UK$`A2`==0| October_data_UK$`A3`==0 | October_data_UK$`A4`==0 | October_data_UK$`A5`==0 | October_data_UK$`A6`==0 | October_data_UK$`A7`==0 | October_data_UK$`A8`==0 | October_data_UK$`A9`==0 | October_data_UK$`A10`==0, 1),
ifelse(October_data_UK$`A1`==0 & October_data_UK$`A2`==0 & October_data_UK$`A3`==0 & October_data_UK$`A4`==0 & October_data_UK$`A5`==0 & October_data_UK$`A6`==0 & October_data_UK$`A7`==0 & October_data_UK$`A8`==0 & October_data_UK$`A9`==0 & October_data_UK$`A10`==0, 0, NA))
CodePudding user response:
Here's a way with base R:
set.seed(207)
mat <- matrix(sample(c(1:4, NA), 400, replace=TRUE), ncol=10)
colnames(mat) <- paste("A", 1:10, sep="")
October_data_US <- as.data.frame(mat)
apply(October_data_US[,grep("^A\\d", names(October_data_US))], 1, function(x){
if(any(is.na(x)))z <- "partial completion"
if(all(is.na(x)))z <- "incomplete"
if(all(!is.na(x)))z <- "fully complete"
z
})
#> [1] "fully complete" "partial completion" "partial completion"
#> [4] "fully complete" "partial completion" "partial completion"
#> [7] "partial completion" "partial completion" "partial completion"
#> [10] "partial completion" "partial completion" "partial completion"
#> [13] "partial completion" "partial completion" "partial completion"
#> [16] "partial completion" "partial completion" "fully complete"
#> [19] "partial completion" "partial completion" "fully complete"
#> [22] "partial completion" "partial completion" "partial completion"
#> [25] "partial completion" "partial completion" "partial completion"
#> [28] "partial completion" "partial completion" "partial completion"
#> [31] "partial completion" "fully complete" "partial completion"
#> [34] "partial completion" "partial completion" "partial completion"
#> [37] "partial completion" "partial completion" "partial completion"
#> [40] "partial completion"
Created on 2022-10-26 by the reprex package (v2.0.1)
In the example above, the grep()
function is searching for a string that starts with A (case sensitive) and has a digit following it. So every variable that matches that pattern will be included.
CodePudding user response:
library(dplyr)
df <- data.frame(id = 1:10,
A1 = sample(c(0:4, NA_real_), 10, T),
A2 = sample(c(0:4, NA_real_), 10, T),
A1.2 = sample(c(0:4, NA_real_), 10, T),
A2.2 = sample(c(0:4, NA_real_), 10, T))
df %>% rowwise() %>% mutate(status = case_when(
all(is.na(c_across(A1:A2.2))) ~ "incomplete",
any(is.na(c_across(A1:A2.2))) ~ "partial completion",
!any(is.na(c_across(A1:A2.2))) ~ "fully completed"
)
)
# A tibble: 10 × 6
# Rowwise:
id A1 A2 A1.2 A2.2 status
<int> <dbl> <dbl> <dbl> <dbl> <chr>
1 1 2 NA 0 0 partial completion
2 2 NA 0 4 2 partial completion
3 3 2 1 2 4 fully completed
You could also use mutate_all if there are no other variables in your dataset.