Hi i have this dataset:
> id s1 s2 s3 s4
1 "A" "A" "NA" "A"
2 "NA" "A" "NA" "A"
3 "Na" "NA" "NA" "A"
4 "A" "NA" "NA" "Na"
5 "A" "A" "NA" "A"
I want to count how many ID's have only one value of "A" in either s1, s2, s3, s4. In this case it is only 2 persons (ID 3 and 4). But if i have a large dataset how can I count this ?
CodePudding user response:
You can use
which(rowSums(!is.na(df[-1])) == 1)
# [1] 3 4
Replace which()
with sum()
to get the number of ID that have only 1 non-missing value.
Update
If unfortunately, you store all NA
as "NA"
, "Na"
, or "na"
, then use the following code to convert them back to regular NA
in advance.
df[] <- lapply(df, \(x) { x[x %in% c('NA', 'Na', 'na')] <- NA; x })
CodePudding user response:
Checks for "A" in the string
library(tidyverse)
df %>%
filter(rowSums(across(s1:s4, ~ str_detect(.x, "A")), na.rm = TRUE) == 1)
# A tibble: 2 × 5
id s1 s2 s3 s4
<dbl> <chr> <chr> <lgl> <chr>
1 3 Na NA NA A
2 4 A NA NA Na