I have a dataframe that looks like this:
ID | Time | Q1 | Q2 | Q3 | Q4 |
---|---|---|---|---|---|
1 | 2 min | Agree | NA | Neutral | NA |
2 | 5 min | NA | Disagree | Agree | NA |
3 | 3 min | Agree | NA | Neutral | NA |
4 | 5 min | Disagree | Disagree | NA | NA |
5 | 6 min | NA | Agree | Agree | Agree |
6 | 1 min | NA | NA | NA | NA |
I want to retain only the rows for which responses to the questions are not equivalent across Question columns (Q1:Q4). In this example, I would retain rows for IDs 1-3 and remove rows 4-6, as they are all identical strings. I do want to keep the information from the first two columns, but I don't want to use it in the decision logic regarding whether or not to keep the row. All rows have NAs, but the NAs are in various places - so I want to remove rows for which all columns that have any value in them are the same, or rows for which all values across columns are missing.
I found another answer that did something similar and tried this:
keep <- apply(df[3:6], 1, function(x) length(unique(x[!is.na(x)])) != 1)
df[keep, ]
but that seemed to only remove rows that were all NAs.
CodePudding user response:
To keep all those with discordant responses (assumed ignoring NA
s), you can try this slight change in your attempted code:
keeps <- apply(df[3:6], 1, function(x) !length(unique(x[!is.na(x)])) %in% 0:1)
df[keeps, ]
# ID Time Q1 Q2 Q3 Q4
# 1 1 2min Agree <NA> Neutral <NA>
# 2 2 5min <NA> Disagree Agree <NA>
# 3 3 3min Agree <NA> Neutral <NA>
If you want all those with identical responses (assumed ignoring NA
s):
keeps <- apply(df[3:6], 1, function(x) length(unique(x[!is.na(x)])) == 1)
df[keeps, ]
# ID Time Q1 Q2 Q3 Q4
# 4 4 5min Disagree Disagree <NA> <NA>
# 5 5 6min <NA> Agree Agree Agree
Data
df <- read.table(text = "ID Time Q1 Q2 Q3 Q4
1 2min Agree NA Neutral NA
2 5min NA Disagree Agree NA
3 3min Agree NA Neutral NA
4 5min Disagree Disagree NA NA
5 6min NA Agree Agree Agree
6 1min NA NA NA NA", header = TRUE)
CodePudding user response:
Here's a dplyr-based solution: group using rowwise()
, then filter to rows with > 1 distinct value across columns.
library(dplyr)
df %>%
rowwise() %>%
filter(n_distinct(c_across(Q1:Q4), na.rm = TRUE) > 1) %>%
ungroup()
# A tibble: 3 × 6
ID Time Q1 Q2 Q3 Q4
<int> <chr> <chr> <chr> <chr> <chr>
1 1 2min Agree NA Neutral NA
2 2 5min NA Disagree Agree NA
3 3 3min Agree NA Neutral NA
CodePudding user response:
data <- data.frame(ID = 1:6, Time = c("2 min", "5 min", "3 min",
"5 min", "6 min", "1 min"), Q1 = c("Agree", NA, "Agree", "Disagree",
NA, NA), Q2 = c(NA, "Disagree", NA, "Disagree", "Agree", NA),
Q3 = c("Neutral", "Agree", "Neutral", NA, "Agree", NA), Q4 = c(NA,
NA, NA, NA, "Agree", NA))
rows <- apply(data[3:6], 1, \(x) all(x[!is.na(x)][1] == x[!is.na(x)][-1]) )
data[!rows,]
#> ID Time Q1 Q2 Q3 Q4
#> 1 1 2 min Agree <NA> Neutral <NA>
#> 2 2 5 min <NA> Disagree Agree <NA>
#> 3 3 3 min Agree <NA> Neutral <NA>