I would like to compare the values of pairs of columns whose names start with certain characters but have same suffix in their column name. In the example below, I want to compare q_1 with v_1 and q_2 with v_2 and I want for each pair compared a new column that indicates if the compared columns are equal, i.e. equal_1 and equal_2:
q_1 v_1 q_2 v_2 equal_1 equal_2
0 1 1 0 NO NO
1 1 0 0 YES YES
(above sample data is simplified, in the original dataframe the suffixes go up to 200 and there are a lot of other variables that are suffixed (like i_1 … i_100), so the solution has be specific to the variables wanted.)
My code so far doesn't return the expected results, any hint much appreciated! An answer should use a tidyverse approach.
df <- data.frame(
ID = c(1, 2),
q_1 = c(0,1),
v_1 = c(1,1),
q_2 = c(1,0),
v_2 = c(0,0)
)
df2 <- df %>%
mutate(across(starts_with('q'), ~if_else(.x == sub("q", "v", .x), 'YES', 'NO'), .names = '{sub("q", "equal", .col)}'))
print(df2, quote = FALSE, row.names = FALSE)
CodePudding user response:
I would first pivot the data, giving you just two columns q
and v
, labeled row-wise by pair
. Then it's trivial to compare the two columns.
library(tidyverse)
df_pivoted <- df %>%
pivot_longer(
!ID,
names_to = c(".value", "pair"),
names_sep = "_"
) %>%
mutate(equal = if_else(q == v, "YES", "NO"))
# # A tibble: 4 x 5
# ID pair q v equal
# <dbl> <chr> <dbl> <dbl> <chr>
# 1 1 1 0 1 NO
# 2 1 2 1 0 NO
# 3 2 1 1 1 YES
# 4 2 2 0 0 YES
Whatever you're doing downstream may also be easier in long format; but you can also pivot back to wide:
df2 <- df_pivoted %>%
pivot_wider(names_from = pair, values_from = q:equal)
# # A tibble: 2 x 7
# ID q_1 q_2 v_1 v_2 equal_1 equal_2
# <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
# 1 1 0 1 1 0 NO NO
# 2 2 1 0 1 0 YES YES
CodePudding user response:
I think it would be easier to pivot to a longer format first and then pivot back to the desired output:
library(tidyr)
library(dplyr)
df %>%
pivot_longer(matches("[a-z]_\\d"),
names_sep = "_",
names_to = c(".value", "pair")) %>%
rowwise() %>%
mutate(equal = n_distinct(c_across(q:v)) == 1) %>%
pivot_wider(names_from = pair,
values_from = q:last_col())
You might need to change some of the tidyselect options used for your more complex data.
Output
ID q_1 q_2 v_1 v_2 equal_1 equal_2
<dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <lgl>
1 1 0 1 1 0 FALSE FALSE
2 2 1 0 1 0 TRUE TRUE
CodePudding user response:
How about a for loop? for the suffixe numbers :
for(i in unique(gsub("[^0-9]", "", names(df)[-1]))){
df <- df %>%
mutate(across(ends_with(i)[1],
~if_else(df[,paste0("q_", i)] == df[,paste0("v_", i)], "YES", "NO"),
.names = '{sub("._", "equal_", .col)}'))
}
> df
# ID q_1 v_1 q_2 v_2 equal_1 equal_2
# 1 1 0 1 1 0 NO NO
# 2 2 1 1 0 0 YES YES
Or in base R, bc it really writes itself better like that :
for(i in unique(strtoi(gsub("[^0-9]","",names(df))))[-1]){
df[,paste0("equal_", i)] <- ifelse(
df[,paste0("q_", i)] == df[,paste0("v_", i)],
"YES", "NO"
)
}
> df
# ID q_1 v_1 q_2 v_2 equal_1 equal_2
# 1 1 0 1 1 0 NO NO
# 2 2 1 1 0 0 YES YES