Comparing values of pairs of columns whose names are suffixed by numbers (R / tidyverse)-CodePudding

I would like to compare the values of pairs of columns whose names start with certain characters but have same suffix in their column name. In the example below, I want to compare q_1 with v_1 and q_2 with v_2 and I want for each pair compared a new column that indicates if the compared columns are equal, i.e. equal_1 and equal_2:

  q_1  v_1  q_2  v_2  equal_1  equal_2
    0    1    1    0       NO       NO
    1    1    0    0      YES      YES

(above sample data is simplified, in the original dataframe the suffixes go up to 200 and there are a lot of other variables that are suffixed (like i_1 … i_100), so the solution has be specific to the variables wanted.)

My code so far doesn't return the expected results, any hint much appreciated! An answer should use a tidyverse approach.

df <- data.frame(
  ID = c(1, 2),
  q_1 = c(0,1),
  v_1 = c(1,1),
  q_2 = c(1,0),
  v_2 = c(0,0)
)

df2 <- df %>% 
  mutate(across(starts_with('q'), ~if_else(.x == sub("q", "v", .x), 'YES', 'NO'), .names = '{sub("q", "equal", .col)}'))
print(df2, quote = FALSE, row.names = FALSE)

CodePudding user response：

I would first pivot the data, giving you just two columns q and v, labeled row-wise by pair. Then it's trivial to compare the two columns.

library(tidyverse)

df_pivoted <- df %>%
  pivot_longer(
    !ID,
    names_to = c(".value", "pair"),
    names_sep = "_"
  ) %>% 
  mutate(equal = if_else(q == v, "YES", "NO"))

# # A tibble: 4 x 5
#      ID pair      q     v equal
#   <dbl> <chr> <dbl> <dbl> <chr>
# 1     1 1         0     1 NO   
# 2     1 2         1     0 NO   
# 3     2 1         1     1 YES  
# 4     2 2         0     0 YES

Whatever you're doing downstream may also be easier in long format; but you can also pivot back to wide:

df2 <- df_pivoted %>% 
  pivot_wider(names_from = pair, values_from = q:equal)

# # A tibble: 2 x 7
#      ID   q_1   q_2   v_1   v_2 equal_1 equal_2
#   <dbl> <dbl> <dbl> <dbl> <dbl> <chr>   <chr>  
# 1     1     0     1     1     0 NO      NO     
# 2     2     1     0     1     0 YES     YES

CodePudding user response：

I think it would be easier to pivot to a longer format first and then pivot back to the desired output:

library(tidyr)
library(dplyr)

df %>% 
  pivot_longer(matches("[a-z]_\\d"),
               names_sep = "_",
               names_to = c(".value", "pair")) %>% 
  rowwise() %>% 
  mutate(equal = n_distinct(c_across(q:v)) == 1) %>% 
  pivot_wider(names_from = pair,
              values_from = q:last_col())

You might need to change some of the tidyselect options used for your more complex data.

Output

     ID   q_1   q_2   v_1   v_2 equal_1 equal_2
  <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>   <lgl>  
1     1     0     1     1     0 FALSE   FALSE  
2     2     1     0     1     0 TRUE    TRUE

CodePudding user response：

How about a for loop? for the suffixe numbers :

for(i in unique(gsub("[^0-9]", "", names(df)[-1]))){
  df <- df %>%
    mutate(across(ends_with(i)[1],
           ~if_else(df[,paste0("q_", i)] == df[,paste0("v_", i)], "YES", "NO"),
    .names = '{sub("._", "equal_", .col)}'))
}

> df
#   ID q_1 v_1 q_2 v_2 equal_1 equal_2
# 1  1   0   1   1   0      NO      NO
# 2  2   1   1   0   0     YES     YES

Or in base R, bc it really writes itself better like that :

for(i in unique(strtoi(gsub("[^0-9]","",names(df))))[-1]){
  df[,paste0("equal_", i)] <- ifelse(
    df[,paste0("q_", i)] == df[,paste0("v_", i)], 
    "YES", "NO"
  )
}

> df
#   ID q_1 v_1 q_2 v_2 equal_1 equal_2
# 1  1   0   1   1   0      NO      NO
# 2  2   1   1   0   0     YES     YES