Regex for names_pattern while pivoting longer

I'm trying to get the right regex (following this) to use inside names_pattern.

The strings are: CRIS_CLAU_ENG_O and LARI_CLAU_ENG_O
Desired output: CRIS_O and LARI_O

ID | CLAU_VALUE | RATER

attempt so far:

data1 %>% 
  select(ID, contains("CLAU")) %>% 
  pivot_longer(c(CRIS_CLAU_ENG_O, LARI_CLAU_ENG_O),
               names_to = c("RATER", ".value"),
               names_pattern = "^([^_]+)([^_]+)") %>% 
 ## mutate(RATER = case_when(RATER == "CRI" ~ 'RATER1',    
                           RATER == "LAR" ~ 'RATER2')) %>% 
 ## mutate(RATER = factor(RATER, levels = c('RATER1', 'RATER2')))

If it's possible, ideally, the desired output should contain two value columns, like this:

ID | CLAU_VALUE | TUNITS_VALUE | RATER

In this case, though, the rater labels would be different (CRIS_WRI and LARI_WRI, to distinguish O from WRI), or I could have another column called 'type' containing this information.

That is, I'd be pivoting the "TUNITS" columns at the same time as the "CLAU" columns.

I'm splitting the strings into the value columns, not into my factor column (I honestly don't know why). I'd like single value columns and a single 'RATER' column instead. I'm probably doing something silly, but thanks in advance; I'd really appreciate the help.

Data:

> dput(data1)
structure(list(ID = c("A", "B", "C", "D", "E", "F", "G", "H", 
"I", "J", "K", "L", "M", "N", "O", "P"), CRIS_CLAU_ENG_O = c(6, 
5, 6, 7, 6, 3, 5, 5, 6, 6, 7, 9, 8, 6, 6, 6), CRIS_TUNITS_WRI_O = c(5, 
5, 4, 5, 5, 3, 5, 5, 4, 4, 7, 7, 7, 6, 6, 5), LARI_CLAU_ENG_O = c(6, 
5, 5, 7, 7, 3, 5, 5, 6, 6, 9, 9, 8, 8, 6, 6), LARI_TUNITS_WRI_O = c(5, 
3, 4, 6, 5, 3, 2, 5, 4, 4, 7, 8, 7, 6, 6, 5)), row.names = c(NA, 
-16L), spec = structure(list(cols = list(ALUNO = structure(list(), class = c("collector_character", 
"collector")), CRIS_CLAU_ENG_O = structure(list(), class = c("collector_double", 
"collector")), CRIS_TUNITS_WRI_O = structure(list(), class = c("collector_double", 
"collector")), LARI_CLAU_ENG_O = structure(list(), class = c("collector_double", 
"collector")), LARI_TUNITS_WRI_O = structure(list(), class = c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), delim = ","), class = "col_spec"),  class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))

CodePudding user response：

You're both selecting out the TUNITS variables and choosing to pivot only a couple of columns. If we keep all columns around, we can get closer. Also, your regex is incomplete: we need to add a literal _ between your two pattern groups, plus a trailing _.* to consume the rest of the name.

library(dplyr)
library(tidyr)
data1 %>% 
  pivot_longer(-ID,
               names_to = c("RATER", ".value"),
               names_pattern = "^([^_]+)_([^_]+)_.*")
# # A tibble: 32 × 4
#    ID    RATER  CLAU TUNITS
#    <chr> <chr> <dbl>  <dbl>
#  1 A     CRIS      6      5
#  2 A     LARI      6      5
#  3 B     CRIS      5      5
#  4 B     LARI      5      3
#  5 C     CRIS      6      4
#  6 C     LARI      5      4
#  7 D     CRIS      7      5
#  8 D     LARI      7      6
#  9 E     CRIS      6      5
# 10 E     LARI      7      5
# # … with 22 more rows
# # ℹ Use `print(n = ...)` to see more rows
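
If the RATER1/RATER2 labels from the question's commented-out mutate() are still wanted, they can be added after the pivot. This is only a sketch: the three-row data1 below is a cut-down stand-in for the question's data, and `long` is a name chosen here for illustration. Note that the comparison values have to be the full captured strings ("CRIS" and "LARI"); the "CRI" and "LAR" used in the attempt would never match.

```r
library(dplyr)
library(tidyr)

# Stand-in for data1: the first three rows of the question's data
data1 <- tibble(
  ID = c("A", "B", "C"),
  CRIS_CLAU_ENG_O   = c(6, 5, 6),
  CRIS_TUNITS_WRI_O = c(5, 5, 4),
  LARI_CLAU_ENG_O   = c(6, 5, 5),
  LARI_TUNITS_WRI_O = c(5, 3, 4)
)

long <- data1 %>%
  pivot_longer(-ID,
               names_to = c("RATER", ".value"),
               names_pattern = "^([^_]+)_([^_]+)_.*") %>%
  # Relabel using the full captured prefixes, then fix the factor order
  mutate(RATER = case_when(RATER == "CRIS" ~ "RATER1",
                           RATER == "LARI" ~ "RATER2"),
         RATER = factor(RATER, levels = c("RATER1", "RATER2")))

long
```

The result has one row per ID and rater, with CLAU and TUNITS as separate value columns.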

Regex:

^([^_]+)_([^_]+)_.*
^                      beginning of the string
 '-----' '-----'       pattern groups
  [^_]                 character group of anything except '_'
      +                one or more ('*' is zero-or-more, '?' is 0-or-1)
        _              the literal underscore
                 .*    zero or more ('*') of anything ('.')
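
To check what the two pattern groups capture before pivoting, the same regex can be tried on the column names with base R. This is a small sketch: `cols`, `pattern`, `m`, and `groups` are names introduced here, and base R's regex engine is not the one tidyr uses internally, though for a pattern this simple the two should agree.

```r
# Column names from the question's data
cols <- c("CRIS_CLAU_ENG_O", "CRIS_TUNITS_WRI_O",
          "LARI_CLAU_ENG_O", "LARI_TUNITS_WRI_O")

pattern <- "^([^_]+)_([^_]+)_.*"

# regexec() + regmatches() return, per string, the full match
# followed by the text of each capture group
m <- regmatches(cols, regexec(pattern, cols))

# Drop the full match (element 1) and keep the two groups
groups <- t(sapply(m, function(x) x[2:3]))
colnames(groups) <- c("RATER", ".value")
groups
```

The first column is what ends up in RATER; the second supplies the names of the new value columns (CLAU and TUNITS).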