Regex for names_pattern while pivoting longer

Time:01-28

I'm trying to get the right regex (following this) to use inside names_pattern.

The strings are: CRIS_CLAU_ENG_O and LARI_CLAU_ENG_O
Desired output: CRIS_O and LARI_O

ID | CLAU_VALUE | RATER

  • attempt so far:
data1 %>% 
  select(ID, contains("CLAU")) %>% 
  pivot_longer(c(CRIS_CLAU_ENG_O, LARI_CLAU_ENG_O),
               names_to = c("RATER", ".value"),
               names_pattern = "^([^_]+)([^_]+)") %>% 
 ## mutate(RATER = case_when(RATER == "CRI" ~ 'RATER1',    
 ##                          RATER == "LAR" ~ 'RATER2')) %>% 
 ## mutate(RATER = factor(RATER, levels = c('RATER1', 'RATER2')))
  • If it's possible, ideally, the desired output should contain two value columns, like this:

ID | CLAU_VALUE | TUNITS_VALUE | RATER

In this case, though, the rater labels would be different: CRIS_WRI and LARI_WRI, to distinguish WRI from O. Alternatively, I could have another column called 'type' containing this information.

This would mean pivoting the "TUNITS" columns at the same time as the "CLAU" columns.

  • I'm splitting the strings into the value columns, not into my factor column (I honestly don't know why). I'd like single value columns and a single 'RATER' column instead. I'm probably doing something silly, but thanks in advance; I'd really appreciate any help.

  • data:

> dput(data1)
structure(list(ID = c("A", "B", "C", "D", "E", "F", "G", "H", 
"I", "J", "K", "L", "M", "N", "O", "P"), CRIS_CLAU_ENG_O = c(6, 
5, 6, 7, 6, 3, 5, 5, 6, 6, 7, 9, 8, 6, 6, 6), CRIS_TUNITS_WRI_O = c(5, 
5, 4, 5, 5, 3, 5, 5, 4, 4, 7, 7, 7, 6, 6, 5), LARI_CLAU_ENG_O = c(6, 
5, 5, 7, 7, 3, 5, 5, 6, 6, 9, 9, 8, 8, 6, 6), LARI_TUNITS_WRI_O = c(5, 
3, 4, 6, 5, 3, 2, 5, 4, 4, 7, 8, 7, 6, 6, 5)), row.names = c(NA, 
-16L), spec = structure(list(cols = list(ALUNO = structure(list(), class = c("collector_character", 
"collector")), CRIS_CLAU_ENG_O = structure(list(), class = c("collector_double", 
"collector")), CRIS_TUNITS_WRI_O = structure(list(), class = c("collector_double", 
"collector")), LARI_CLAU_ENG_O = structure(list(), class = c("collector_double", 
"collector")), LARI_TUNITS_WRI_O = structure(list(), class = c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), delim = ","), class = "col_spec"),  class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))

User response:

You're both dropping the TUNITS variables (with select()) and choosing to pivot only a couple of columns. If we keep all the columns around, we can get closer. Also, your regex is incomplete: we need to add a literal _ between your two pattern groups.

library(dplyr)
library(tidyr)
data1 %>% 
  pivot_longer(-ID,
               names_to = c("RATER", ".value"),
               names_pattern = "^([^_]+)_([^_]+)_.*")
# # A tibble: 32 × 4
#    ID    RATER  CLAU TUNITS
#    <chr> <chr> <dbl>  <dbl>
#  1 A     CRIS      6      5
#  2 A     LARI      6      5
#  3 B     CRIS      5      5
#  4 B     LARI      5      3
#  5 C     CRIS      6      4
#  6 C     LARI      5      4
#  7 D     CRIS      7      5
#  8 D     LARI      7      6
#  9 E     CRIS      6      5
# 10 E     LARI      7      5
# # … with 22 more rows
# # ℹ Use `print(n = ...)` to see more rows
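
To see why the original pattern sent pieces of the rater name into the value columns, it can help to run both patterns on a single column name outside of pivot_longer(). This is only a sketch in base R (regexec() and regmatches() are used here for illustration; pivot_longer() does its own matching internally), and the variable names col, old and new are made up for the example.

```r
col <- "CRIS_CLAU_ENG_O"

# Original pattern: no literal "_" between the two groups.
# Both groups must match inside "CRIS", so the name is split as "CRI" + "S".
old <- regmatches(col, regexec("^([^_]+)([^_]+)", col))[[1]]
old
# [1] "CRIS" "CRI"  "S"

# Corrected pattern: the "_" separates the groups, ".*" takes the rest.
new <- regmatches(col, regexec("^([^_]+)_([^_]+)_.*", col))[[1]]
new
# [1] "CRIS_CLAU_ENG_O" "CRIS"            "CLAU"
```

The first element of each result is the full match and the rest are the capture groups. With the original pattern, the second group ("S", or "I" for LARI) is what would have been used as the ".value" column name, which is presumably why the attempt ended up with odd value columns and raters named "CRI" and "LAR".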

Regex:

^([^_]+)_([^_]+)_.*
^                      beginning of the string
 '-----' '-----'       pattern groups
  [^_]                 character group of anything except '_'
      +                one or more ('*' is zero-or-more, '?' is 0-or-1)
        _              the literal underscore
                 .*    zero or more ('*') of anything ('.')
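
As a quick check of the breakdown above, the two capture groups can be listed for every column that gets pivoted. This is a minimal base R sketch, not part of the original answer; cols, pattern, m and groups are names chosen for the example, and the column names are taken from the dput() output in the question.

```r
# Column names being pivoted (everything except ID)
cols <- c("CRIS_CLAU_ENG_O", "CRIS_TUNITS_WRI_O",
          "LARI_CLAU_ENG_O", "LARI_TUNITS_WRI_O")

pattern <- "^([^_]+)_([^_]+)_.*"

# For each name: full match first, then one entry per capture group
m <- regmatches(cols, regexec(pattern, cols))

# Keep only the two groups and label them like names_to in pivot_longer()
groups <- do.call(rbind, lapply(m, function(x) x[2:3]))
colnames(groups) <- c("RATER", ".value")
groups
```

Each row pairs a rater (CRIS or LARI) with the name that becomes a value column (CLAU or TUNITS), which matches the RATER, CLAU and TUNITS columns in the pivoted output.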