I'm trying to get the right regex (following this) to use inside names_pattern.
The strings are CRIS_CLAU_ENG_O and LARI_CLAU_ENG_O.
Desired output: CRIS_O and LARI_O, in a table with the columns
ID | CLAU_VALUE | RATER
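For reference, here is a minimal base-R sketch of one regex that turns those column names into the desired labels. It assumes the rater is always the first underscore-separated token and the suffix (O) is always the last one. sub() keeps those two capture groups and drops everything in between:

```r
# Column names taken from the question's data
x <- c("CRIS_CLAU_ENG_O", "LARI_CLAU_ENG_O")

# ^([^_]+)   first token, the rater (e.g. "CRIS")
# _.*_       the middle tokens (e.g. "_CLAU_ENG_"), discarded
# ([^_]+)$   last token (e.g. "O")
labels <- sub("^([^_]+)_.*_([^_]+)$", "\\1_\\2", x)
labels
#> [1] "CRIS_O" "LARI_O"
```

Note that names_pattern can only capture contiguous pieces of a name, so a label like CRIS_O has to be built after the pivot, or with a rename like this one.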
- Attempt so far:
library(dplyr)
library(tidyr)

data1 %>%
  select(ID, contains("CLAU")) %>%
  pivot_longer(c(CRIS_CLAU_ENG_O, LARI_CLAU_ENG_O),
               names_to = c("RATER", ".value"),
               names_pattern = "^([^_]+)([^_]+)") # %>%
  ## mutate(RATER = case_when(RATER == "CRI" ~ 'RATER1',
  ##                          RATER == "LAR" ~ 'RATER2')) %>%
  ## mutate(RATER = factor(RATER, levels = c('RATER1', 'RATER2')))
- If possible, the desired output should ideally contain two value columns, like this:
ID | CLAU_VALUE | TUNITS_VALUE | RATER
In this case, though, the rater labels would be different (CRIS_WRI and LARI_WRI) to distinguish the O and WRI variables. Alternatively, I could have another column called 'type' containing this information. In other words, I want to pivot the "TUNITS" columns at the same time as the "CLAU" columns.
My attempt splits the strings into the value columns rather than into my factor column, and I honestly don't know why. I'd like single value columns instead, plus a single 'RATER' column. I'm probably doing something silly. Thanks in advance, I'd really appreciate any help.
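To see where the odd split comes from, the attempted pattern can be run through base R's regexec(). This assumes regexec() captures the same groups as names_pattern for a pattern this simple, which is a reasonable expectation but not something the thread states. With nothing between the two groups, the first greedy [^_]+ has to give back one letter so that the second group can match at all:

```r
x <- c("CRIS_CLAU_ENG_O", "LARI_CLAU_ENG_O")

# The pattern from the attempt: two groups with nothing between them
old_pattern <- "^([^_]+)([^_]+)"

# regexec() locates the match; regmatches() returns the full match
# followed by each capture group
m <- regmatches(x, regexec(old_pattern, x))

# Keep only the two capture groups for each name
groups <- lapply(m, `[`, 2:3)
groups
# First name:  group 1 = "CRI", group 2 = "S"
# Second name: group 1 = "LAR", group 2 = "I"
```

Group 1 becomes RATER and group 2 becomes the .value column name. That would explain both the "CRI"/"LAR" raters in the commented-out case_when() and the value columns named after single letters.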
data:
> dput(data1)
structure(list(ID = c("A", "B", "C", "D", "E", "F", "G", "H",
"I", "J", "K", "L", "M", "N", "O", "P"), CRIS_CLAU_ENG_O = c(6,
5, 6, 7, 6, 3, 5, 5, 6, 6, 7, 9, 8, 6, 6, 6), CRIS_TUNITS_WRI_O = c(5,
5, 4, 5, 5, 3, 5, 5, 4, 4, 7, 7, 7, 6, 6, 5), LARI_CLAU_ENG_O = c(6,
5, 5, 7, 7, 3, 5, 5, 6, 6, 9, 9, 8, 8, 6, 6), LARI_TUNITS_WRI_O = c(5,
3, 4, 6, 5, 3, 2, 5, 4, 4, 7, 8, 7, 6, 6, 5)), row.names = c(NA,
-16L), spec = structure(list(cols = list(ALUNO = structure(list(), class = c("collector_character",
"collector")), CRIS_CLAU_ENG_O = structure(list(), class = c("collector_double",
"collector")), CRIS_TUNITS_WRI_O = structure(list(), class = c("collector_double",
"collector")), LARI_CLAU_ENG_O = structure(list(), class = c("collector_double",
"collector")), LARI_TUNITS_WRI_O = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
Answer:
You're both dropping the TUNITS variables with select() and choosing to pivot only a couple of columns. If we keep all columns around, we can get closer. Also, your regex is incomplete: we need to add a literal _ between your two pattern groups.
library(dplyr)
library(tidyr)

data1 %>%
  pivot_longer(-ID,
               names_to = c("RATER", ".value"),
               names_pattern = "^([^_]+)_([^_]+)_.*")
# # A tibble: 32 × 4
# ID RATER CLAU TUNITS
# <chr> <chr> <dbl> <dbl>
# 1 A CRIS 6 5
# 2 A LARI 6 5
# 3 B CRIS 5 5
# 4 B LARI 5 3
# 5 C CRIS 6 4
# 6 C LARI 5 4
# 7 D CRIS 7 5
# 8 D LARI 7 6
# 9 E CRIS 6 5
# 10 E LARI 7 5
# # … with 22 more rows
# # ℹ Use `print(n = ...)` to see more rows
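The question also asks for the rater label to carry the suffix (CRIS_O, LARI_O), or for a separate 'type' column. Here is a hedged sketch of one way to get both: capture the last token as a third group, called TYPE here (a hypothetical column name), then paste it onto RATER. This assumes the middle token (ENG/WRI) can be dropped. Capturing it as well would leave CLAU and TUNITS on separate rows padded with NA, because each measure only appears with one of those tokens.

```r
library(dplyr)
library(tidyr)

# The data from the question, rebuilt without the readr spec attribute
data1 <- tibble(
  ID = LETTERS[1:16],
  CRIS_CLAU_ENG_O   = c(6, 5, 6, 7, 6, 3, 5, 5, 6, 6, 7, 9, 8, 6, 6, 6),
  CRIS_TUNITS_WRI_O = c(5, 5, 4, 5, 5, 3, 5, 5, 4, 4, 7, 7, 7, 6, 6, 5),
  LARI_CLAU_ENG_O   = c(6, 5, 5, 7, 7, 3, 5, 5, 6, 6, 9, 9, 8, 8, 6, 6),
  LARI_TUNITS_WRI_O = c(5, 3, 4, 6, 5, 3, 2, 5, 4, 4, 7, 8, 7, 6, 6, 5)
)

long <- data1 %>%
  pivot_longer(
    -ID,
    # group 1 -> RATER, group 2 -> value column name, group 3 -> TYPE;
    # the ENG/WRI token is matched by [^_]+ but not captured
    names_to = c("RATER", ".value", "TYPE"),
    names_pattern = "^([^_]+)_([^_]+)_[^_]+_([^_]+)$"
  ) %>%
  # Combine rater and type into labels such as "CRIS_O";
  # TYPE is kept as its own column as well
  mutate(RATER = paste(RATER, TYPE, sep = "_"))

long
```

Keeping TYPE as a column and building the combined RATER label are independent choices. Either one can be dropped, depending on which layout is wanted downstream.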
Regex:
^([^_]+)_([^_]+)_.*
^                    beginning of the string
 '-----' '-----'     pattern groups
  [^_]               character group of anything except '_'
      +              one or more ('*' is zero-or-more, '?' is 0-or-1)
        _            the literal underscore
                 .*  zero or more ('*') of anything ('.')
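As a quick check of the breakdown above, the same pattern can be applied to all four column names with base R's regexec(). The resulting matrix is effectively the RATER / .value split that pivot_longer() produces. This is a sketch for inspecting the groups, not a description of how tidyr matches internally:

```r
x <- c("CRIS_CLAU_ENG_O", "CRIS_TUNITS_WRI_O",
       "LARI_CLAU_ENG_O", "LARI_TUNITS_WRI_O")

pattern <- "^([^_]+)_([^_]+)_.*"

# Full match plus the two capture groups for each name
m <- regmatches(x, regexec(pattern, x))

# Drop the full match (element 1) and stack the groups into a matrix
groups <- do.call(rbind, lapply(m, `[`, 2:3))
dimnames(groups) <- list(x, c("RATER", ".value"))
groups
```

Each row shows the rater in the first column and the measure (CLAU or TUNITS) in the second. Those measures become the value columns once .value is used in names_to.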