I have a dataset
structure(list(group = c("A", "B", "A", "B", "B", "A", "A"),
technique = c("attack", "defenese", "attack ", "defense ",
"defense ", "attack", "defense "), outcome1.part1 = c(24L,
1234L, 15L, 234L, 23L, 3L, 3L), outcome1.part.2 = c(52L,
321L, 23L, 234L, 234L, 145L, 145L), outcome1.part.3 = c(14L,
23L, 3L, 2L, 234L, 234L, 234L), outcome2.part.1 = c(14L,
234L, 145L, 4L, 234L, 145L, 145L), outcome2.part.2 = c(234L,
234L, 234L, 234L, 234L, 234L, 234L), outcome2.part.3 = c(234L,
234L, 234L, 234L, 145L, 145L, 145L)), class = "data.frame", row.names = c(NA,
-7L))
Dataset needs to be converted to long form but I want to make a vector that looks for any phrase from a specified key and then decides what level of a grouping variable will be assigned.
So say in column outcome1.part.2
I want to convert this to long form and create a column called strata
that uses whatever value from a key is found inside the column name. So the key would be c("part.1", "part.2", "part.3")
and it will convert a row like this.
to this
I do not want to use regex solutions because I want to flexibly change the values in the key without figuring out a new regex solution for every level of the grouping variable.
CodePudding user response:
We can use pivot_longer
library(dplyr)
library(tidyr)
library(stringr)
v1 <- c("part.1", "part.2", "part.3")
pat <- sprintf("^(outcome\\d*).*(%s).*$", str_c(v1, collapse="|"))
df1 %>%
pivot_longer(cols = starts_with('outcome'),
names_to = c(".value", "strata"),
names_pattern = pat)
-output
# A tibble: 21 × 5
group technique strata outcome1 outcome2
<chr> <chr> <chr> <int> <int>
1 A "attack" part.1 24 14
2 A "attack" part.2 52 234
3 A "attack" part.3 14 234
4 B "defenese" part.1 1234 234
5 B "defenese" part.2 321 234
6 B "defenese" part.3 23 234
7 A "attack " part.1 15 145
8 A "attack " part.2 23 234
9 A "attack " part.3 3 234
10 B "defense " part.1 234 4
# … with 11 more rows
NOTE: There was a typo in the column names i.e. third column should be
names(df1)[3] <- 'outcome1.part.1'
CodePudding user response:
A solution without regex, making use of the extra = "merge"
argument in separate
library(dplyr)
library(tidyr)
df %>%
mutate(id = row_number()) %>%
pivot_longer(
cols = -c(id, group, technique)
) %>%
separate(name, into=c('outcome', 'strata'), extra = "merge") %>%
pivot_wider(
names_from = outcome,
values_from = value,
) %>%
select(-id)
group technique strata outcome1 outcome2
<chr> <chr> <chr> <int> <int>
1 A "attack" part.1 24 14
2 A "attack" part.2 52 234
3 A "attack" part.3 14 234
4 B "defenese" part.1 1234 234
5 B "defenese" part.2 321 234
6 B "defenese" part.3 23 234
7 A "attack " part.1 15 145
8 A "attack " part.2 23 234
9 A "attack " part.3 3 234
10 B "defense " part.1 234 4
# ... with 11 more rows