Home > Enterprise >  reshape wide long with key specifying levels of grouping variable
reshape wide long with key specifying levels of grouping variable

Time:11-03

I have a dataset

structure(list(group = c("A", "B", "A", "B", "B", "A", "A"), 
    technique = c("attack", "defenese", "attack ", "defense ", 
    "defense ", "attack", "defense "), outcome1.part1 = c(24L, 
    1234L, 15L, 234L, 23L, 3L, 3L), outcome1.part.2 = c(52L, 
    321L, 23L, 234L, 234L, 145L, 145L), outcome1.part.3 = c(14L, 
    23L, 3L, 2L, 234L, 234L, 234L), outcome2.part.1 = c(14L, 
    234L, 145L, 4L, 234L, 145L, 145L), outcome2.part.2 = c(234L, 
    234L, 234L, 234L, 234L, 234L, 234L), outcome2.part.3 = c(234L, 
    234L, 234L, 234L, 145L, 145L, 145L)), class = "data.frame", row.names = c(NA, 
-7L))

Dataset needs to be converted to long form but I want to make a vector that looks for any phrase from a specified key and then decides what level of a grouping variable will be assigned.

So say in column outcome1.part.2 I want to convert this to long form and create a column called strata that uses whatever value from a key is found inside the column name. So the key would be c("part.1", "part.2", "part.3") and it will convert a row like this.

enter image description here

to this

enter image description here

I do not want to use regex solutions because I want to flexibly change the values in the key without figuring out a new regex solution for every level of the grouping variable.

CodePudding user response:

We can use pivot_longer

library(dplyr)
library(tidyr)
library(stringr)
v1 <- c("part.1", "part.2", "part.3")
pat <- sprintf("^(outcome\\d*).*(%s).*$", str_c(v1, collapse="|"))
df1 %>% 
   pivot_longer(cols = starts_with('outcome'), 
    names_to = c(".value", "strata"), 
     names_pattern = pat)

-output

# A tibble: 21 × 5
   group technique  strata outcome1 outcome2
   <chr> <chr>      <chr>     <int>    <int>
 1 A     "attack"   part.1       24       14
 2 A     "attack"   part.2       52      234
 3 A     "attack"   part.3       14      234
 4 B     "defenese" part.1     1234      234
 5 B     "defenese" part.2      321      234
 6 B     "defenese" part.3       23      234
 7 A     "attack "  part.1       15      145
 8 A     "attack "  part.2       23      234
 9 A     "attack "  part.3        3      234
10 B     "defense " part.1      234        4
# … with 11 more rows

NOTE: There was a typo in the column names i.e. third column should be

names(df1)[3] <- 'outcome1.part.1'

CodePudding user response:

A solution without regex, making use of the extra = "merge" argument in separate

library(dplyr)
library(tidyr)
df %>% 
  mutate(id = row_number()) %>% 
  pivot_longer(
    cols = -c(id, group, technique)
  ) %>% 
  separate(name, into=c('outcome', 'strata'), extra = "merge") %>% 
  pivot_wider(
    names_from = outcome,
    values_from = value,
  ) %>% 
  select(-id)
   group technique  strata outcome1 outcome2
   <chr> <chr>      <chr>     <int>    <int>
 1 A     "attack"   part.1       24       14
 2 A     "attack"   part.2       52      234
 3 A     "attack"   part.3       14      234
 4 B     "defenese" part.1     1234      234
 5 B     "defenese" part.2      321      234
 6 B     "defenese" part.3       23      234
 7 A     "attack "  part.1       15      145
 8 A     "attack "  part.2       23      234
 9 A     "attack "  part.3        3      234
10 B     "defense " part.1      234        4
# ... with 11 more rows
  • Related