Home > Software engineering >  pivot_longer into several pairs of columns
pivot_longer into several pairs of columns

Time:03-05

I'm needing to pivot_longer across multiple groups of columns, creating multiple names--values pairs.

For instance, I need to go from something like this:

df_raw <- tribble(
  ~id, ~belief_dog, ~belief_bull_frog, ~belief_fish, ~age, ~norm_bull_frog, ~norm_fish, ~norm_dog, ~gender,
  "b2x8",    1,           4,          3,         41,     4,       2,          10,         2,
  "m89w",    3,           6,          2,         19,     1,       2,           3,         1,
  "32x8",    1,           5,          2,         38,     9,       1,           8,         3
)

And turn it into something lie this:

df_final <- tribble(
  ~id,   ~belief_animal, ~belief_rating, ~norm_animal, ~norm_rating, ~age,   ~gender,
  "b2x8",    "dog",           1,          "bull_frog",      4,        41,       2,
  "b2x8",    "bull_frog",     4,          "fish",           2,        41,       2,
  "b2x8",    "fish",          3,          "dog",            10,       41,       2,
  "m89w",    "dog",           3,          "bull_frog",      1,        19,       1,
  "m89w",    "bull_frog",     6,          "fish",           2,        19,       1,
  "m89w",    "fish",          2,          "dog",            3,        19,       1,
  "32x8",    "dog",           1,          "bull_frog",      9,        38,       3,
  "32x8",    "bull_frog",     5,          "fish",           1,        38,       3,
  "32x8",    "fish",          2,          "dog",            8,        38,       3
)

In other words, anything starting with "belief_" should get pivoted in one names--values pair & anything starting with "norm_" should be pivoted into another names--values pair.

I tried looking at several other Stack Overflow pages with somewhat related content but wasn't able to translate those solutions to this situation.

Any help would be appreciated, with a strong preference for dplyr solutions.

THANKS!

CodePudding user response:

Solved it with a bit more experimentation!

The key comes down to both the names_to & the names_pattern arguments.

df_raw %>% pivot_longer(
  cols = c(belief_dog:belief_fish, norm_bull_frog:norm_dog),
  names_to = c(".value", "rating"),
  names_pattern = "([a-z] )_*(. )"
)

I don't really understand how ".value" or the regex "([a-z] )_*(. )" work, but the solution works nonetheless.

CodePudding user response:

With tidyverse, we can pivot on the two sets of columns that starts with belief and norm. We can then use regex to split into groups according to the first underscore (since some column names have multiple underscores). Essentially, we are saying to put belief or norm (the first group in the column name) into their own columns (i.e., .value), then the second part of the group (i.e., animal names) are put into one column named animal.

library(tidyverse)

df_raw %>%
  pivot_longer(cols = c(starts_with("belief"), starts_with("norm")),
               names_to = c('.value', 'animal'),
               names_pattern = '(.*?)_(.*)') %>% 
  rename(belief_rating = belief, norm_rating = norm)

Output

  id      age gender animal    belief_rating norm_rating
  <chr> <dbl>  <dbl> <chr>             <dbl>       <dbl>
1 b2x8     41      2 dog                   1          10
2 b2x8     41      2 bull_frog             4           4
3 b2x8     41      2 fish                  3           2
4 m89w     19      1 dog                   3           3
5 m89w     19      1 bull_frog             6           1
6 m89w     19      1 fish                  2           2
7 32x8     38      3 dog                   1           8
8 32x8     38      3 bull_frog             5           9
9 32x8     38      3 fish                  2           1
  • Related