Home > other >  Is there a way to pass more than a single call to either names_sep or names_patterns in pivot_longer
Is there a way to pass more than a single call to either names_sep or names_patterns in pivot_longer

Time:01-04

I have a fairly large data set that I would like to turn into a longer dataframe. I would like to use either names_sep or names_pattern to pass more than a single call.

I provide a replicate version below. However, the real dataset is much larger with some 30 countries and in the end the full dataset has some 7000 columns (i.e.variables).

This is my replicate version:

au <- tribble(
  ~ expert, ~ leftright_au_spo, ~ immigration_au_spo, ~ leftright_au_ovp, ~ immigration_au_ovp,
  1, 3, 4, 5, 6,  
  2, 3, 4, 5, 6,   
  3, 3, 4, 5, 6
)

nl <- tribble(
  ~ expert, ~ leftright_nl_pvda, ~ immigration_nl_pvda, ~ leftright_nl_vvd, ~ immigration_nl_vvd,
  4, 3, 4, 5, 6,  
  5, 3, 4, 5, 6, 
  6, 3, 4, 5, 6
)

Using head:


> head(au)
# A tibble: 3 x 5
  expert leftright_au_spo immigration_au_spo leftright_au_ovp immigration_au_ovp
   <dbl>            <dbl>              <dbl>            <dbl>              <dbl>
1      1                3                  4                5                  6
2      2                3                  4                5                  6
3      3                3                  4                5                  6

> head(nl)
# A tibble: 3 x 5
  expert leftright_nl_pvda immigration_nl_pvda leftright_nl_vvd immigration_nl_vvd
   <dbl>             <dbl>               <dbl>            <dbl>              <dbl>
1      4                 3                   4                5                  6
2      5                 3                   4                5                  6
3      6                 3                   4                5                  6

The wider form of the data has a variable for an issue, a country (separated by x) and a political party. For example: immigration_au_spo

If I do the following, I get what I want in the longer dataframe. I get an expert per party per issue.

austria_longer <- au %>% 
  pivot_longer(
    !expert, 
    names_to = c(".value", "party"), 
    names_sep = "_au_",
    names_transform = list(party = as.factor)) 

netherlands_longer <- nl %>% 
  pivot_longer(
    !expert, 
    names_to = c(".value", "party"), 
    names_sep = "_nl_",
    names_transform = list(party = as.factor))

This produces:


> head(austria_long)
# A tibble: 6 x 4
  expert party leftright immigration
   <dbl> <fct>     <dbl>       <dbl>
1      1 spo           3           4
2      1 ovp           5           6
3      2 spo           3           4
4      2 ovp           5           6
5      3 spo           3           4
6      3 ovp           5           6

And:


> head(netherlands_long)
# A tibble: 6 x 4
  expert party leftright immigration
   <dbl> <fct>     <dbl>       <dbl>
1      4 pvda          3           4
2      4 vvd           5           6
3      5 pvda          3           4
4      5 vvd           5           6
5      6 pvda          3           4
6      6 vvd           5           6

I then join them to make one dataset. Which is what I want. But it is rather labour intensive:


df_combined <- sjmisc::add_rows(austria_longer,
                                netherlands_longer,
                                id = "country")

> head(df_combined, n = 30)
# A tibble: 12 x 5
   country expert party leftright immigration
   <chr>    <dbl> <fct>     <dbl>       <dbl>
 1 1            1 spo           3           4
 2 1            1 ovp           5           6
 3 1            2 spo           3           4
 4 1            2 ovp           5           6
 5 1            3 spo           3           4
 6 1            3 ovp           5           6
 7 2            4 pvda          3           4
 8 2            4 vvd           5           6
 9 2            5 pvda          3           4
10 2            5 vvd           5           6
11 2            6 pvda          3           4
12 2            6 vvd           5           6

However, my question is: if I import the datasets as a list and produce a single dataframe, is there a way to use either the names_sep or the names_pattern to do the same as above, produce a longer version per expert, with a new column for party, and a separate variable (column) per issue (i.e. leftright, immigration). The issue is that I need an unique indicator (here) for each country.

I have not been able to figure out how to pass either more than a single call to names_sep or use names_pattern to produce syntax to solve this problem.


fulldataset_longer <- fulldataset %>% # fulldataset in this case would be both au and nl
  pivot_longer(
    !expert, 
    names_to = c(".value", "party"), 
    names_sep = "_somethinghere_",
    names_transform = list(party = as.factor))

In other words, can I feed names_sep a list or a vector such as: "au", "nl" ..... or is there a way with names_pattern to solve this issue. I have not been able figure out if it is possible to pass more than a singe call to either names_sep or to names_pattern.

CodePudding user response:

Using the names_pattern argument and a custom function you could do:

library(tidyr)
library(purrr)

to_long <- function(x) {
  x %>% 
    pivot_longer(
      !expert, 
      names_to = c(".value", "country", "party"), 
      names_pattern = "(.*)_(.*)_(.*)",
      names_transform = list(party = as.factor)) 
}

list(au, nl) |> 
  purrr::map_df(to_long)
#> # A tibble: 12 × 5
#>    expert country party leftright immigration
#>     <dbl> <chr>   <fct>     <dbl>       <dbl>
#>  1      1 au      spo           3           4
#>  2      1 au      ovp           5           6
#>  3      2 au      spo           3           4
#>  4      2 au      ovp           5           6
#>  5      3 au      spo           3           4
#>  6      3 au      ovp           5           6
#>  7      4 nl      pvda          3           4
#>  8      4 nl      vvd           5           6
#>  9      5 nl      pvda          3           4
#> 10      5 nl      vvd           5           6
#> 11      6 nl      pvda          3           4
#> 12      6 nl      vvd           5           6

CodePudding user response:

Keep the data in a list and use names_pattern instead of names_sep

library(dplyr)
library(tidyr)
library(purrr)
lst(au, nl) %>% 
  map(~ .x %>% 
  pivot_longer(cols = -expert, names_to = c(".value", "party"), 
   names_pattern = "^([^_] )_\\w _(\\w )",
    names_transform = list(party = as.factor))) %>%
  list_rbind(names_to = "Country")

-output

# A tibble: 12 × 5
   Country expert party leftright immigration
   <chr>    <dbl> <fct>     <dbl>       <dbl>
 1 au           1 spo           3           4
 2 au           1 ovp           5           6
 3 au           2 spo           3           4
 4 au           2 ovp           5           6
 5 au           3 spo           3           4
 6 au           3 ovp           5           6
 7 nl           4 pvda          3           4
 8 nl           4 vvd           5           6
 9 nl           5 pvda          3           4
10 nl           5 vvd           5           6
11 nl           6 pvda          3           4
12 nl           6 vvd           5           6

Or use names_sep by extracting the name from the named list

library(stringr)
lst(au, nl) %>% 
   imap(~ .x %>%
   pivot_longer(cols = -expert, names_to = c(".value", "party"), 
     names_sep = str_c('_', .y, '_'),
   names_transform = list(party = as.factor))) %>%
   list_rbind(names_to = "Country")

-output

# A tibble: 12 × 5
   Country expert party leftright immigration
   <chr>    <dbl> <fct>     <dbl>       <dbl>
 1 au           1 spo           3           4
 2 au           1 ovp           5           6
 3 au           2 spo           3           4
 4 au           2 ovp           5           6
 5 au           3 spo           3           4
 6 au           3 ovp           5           6
 7 nl           4 pvda          3           4
 8 nl           4 vvd           5           6
 9 nl           5 pvda          3           4
10 nl           5 vvd           5           6
11 nl           6 pvda          3           4
12 nl           6 vvd           5           6

CodePudding user response:

Here is a different approach:

library(dplyr)
library(tidyr)

bind_rows(au, nl) %>% 
  pivot_longer(-expert, names_to = "name", values_to = "leftright",  values_drop_na = TRUE) %>% 
  separate(name, into=c("left_right", "Country", "party")) %>% 
  mutate(immigration = lead(leftright, default = last(leftright))) %>% 
  filter(row_number() %% 2 == 1) %>% 
  select(Country, expert, party, leftright, immigration)
  Country expert party leftright immigration
   <chr>    <dbl> <chr>     <dbl>       <dbl>
 1 au           1 spo           3           4
 2 au           1 ovp           5           6
 3 au           2 spo           3           4
 4 au           2 ovp           5           6
 5 au           3 spo           3           4
 6 au           3 ovp           5           6
 7 nl           4 pvda          3           4
 8 nl           4 vvd           5           6
 9 nl           5 pvda          3           4
10 nl           5 vvd           5           6
11 nl           6 pvda          3           4
12 nl           6 vvd           5           6
  • Related