Home > Net >  pivot_wider results in list column column instead of expected results
pivot_wider results in list column column instead of expected results

Time:09-29

I'm just going to chalk this up to my ignorance, but sometimes the pivot_* functions drive me crazy.

I have a tibble:

# A tibble: 12 x 3
    year term              estimate
   <dbl> <chr>                <dbl>
 1  2018 intercept           -29.8
 2  2018 daysuntilelection     8.27
 3  2019 intercept           -50.6
 4  2019 daysuntilelection     7.40
 5  2020 intercept           -31.6
 6  2020 daysuntilelection     6.55
 7  2021 intercept           -19.0
 8  2021 daysuntilelection     4.60
 9  2022 intercept           -10.7
10  2022 daysuntilelection     6.41
11  2023 intercept           120
12  2023 daysuntilelection     0

that I would like to flip to:

# A tibble: 6 x 3
   year intercept daysuntilelection
  <dbl>     <dbl>             <dbl>
1  2018     -29.8              8.27
2  2019     -50.6              7.40
3  2020     -31.6              6.55
4  2021     -19.0              4.60
5  2022     -10.7              6.41
6  2023     120                0

Normally pivot_wider should be able to do this as x %>% pivot_wider(!year, names_from = "term", values_from = "estimate") but instead it returns a two-column tibble with lists and a bunch of warning.

# A tibble: 1 x 2
  intercept daysuntilelection
  <list>    <list>
1 <dbl [6]> <dbl [6]>
Warning message:
Values from `estimate` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = {summary_fun}` to summarise duplicates.
* Use the following dplyr code to identify duplicates.
  {data} %>%
    dplyr::group_by(term) %>%
    dplyr::summarise(n = dplyr::n(), .groups = "drop") %>%
    dplyr::filter(n > 1L)

Where do I go wrong here? Help!

CodePudding user response:

Next to the solutions offered in the comments, data.table's dcast is a very fast implementation to pivot your data. If the pivot_ functions drive you crazy, maybe this is a nice alternative for you:

x <- read.table(text = "
 1  2018 intercept           -29.8
 2  2018 daysuntilelection     8.27
 3  2019 intercept           -50.6
 4  2019 daysuntilelection     7.40
 5  2020 intercept           -31.6
 6  2020 daysuntilelection     6.55
 7  2021 intercept           -19.0
 8  2021 daysuntilelection     4.60
 9  2022 intercept           -10.7
10  2022 daysuntilelection     6.41
11  2023 intercept           120
12  2023 daysuntilelection     0")

names(x) <- c("id", "year", "term", "estimate")

library(data.table)
dcast(as.data.table(x), year ~ term)
#> Using 'estimate' as value column. Use 'value.var' to override
#>    year daysuntilelection intercept
#> 1: 2018              8.27     -29.8
#> 2: 2019              7.40     -50.6
#> 3: 2020              6.55     -31.6
#> 4: 2021              4.60     -19.0
#> 5: 2022              6.41     -10.7
#> 6: 2023              0.00     120.0

CodePudding user response:

DATA

df <- read.table(text = "
 1  2018 intercept           -29.8
 2  2018 daysuntilelection     8.27
 3  2019 intercept           -50.6
 4  2019 daysuntilelection     7.40
 5  2020 intercept           -31.6
 6  2020 daysuntilelection     6.55
 7  2021 intercept           -19.0
 8  2021 daysuntilelection     4.60
 9  2022 intercept           -10.7
10  2022 daysuntilelection     6.41
11  2023 intercept           120
12  2023 daysuntilelection     0")

CODE

library(tidyverse)
df %>% 
  pivot_wider(names_from = V3,values_from = V4 , values_fill = 0) %>% 
  group_by(V2) %>% 
  summarise_all(sum,na.rm=T)

OUTPUT

     V2    V1 intercept daysuntilelection
  <int> <int>     <dbl>             <dbl>
1  2018     3     -29.8              8.27
2  2019     7     -50.6              7.4 
3  2020    11     -31.6              6.55
4  2021    15     -19                4.6 
5  2022    19     -10.7              6.41
6  2023    23     120                0   
  • Related