Home > front end >  troubleshooting pivot_wide after group_by
troubleshooting pivot_wide after group_by

Time:10-07

I have a dataframe that lists which food each participant ate on each day and then another random variable "other"

df <- data.frame(participant = c(1,1,1,2,2,2,2),
 food = c("pizza", "turkey", "turkey", "pizza", "pizza", "pizza", "turkey"),
 other = c("a", "b", "c", "d", "e", "f", "g"))

that I would like to transform it into a dataframe that shows how many times each participant ate each food while retaining the "other" variable

df2 <- data.frame(participant = c(1,1,1,2,2,2,2),
 pizza = c(1,1,1,3,3,3,3),
 turkey=c(2,2,2,1,1,1,1),
 other = c("a", "b", "c", "d", "e", "f", "g"))

I have attempted

df3 <- df %>% group_by(participant) %>% add_count(food) %>%
 pivot_wider(names_from = food, values_from = n)

but this only kind of works, it produces NA values (which makes sense given my understanding of how pivot_wider works). I am wondering if there is another function or a way to manipulate pivot_wider to produce what I want.

I did find a manual work around

df4 <- df3 %>% group_by(participant) %>% arrange(pizza) %>% mutate(pizza = 
 pizza[1]) %>% arrange(participant) %>% group_by(participant) %>% 
 arrange(turkey) %>% mutate(turkey = turkey[1]) %>% arrange(participant) %>% ungroup

but for whatever reason in my original dataset I get weird values such as "c(3,3)" occasionally. I am hoping for a fix that does not utilize this extra step given that I am getting these weird values, and also so I can learn how to use pivot_wider or another function in this context.

Thanks!

CodePudding user response:

Here is another option with dummy_cols

library(dplyr)
library(fastDummies)
library(stringr)
df %>% 
  dummy_cols("food", remove_selected_columns = TRUE) %>% 
  group_by(participant) %>% 
  mutate(across(starts_with('food_'),
     sum, .names = "{str_remove(.col, '.*_')}")) %>% 
    ungroup %>% 
  select(-starts_with('food_'))

-output

# A tibble: 7 × 4
  participant other pizza turkey
        <dbl> <chr> <int>  <int>
1           1 a         1      2
2           1 b         1      2
3           1 c         1      2
4           2 d         3      1
5           2 e         3      1
6           2 f         3      1
7           2 g         3      1

Or from the 'df3', use fill

library(tidyr)
df3 %>% 
   fill(c(pizza, turkey), .direction = 'downup') %>%
   ungroup

-output

# A tibble: 7 × 4
  participant other pizza turkey
        <dbl> <chr> <int>  <int>
1           1 a         1      2
2           1 b         1      2
3           1 c         1      2
4           2 d         3      1
5           2 e         3      1
6           2 f         3      1
7           2 g         3      1

CodePudding user response:

Here is another option:

library(tidyverse)

df <- tibble(participant = c(1,1,1,2,2,2,2),
 food = c("pizza", "turkey", "turkey", "pizza", "pizza", "pizza", "turkey"),
 other = c("a", "b", "c", "d", "e", "f", "g"))


df |>
  group_by(participant, food)|>
  mutate(num = n())|>
  pivot_wider(names_from = food, values_from = num) |>
  mutate(across(pizza:turkey, \(x) max(x, na.rm = TRUE))) |>
  ungroup()
#> # A tibble: 7 x 4
#>   participant other pizza turkey
#>         <dbl> <chr> <int>  <int>
#> 1           1 a         1      2
#> 2           1 b         1      2
#> 3           1 c         1      2
#> 4           2 d         3      1
#> 5           2 e         3      1
#> 6           2 f         3      1
#> 7           2 g         3      1

CodePudding user response:

A simple workaround:

df3 <- df3 <- df %>% group_by(participant) %>% add_count(food) %>%
   pivot_wider(names_from = food, values_from = n, values_fill = 0) %>% 
   group_by(participant) %>% mutate(across(unique(df$food), max)) %>% ungroup()
  • Related