I have a dataframe that lists which food each participant ate on each day and then another random variable "other"
df <- data.frame(participant = c(1,1,1,2,2,2,2),
food = c("pizza", "turkey", "turkey", "pizza", "pizza", "pizza", "turkey"),
other = c("a", "b", "c", "d", "e", "f", "g"))
that I would like to transform it into a dataframe that shows how many times each participant ate each food while retaining the "other" variable
df2 <- data.frame(participant = c(1,1,1,2,2,2,2),
pizza = c(1,1,1,3,3,3,3),
turkey=c(2,2,2,1,1,1,1),
other = c("a", "b", "c", "d", "e", "f", "g"))
I have attempted
df3 <- df %>% group_by(participant) %>% add_count(food) %>%
pivot_wider(names_from = food, values_from = n)
but this only kind of works, it produces NA values (which makes sense given my understanding of how pivot_wider works). I am wondering if there is another function or a way to manipulate pivot_wider to produce what I want.
I did find a manual work around
df4 <- df3 %>% group_by(participant) %>% arrange(pizza) %>% mutate(pizza =
pizza[1]) %>% arrange(participant) %>% group_by(participant) %>%
arrange(turkey) %>% mutate(turkey = turkey[1]) %>% arrange(participant) %>% ungroup
but for whatever reason in my original dataset I get weird values such as "c(3,3)" occasionally. I am hoping for a fix that does not utilize this extra step given that I am getting these weird values, and also so I can learn how to use pivot_wider or another function in this context.
Thanks!
CodePudding user response:
Here is another option with dummy_cols
library(dplyr)
library(fastDummies)
library(stringr)
df %>%
dummy_cols("food", remove_selected_columns = TRUE) %>%
group_by(participant) %>%
mutate(across(starts_with('food_'),
sum, .names = "{str_remove(.col, '.*_')}")) %>%
ungroup %>%
select(-starts_with('food_'))
-output
# A tibble: 7 × 4
participant other pizza turkey
<dbl> <chr> <int> <int>
1 1 a 1 2
2 1 b 1 2
3 1 c 1 2
4 2 d 3 1
5 2 e 3 1
6 2 f 3 1
7 2 g 3 1
Or from the 'df3', use fill
library(tidyr)
df3 %>%
fill(c(pizza, turkey), .direction = 'downup') %>%
ungroup
-output
# A tibble: 7 × 4
participant other pizza turkey
<dbl> <chr> <int> <int>
1 1 a 1 2
2 1 b 1 2
3 1 c 1 2
4 2 d 3 1
5 2 e 3 1
6 2 f 3 1
7 2 g 3 1
CodePudding user response:
Here is another option:
library(tidyverse)
df <- tibble(participant = c(1,1,1,2,2,2,2),
food = c("pizza", "turkey", "turkey", "pizza", "pizza", "pizza", "turkey"),
other = c("a", "b", "c", "d", "e", "f", "g"))
df |>
group_by(participant, food)|>
mutate(num = n())|>
pivot_wider(names_from = food, values_from = num) |>
mutate(across(pizza:turkey, \(x) max(x, na.rm = TRUE))) |>
ungroup()
#> # A tibble: 7 x 4
#> participant other pizza turkey
#> <dbl> <chr> <int> <int>
#> 1 1 a 1 2
#> 2 1 b 1 2
#> 3 1 c 1 2
#> 4 2 d 3 1
#> 5 2 e 3 1
#> 6 2 f 3 1
#> 7 2 g 3 1
CodePudding user response:
A simple workaround:
df3 <- df3 <- df %>% group_by(participant) %>% add_count(food) %>%
pivot_wider(names_from = food, values_from = n, values_fill = 0) %>%
group_by(participant) %>% mutate(across(unique(df$food), max)) %>% ungroup()