I am trying to convert a long dataframe to a wide dataframe using the spread
function in tidyr
and within a dplyr
pipe. My data is in the following format:-
df<-structure(list(Hour = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
dt = c("05/06/2021", "07/06/2021", "08/06/2021", "10/06/2021",
"12/06/2021", "12/06/2021", "14/06/2021", "14/06/2021", "15/06/2021",
"09/06/2021", "10/06/2021", "10/06/2021", "11/06/2021", "13/06/2021",
"14/06/2021", "05/06/2021", "06/06/2021", "07/06/2021", "08/06/2021",
"08/06/2021", "09/06/2021", "09/06/2021", "10/06/2021", "11/06/2021",
"12/06/2021"), Programme_watched = c("Wrestling", "Drama",
"Drama", "Football", "Rugby", "Movie", "Movie", "News", "Music",
"Tennis", "Racing", "Racing", "Documentary ", "Movie", "Movie",
"Athletics", "News", "Children", "Music", "Movie", "Football",
"News", "Football", "Rugby", "Drama")), class = "data.frame", row.names = c(NA,
-25L))
df
Hour dt Programme_watched
1 0 05/06/2021 Wrestling
2 0 07/06/2021 Drama
3 0 08/06/2021 Drama
4 0 10/06/2021 Football
5 0 12/06/2021 Rugby
6 0 12/06/2021 Movie
7 0 14/06/2021 Movie
8 0 14/06/2021 News
9 0 15/06/2021 Music
10 1 09/06/2021 Tennis
11 1 10/06/2021 Racing
12 1 10/06/2021 Racing
13 1 11/06/2021 Documentary
14 1 13/06/2021 Movie
15 1 14/06/2021 Movie
16 2 05/06/2021 Athletics
17 2 06/06/2021 News
18 2 07/06/2021 Children
19 2 08/06/2021 Music
20 2 08/06/2021 Movie
21 2 09/06/2021 Football
22 2 09/06/2021 News
23 2 10/06/2021 Football
24 2 11/06/2021 Rugby
25 2 12/06/2021 Drama
I want to be able to make this wider, turning the dates into the column headings with the Programme_watched
variable as the value. However, this is what I get:-
df%>%
group_by(Hour,dt)%>%
summarise(Programmes=Programme_watched)%>%
spread(dt, Programmes)
`summarise()` has grouped output by 'Hour', 'dt'. You can override using the `.groups` argument.
Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 10 rows:
* 19, 20
* 21, 22
* 11, 12
* 5, 6
* 7, 8
Is it because there is more than one value for some of the grouping variables i.e. Hour
and dt
? If so, I would like these two values to be pasted into the one cell together, then spread by the dt
variable. Can anyone help?
CodePudding user response:
Are you looking for such a solution?
spread
is superseded:
Development on spread()
is complete, and for new code we recommend switching to pivot_wider()
, which is easier to use, more featureful, and still under active development. df %>% spread(key, value)
is equivalent to df %>% pivot_wider(names_from = key, values_from = value)
df%>%
group_by(Hour,dt)%>%
summarise(Programmes=Programme_watched) %>%
ungroup() %>%
pivot_wider(
names_from= dt,
values_from = Programmes
) %>%
unnest()
# A tibble: 6 x 12
Hour `05/06/2021` `07/06/2021` `08/06/2021` `10/06/2021` `12/06/2021` `14/06/2021` `15/06/2021`
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 0 Wrestling Drama Drama Football Rugby Movie Music
2 0 Wrestling Drama Drama Football Movie News Music
3 1 NA NA NA Racing NA Movie NA
4 1 NA NA NA Racing NA Movie NA
5 2 Athletics Children Music Football Drama NA NA
6 2 Athletics Children Movie Football Drama NA NA
# ... with 4 more variables: 09/06/2021 <chr>, 11/06/2021 <chr>, 13/06/2021 <chr>, 06/06/2021 <chr>
OR
library(tidyr)
library(dplyr)
df %>%
pivot_wider(
names_from= dt,
values_from = Programme_watched
) %>%
unnest() %>%
distinct()
Hour `05/06/2021` `07/06/2021` `08/06/2021` `10/06/2021` `12/06/2021` `14/06/2021` `15/06/2021` `09/06/2021` `11/06/2021` `13/06/2021` `06/06/2021`
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 0 Wrestling Drama Drama Football Rugby Movie Music NA NA NA NA
2 0 Wrestling Drama Drama Football Movie News Music NA NA NA NA
3 1 NA NA NA Racing NA Movie NA Tennis "Documentary " Movie NA
4 2 Athletics Children Music Football Drama NA NA Football "Rugby" NA News
5 2 Athletics Children Movie Football Drama NA NA News "Rugby" NA News
CodePudding user response:
An option is also to create a sequence by group (rowid
) before the pivot_wider
to take care of the duplicate elements
library(data.table)
library(tidyr)
library(dplyr)
df%>%
group_by(Hour,dt)%>%
summarise(Programmes=Programme_watched, .groups = 'drop') %>%
mutate(rn = rowid(Hour, dt)) %>%
pivot_wider(names_from = dt, values_from = Programmes) %>%
select(-rn)
-output
# A tibble: 6 × 12
Hour `05/06/2021` `07/06/2021` `08/06/2021` `10/06/2021` `12/06/2021` `14/06/2021` `15/06/2021` `09/06/2021` `11/06/2021` `13/06/2021` `06/06/2021`
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 0 Wrestling Drama Drama Football Rugby Movie Music <NA> <NA> <NA> <NA>
2 0 <NA> <NA> <NA> <NA> Movie News <NA> <NA> <NA> <NA> <NA>
3 1 <NA> <NA> <NA> Racing <NA> Movie <NA> Tennis "Documentary " Movie <NA>
4 1 <NA> <NA> <NA> Racing <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 2 Athletics Children Music Football Drama <NA> <NA> Football "Rugby" <NA> News
6 2 <NA> <NA> Movie <NA> <NA> <NA> <NA> News <NA> <NA> <NA>