library(dplyr)

df <- tibble(x = rep(1:3, 2),
             y = list(1:5, 1:10, 10:20, 20:40, 1:50, 5:10))
df
#> # A tibble: 6 × 2
#> x y
#> <int> <list>
#> 1 1 <int [5]>
#> 2 2 <int [10]>
#> 3 3 <int [11]>
#> 4 1 <int [21]>
#> 5 2 <int [50]>
#> 6 3 <int [6]>
I want to group by 'x' and summarise the vectors in each group into a single vector. I tried using c(), but it didn't help: the result still has one row per original vector.
df %>%
group_by(x) %>%
summarise(z = c(y))
#> `summarise()` has grouped output by 'x'. You can override using the `.groups`
#> argument.
#> # A tibble: 6 × 2
#> # Groups: x [3]
#> x z
#> <int> <list>
#> 1 1 <int [5]>
#> 2 1 <int [21]>
#> 3 2 <int [10]>
#> 4 2 <int [50]>
#> 5 3 <int [11]>
#> 6 3 <int [6]>
I would also like to take the union of the elements in each group, or apply other similar functions to this kind of data.
df %>%
group_by(x) %>%
summarise(z = union(y))
#> Error in `summarise()`:
#> ! Problem while computing `z = union(y)`.
#> ℹ The error occurred in group 1: x = 1.
#> Caused by error in `base::union()`:
#> ! argument "y" is missing, with no default
CodePudding user response:
If you want the data to remain nested, you can do
df %>%
group_by(x) %>%
summarise(z = list(unlist(y)))
The c() function won't work because it doesn't flatten lists. For example, compare
c(list(1:3, 4:5))
unlist(list(1:3, 4:5))
Given a list, c() returns the list unchanged rather than a single vector, whereas unlist() flattens it. This matters because, when you use summarise(), your function receives a list of the matching row values for each group.
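To make the difference concrete, here is a small base R sketch. The name `nested` is a hypothetical stand-in for what one group's `y` looks like inside summarise(); it is not part of the original data.

```r
# Base R only; `nested` is a made-up stand-in for one group's `y` values.
nested <- list(1:3, 4:5)

combined  <- c(nested)       # still a list with two elements
flattened <- unlist(nested)  # a single integer vector

str(combined)
str(flattened)

length(combined)   # 2: one entry per original vector
length(flattened)  # 5: one entry per value
```

Because c() hands the list back as it was, summarise() still sees one element per original row, which is why the first attempt produced six rows.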
Also note that if you leave off the list(), the values won't be nested anymore:
df %>%
group_by(x) %>%
summarise(z = unlist(y))
# x z
# <int> <int>
# 1 1 1
# 2 1 2
# 3 1 3
# 4 1 4
# 5 1 5
# 6 1 20
# 7 1 21
# ...
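For the union part of the question, one possible approach (a sketch, not the only way) is to fold base R's two-argument union() over the group's list with Reduce(). The error in the question arises because union() requires two arguments and was given only the list. The result name `unions` is chosen here just for illustration.

```r
library(dplyr)

df <- tibble(x = rep(1:3, 2),
             y = list(1:5, 1:10, 10:20, 20:40, 1:50, 5:10))

# Reduce() applies union() pairwise across the vectors in each group;
# list() keeps the result nested, one row per group.
unions <- df %>%
  group_by(x) %>%
  summarise(z = list(Reduce(union, y)))

unions$z[[2]]  # values from 1:10 and 1:50 with duplicates dropped
```

Unlike unlist(), union() drops duplicate values. The same pattern should work with other two-argument set functions, for example Reduce(intersect, y).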