I have a tibble with a column containing (nested) tibbles. The nested tibbles have duplicated data (same names, same values):
df <- tibble(id = 1:2, data = list(tibble(id = 1, var1 = "a", var2 = "b"), tibble(id = 2, var1 = "c", var2 = "d")))
df
# # A tibble: 2 x 2
# id data
# <int> <list>
# 1 1 <tibble [1 x 3]>
# 2 2 <tibble [1 x 3]>
Calling df %>% unnest(data)
results in
Error: Names must be unique.
.
I would like to write a function that drops these columns beforehand but don't know how. My goal is to be run to use something along the lines of:
df %>%
drop_duplicated_cols(data) %>%
unnest(data)
Which would result in:
#> # A tibble: 2 x 4
#> id var1 var2
#> <int> <chr> <chr>
#> 1 1 a b
#> 2 2 c d
CodePudding user response:
You can just use unnest_longer
library(tidyverse)
df %>%
unnest_longer(data)
which gives:
# A tibble: 2 x 2
id data$id $var1 $var2
<int> <dbl> <chr> <chr>
1 1 1 a b
2 2 2 c d
If you want to truly convert the tibble column, you can also use:
df %>%
left_join(reduce(df$data, bind_rows), by = "id")
which gives:
# A tibble: 2 x 4
id data var1 var2
<dbl> <list> <chr> <chr>
1 1 <tibble [1 x 3]> a b
2 2 <tibble [1 x 3]> c d
And from there you can e.g. unselect the data column.
CodePudding user response:
There is a names_repair
argument in unnest
. By default, it is "check_unique"
and this causes the error. If we change it to another option i.e. "unique", it will make the columns that are duplicated with ..
followed by some digits, then we use select
to get the columns not duplicated
library(dplyr)
library(tidyr)
library(stringr)
df %>%
unnest(data, names_repair = "unique") %>%
select(names(.)[!duplicated(str_remove(names(.), "\\. .*"))]) %>%
rename_with(~ str_remove(., "\\. .*"), contains(".."))
-output
# A tibble: 2 x 3
id var1 var2
<int> <chr> <chr>
1 1 a b
2 2 c d
Or another option to avoid the warnings is to remove the duplicate columns within the list
by looping with map
library(purrr)
out <- df %>%
mutate(data = map(data, ~ .x %>%
select(-any_of(names(df))))) %>%
unnest(data)
out
# A tibble: 2 x 3
id var1 var2
<int> <chr> <chr>
1 1 a b
2 2 c d
> str(out)
tibble [2 × 3] (S3: tbl_df/tbl/data.frame)
$ id : int [1:2] 1 2
$ var1: chr [1:2] "a" "c"
$ var2: chr [1:2] "b" "d"
NOTE: Both solutions give the expected output as in the OP's post along with the structure