What does it mean that "You can have a column of a data frame that is itself a data frame?"

Time:02-03

I was reading through the column-wise operations documentation for tidyverse's dplyr here: https://dplyr.tidyverse.org/articles/colwise.html, and toward the end of the article there are three bullet points, the first of which reads as follows:

"You can have a column of a data frame that is itself a data frame. This is something provided by base R, but it’s not very well documented, and it took a while to see that it was useful, not just a theoretical curiosity."

I'm not sure I understand what this means. Can someone provide example code that creates a data frame with a column that is itself a data frame, so I can try to understand it?

CodePudding user response:

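A quick example uses dplyr::nest_by, which produces a data column that holds data frames.

Here each data frame in data contains the rows for one species. Strictly speaking, data is a list-column whose elements are tibbles, as the <list<tibble[,4]>> header in the output shows.

For more background, see the documentation for tidyr::nest.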

library(dplyr)

iris %>% nest_by(Species)

#> # A tibble: 3 × 2
#> # Rowwise:  Species
#>   Species                  data
#>   <fct>      <list<tibble[,4]>>
#> 1 setosa               [50 × 4]
#> 2 versicolor           [50 × 4]
#> 3 virginica            [50 × 4]

CodePudding user response:

A data.frame column can also be a list. In base R, list2DF creates a data.frame from a list. Note that a data.frame is itself just a special kind of list (Ref.).

To make a data.frame out of a vector and a list, we can do:

df <- list2DF(list(X1=1:3, X2=list(1:3, 1:3, 1:3)))
df
#   X1      X2
# 1  1 1, 2, 3
# 2  2 1, 2, 3
# 3  3 1, 2, 3

where

str(df)
# 'data.frame': 3 obs. of  2 variables:
# $ X1: int  1 2 3
# $ X2:List of 3
#  ..$ : int  1 2 3
#  ..$ : int  1 2 3
#  ..$ : int  1 2 3
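The same approach extends to a list column whose elements are data frames, which is roughly what a nested column looks like in base R. The following is a minimal sketch, assuming R 4.0.0 or later for list2DF; the names pieces and nested are hypothetical and chosen only for illustration.

```r
# Split the four measurement columns of iris into one data frame per species.
pieces <- split(iris[, -5], iris$Species)

# Build a data.frame whose `data` column is a list of data frames.
# (`pieces` and `nested` are illustrative names, not from the answer above.)
nested <- list2DF(list(Species = names(pieces), data = pieces))

is.data.frame(nested$data)        # FALSE: the column itself is a list ...
is.data.frame(nested$data[[1]])   # TRUE:  ... whose elements are data frames
dim(nested$data[[1]])             # 50 4
```

Note that this is a list of three separate data frames, one per row, and not a single data frame stored as a column.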

CodePudding user response:

You can construct such a data.frame using I.

df <- data.frame(a = 1:10,
                 b = I(data.frame(a = 1:10, b = letters[1:10])))

Although df itself cannot be printed, you can still check its contents:

df$b
##>     a b
##> 1   1 a
##> 2   2 b
##> 3   3 c
##> 4   4 d
##> 5   5 e
##> 6   6 f
##> 7   7 g
##> 8   8 h
##> 9   9 i
##> 10 10 j

Or, more conveniently, convert it to a tibble:

tibble::as_tibble(df)
##> # A tibble: 10 × 2
##>        a   b$a $b   
##>    <int> <int> <chr>
##>  1     1     1 a    
##>  2     2     2 b    
##>  3     3     3 c    
##>  4     4     4 d    
##>  5     5     5 e    
##>  6     6     6 f    
##>  7     7     7 g    
##>  8     8     8 h    
##>  9     9     9 i    
##> 10    10    10 j    
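Another way to get a data frame column in base R, without I, is to assign a data frame to a new column of an existing one. This is only a small sketch: the column names id and coords are made up for illustration, and the assigned data frame is assumed to have the same number of rows as df.

```r
df <- data.frame(id = 1:3)

# Assign a whole data frame as ONE column of df.
# (`id` and `coords` are hypothetical names used only in this sketch.)
df$coords <- data.frame(x = c(0.5, 1.5, 2.5), y = c("a", "b", "c"))

ncol(df)                   # 2: `id` and `coords`, not 3
names(df)                  # "id" "coords"
is.data.frame(df$coords)   # TRUE: the column is itself a data frame
df$coords$x                # 0.5 1.5 2.5

# Row subsetting keeps the nested columns aligned with the outer rows.
df[2, ]$coords$x           # 1.5
```

Unlike a list-column of data frames, there is a single inner data frame here, and its rows line up one-to-one with the rows of the outer data frame.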