Use dplyr to get list items from a dataframe in R

Time:11-01

I have a dataframe being returned from Microsoft365R:

SKA_student <- structure(list(name = "Computing SKA 2021-22.xlsx", size = 22266L, 
             lastModifiedBy = 
               structure(list(user = 
                      structure(list(email = "my@email.com", 
                                     id = "8ae50289-d7af-4779-91dc-e4638421f422", 
                                     displayName = "Name, My"), class = "data.frame", row.names = c(NA, -1L))), 
                      class = "data.frame", row.names = c(NA, -1L)), 
             fileSystemInfo = structure(list(
               createdDateTime = "2021-09-08T16:03:38Z", 
               lastModifiedDateTime = "2021-09-16T00:09:04Z"), class = "data.frame", row.names = c(NA,-1L))), row.names = c(NA, -1L), class = "data.frame")

I can return all the lastModifiedBy data through:

SKA_student %>% select(lastModifiedBy)

lastModifiedBy.user.email               lastModifiedBy.user.id lastModifiedBy.user.displayName
1              my@email.com 8ae50289-d7af-4779-91dc-e4638421f422                        Name, My

But if I want a specific item in the lastModifiedBy list, it doesn't work, e.g.:

SKA_student %>% select(lastModifiedBy.user.email)

Error: Can't subset columns that don't exist.
x Column `lastModifiedBy.user.email` doesn't exist.

I can get this working in base R, but I would really like a dplyr answer.

CodePudding user response:

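This function flattens all the list columns, including nested data frames (I found this ages ago on SO but can't find the original post for credit):

SO_flat_cols <- function(data) {
    # TRUE for list columns and for nested data frame columns
    ListCols <- sapply(data, is.list)
    # Unlist each row of the nested columns and bind the result back on;
    # the flattened values are coerced to a common type
    cbind(data[!ListCols], t(apply(data[ListCols], 1, unlist)))
}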

Then you can select as you like.

SO_flat_cols(SKA_student) %>%
  select(lastModifiedBy.user.email)

Alternatively, you can reach the nested value by pulling each nested data frame in turn:

SKA_student %>%
  pull(lastModifiedBy) %>%
  pull(user) %>%
  select(email)
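
The select() call in the question fails because the dotted names only appear when the nested data frame is printed; the only real column at that level is lastModifiedBy. A minimal sketch of reaching into it with `$` inside a dplyr verb, assuming a small stand-in object (`nested` is a hypothetical substitute for the Microsoft365R result, built with tibble() purely for illustration):

```r
library(dplyr)

# Hypothetical stand-in for the Microsoft365R result: lastModifiedBy is a
# data frame column that itself holds a `user` data frame column.
nested <- tibble(
  name = "Computing SKA 2021-22.xlsx",
  lastModifiedBy = tibble(
    user = tibble(
      email = "my@email.com",
      displayName = "Name, My"
    )
  )
)

# Only the top-level names are real columns, which is why
# select(lastModifiedBy.user.email) cannot find anything.
top_level <- names(nested)

# Reach into the nested data frames with `$` inside a dplyr verb.
emails <- nested %>%
  transmute(name, email = lastModifiedBy$user$email)
```

This keeps everything in one pipeline and returns a flat data frame with name and email columns.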

CodePudding user response:

You could use

library(dplyr)
library(tidyr)

SKA_student %>% 
  unnest_wider(lastModifiedBy) %>% 
  select(email)

This returns

# A tibble: 1 x 1
  email       
  <chr>       
1 my@email.com
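
Since lastModifiedBy is a data frame column rather than a plain list, tidyr::unpack() may be a closer fit. A minimal sketch under that assumption, again using a hypothetical stand-in (`nested`) instead of the real Microsoft365R result:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in with the same nesting as SKA_student.
nested <- tibble(
  name = "Computing SKA 2021-22.xlsx",
  lastModifiedBy = tibble(
    user = tibble(
      email = "my@email.com",
      displayName = "Name, My"
    )
  )
)

flat <- nested %>%
  unpack(lastModifiedBy) %>%  # exposes the `user` data frame column
  unpack(user)                # exposes email and displayName

result <- flat %>% select(email)
```

Each unpack() call removes one level of nesting, so deeper structures need one call per level.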