Why does dplyr::select throw a NULL error when mapping over a nested list?

Time:12-20

With help from Stack Overflow I managed to finish my script. One of my problems was solved here: How to select multiple, inner list elements with lapply in R?. With the script ready, I tried it on a bigger nested list, and I am stuck again at the same line of code with a new error.

This is the nested list:

str(df_raw_comments_threads)
List of 89
 $ tgg001              :List of 2
  ..$ comments:'data.frame':    992 obs. of  12 variables:
  .. ..$ date_utc      : chr [1:992] "2019-02-05" "2019-02-05" "2019-02-05" "2019-02-05" ...
  .. ..$ timestamp     : num [1:992] 1.55e+09 1.55e+09 1.55e+09 1.55e+09 1.55e+09 ...
  .. ..$ subreddit     : chr [1:992] "hardwareswap" "hardwareswap" "hardwareswap" "hardwareswap" ...
  .. ..$ thread_author : chr [1:992] "aagarwal82" "aagarwal82" "aagarwal82" "[deleted]" ...
  .. ..$ comment_author: chr [1:992] "tgg001" "tgg001" "tgg001" "tgg001" ...
  .. ..$ thread_title  : chr [1:992] "[USA-NC] [H] NVIDIA GTX 1080 TI MSI Duke [W] PayPal" "[USA-NC] [H] NVIDIA GTX 1080 TI MSI Duke [W] PayPal" "[USA-NC] [H] NVIDIA GTX 1080 TI MSI Duke [W] PayPal" "[USA-VA] [H] G.Skill Trident Z RGB 32GB 4 x 8GB DDR4-3200, Enermax Liqtech TR4 II 280mm, Asus X399 Zenith Extre"| __truncated__ ...
  .. ..$ comment       : chr [1:992] "Yes I paid for it" "SOLD TO ME W\017" "check pm" "Well I need the other 16gb.... OP can we split this?" ...
  .. ..$ score         : num [1:992] 1 1 1 1 1 1 1 1 1 2 ...
  .. ..$ up            : num [1:992] 1 1 1 1 1 1 1 1 1 2 ...
  .. ..$ downs         : num [1:992] 0 0 0 0 0 0 0 0 0 0 ...
  .. ..$ golds         : num [1:992] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ threads :'data.frame':    62 obs. of  11 variables:
  .. ..$ date_utc : chr [1:62] "2019-01-22" "2019-01-21" "2019-01-21" "2018-11-29" ...
  .. ..$ timestamp: num [1:62] 1.55e+09 1.55e+09 1.55e+09 1.54e+09 1.54e+09 ...
  .. ..$ subreddit: chr [1:62] "FortniteCompetitive" "FortNiteBR" "FortNiteBR" "Kanye" ...
  .. ..$ author   : chr [1:62] "tgg001" "tgg001" "tgg001" "tgg001" ...
  .. ..$ title    : chr [1:62] "Well rip my 4th consecutive solo win streak today. he had 1 kill and 50 health... clearly the better player" "15 kills later basically solo squad and this is how I die. Epic can we vault trees?" "I went AFK for 20 seconds and I came back at the perfect time lol" "=«>â Is he wavy? >\024 If not try to roast him with Kanye lyric-related comments" ...
  .. ..$ text     : chr [1:62] "" "" "" "" ...
  .. ..$ golds    : num [1:62] 0 0 0 0 0 0 0 0 0 0 ...
  .. ..$ score    : num [1:62] 0 3289 3115 0 41 ...
  .. ..$ ups      : num [1:62] 0 3289 3115 0 41 ...
  .. ..$ downs    : num [1:62] 0 0 0 0 0 0 0 0 0 0 ...

I want to select "subreddit" and "date_utc" from each df of each element in the nested list:

df_map_date_sub <- map2(
  df_raw_comments_threads |> map(~ .x$comments |> dplyr::select("subreddit", "date_utc")),
  df_raw_comments_threads |> map(~ .x$threads |> dplyr::select("subreddit", "date_utc")),
  ~ list(comments = .x, threads = .y)
)

Error in UseMethod("select") : 
  no applicable method for 'select' applied to an object of class "NULL"

My first thought was that this line had already worked, so I restarted R. Then I thought the list might contain some NULLs, which I could handle with tryCatch, so I found this code to check for NULL entries:

> group_errs = df_map_comments_threads %>%
+   map(~ .x$comments |> keep(~ is.null(.x))) %>%
+   names()
> length(group_errs)
[1] 89

I am very confused now. What does the error mean?

I also tried:

df_map_date_sub <- map2(
  df_raw_comments_threads |> map(.x$comments, ~ dplyr::select(., subreddit, date_utc)),
  df_raw_comments_threads |> map(.x$threads, ~ dplyr::select(., subreddit, date_utc)),
  ~ list(comments = .x, threads = .y)
)
Error in as_mapper(.f, ...) : object '.x' not found

CodePudding user response:

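The error means that for at least one element of the list, .x$comments or .x$threads is NULL, so select() has no data frame to dispatch on. We could use purrr::safely to provide a default empty data frame of comments and threads in case of error: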

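library(purrr)  # safely(), map(), map_lgl(), %>%

# Wrap the extraction so that a failing element yields `otherwise`
# instead of aborting the whole map() call.
extract_subreddit_date_safely <- safely(
  ~ list(
    comments = .x[["comments"]] |> dplyr::select("subreddit", "date_utc"),
    threads = .x[["threads"]] |> dplyr::select("subreddit", "date_utc")
  ),
  # Returned as `result` whenever the function above throws an error
  otherwise = list(comments = data.frame(), threads = data.frame())
)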

res_safe <- 
  df_raw_comments_threads %>% 
  map(extract_subreddit_date_safely)

res <- 
  res_safe %>% 
  map("result")
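
To see what safely() returns, here is a minimal sketch on made-up data. The names toy, pick_cols, toy_safe and toy_res are hypothetical and only for illustration; "user_b" is assumed to have no comments data frame, which is one way the original error could arise.

```r
library(purrr)

# Hypothetical toy data: "user_b" has no comments data frame (NULL).
toy <- list(
  user_a = list(
    comments = data.frame(subreddit = "hardwareswap", date_utc = "2019-02-05", score = 1),
    threads  = data.frame(subreddit = "Kanye", date_utc = "2018-11-29", score = 0)
  ),
  user_b = list(
    comments = NULL,
    threads  = data.frame(subreddit = "FortNiteBR", date_utc = "2019-01-21", score = 3289)
  )
)

# Same pattern as above: on error, fall back to two empty data frames.
pick_cols <- safely(
  ~ list(
    comments = dplyr::select(.x[["comments"]], "subreddit", "date_utc"),
    threads  = dplyr::select(.x[["threads"]], "subreddit", "date_utc")
  ),
  otherwise = list(comments = data.frame(), threads = data.frame())
)

# Each element of toy_safe is a list with two slots: `result` and `error`.
toy_safe <- map(toy, pick_cols)

# Keep only the results; failed elements carry the `otherwise` value.
toy_res <- map(toy_safe, "result")

# Error slot is NULL for user_a and holds the select() error for user_b.
map_lgl(toy_safe, ~ is.null(.x[["error"]]))
```

Note that if either select() call fails, the whole element falls back to otherwise, so a user with valid threads but missing comments gets two empty data frames here.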

Examine problematic records with:

df_raw_comments_threads[map_lgl(res_safe, ~ !is.null(.x[["error"]]))]
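
The check in the question reports 89 because map() always returns one entry per input element, so names() of its output has the same length as the list whether or not anything is NULL. A more direct test asks, per element, whether comments or threads is NULL. The following is a sketch on made-up data; toy and missing_df are hypothetical names, and it assumes a missing data frame shows up either as an explicit NULL or as an absent name.

```r
library(purrr)

# Hypothetical toy data: user_b has comments = NULL, user_c has no threads at all.
toy <- list(
  user_a = list(
    comments = data.frame(subreddit = "Kanye"),
    threads  = data.frame(subreddit = "Kanye")
  ),
  user_b = list(
    comments = NULL,
    threads  = data.frame(subreddit = "FortNiteBR")
  ),
  user_c = list(
    comments = data.frame(subreddit = "hardwareswap")
  )
)

# One logical per element: TRUE where either data frame is absent.
# `[[` returns NULL for a name that does not exist in a list.
missing_df <- map_lgl(
  toy,
  ~ is.null(.x[["comments"]]) || is.null(.x[["threads"]])
)

# Names of the elements that would make dplyr::select() fail.
names(toy)[missing_df]
```

Applied to df_raw_comments_threads, the same map_lgl() call would list the users to inspect or drop before selecting columns.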