As a beginner, I am grateful for every hint and explanation. I have a nested list of redditors (scraped with RedditExtractor), each with about, comments and threads elements.
With lapply I could select the elements I need for further analysis:
df_raw_comments_threads <- lapply(all_authors_2019_14_content, `[`,
c("comments", "threads"))
This worked. Next, I wanted to select only date_utc and subreddit:
df_test_comments_threads <- lapply(df_raw_comments_threads, `[`,
c("subreddit", "date_utc"))
which did not work: the result was NULL for every redditor.
I thought I could do something like this:
lapply(df_raw_comments_threads[[x]][[i]][[c("date_utc", "subreddit"]])
since the code says: df_raw_comments_threads[["The_Wombles"]][["comments"]][["date_utc"]]
though I want date_utc and subreddit for both comments and threads, and for every redditor.
I also tried:
df_test_comments_threads <- lapply(df_raw_comments_threads[[]][[i]],
function(i) "subreddit")
my_list_subset <- df_raw_comments_threads[sapply(df_raw_comments_threads,
function(x) "subreddit")]
df_test_comments_threads <- map(df_raw_comments_threads,
"date_utc")
df_test_comments_threads <- lapply(df_raw_comments_threads[[]][["threads"]][["date_utc"]])
df_test_comments_threads <- lapply(all_authors_2019_14_content, `[`,
c("date_utc", "subreddit"))
None of these worked, and I am confused about why the result is NULL, since both redditors have entries in every section.
str(df_raw_comments_threads)
$ The_Wombles :List of 2
..$ comments:'data.frame': 1000 obs. of 12 variables:
.. ..$ url : chr [1:1000] "https://www.reddit.com/r/nursing/comments/uwk2wj/deleted_by_user/" "https://www.reddit.com/r/nursing/comments/uwk2wj/deleted_by_user/" "https://www.reddit.com/r/nursing/comments/uwk2wj/deleted_by_user/" "https://www.reddit.com/r/LifeProTips/comments/uwd8d2/lpt_let_your_daughters_paint_your_nails_have_a/" ...
.. ..$ date_utc : chr [1:1000] "2022-05-24" "2022-05-24" "2022-05-24" "2022-05-24" ...
.. ..$ timestamp : num [1:1000] 1.65e+09 1.65e+09 1.65e+09 1.65e+09 1.65e+09 ...
.. ..$ subreddit : chr [1:1000] "nursing" "nursing" "nursing" "LifeProTips" ...
.. ..$ thread_author : chr [1:1000] "[deleted]" "[deleted]" "[deleted]" "ckayfish" ...
.. ..$ comment_author: chr [1:1000] "The_Wombles" "The_Wombles" "The_Wombles" "The_Wombles" ...
.. ..$ thread_title : chr [1:1000] "[deleted by user]" "[deleted by user]" "[deleted by user]" "LPT: Let your daughters paint your nails, have a tea party with them, and help them set up a lemon-aid stand. B"| __truncated__ ...
.. ..$ comment : chr [1:1000] "Every place is going to be different. because you have EAP you may (depending on your job and the offense) be a"| __truncated__ "Get off your high horse. I suppose you have never made a mistake in your life. Not everything is so black and w"| __truncated__ "For being a sub about nursing a lot of people here don\031t seem to understand addiction." "Lol when you know you know" ...
.. ..$ score : num [1:1000] 1 3 1 1 1 1 3 51 10 3 ...
.. ..$ up : num [1:1000] 1 3 1 1 1 1 3 51 10 3 ...
.. ..$ downs : num [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
.. ..$ golds : num [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
..$ threads :'data.frame': 34 obs. of 11 variables:
.. ..$ url : chr [1:34] "https://www.reddit.com/gallery/uwivn2" "https://i.redd.it/amr5yi8acjv81.jpg" "https://www.reddit.com/r/copypasta/comments/t8rvfe/i_stopped_smoking_weed_recently_and_guy_fieri_has/" "https://m.youtube.com/watch?v=_Z7603OvpO0" ...
.. ..$ date_utc : chr [1:34] "2022-05-24" "2022-04-24" "2022-03-07" "2022-01-30" ...
.. ..$ timestamp: num [1:34] 1.65e+09 1.65e+09 1.65e+09 1.64e+09 1.64e+09 ...
.. ..$ subreddit: chr [1:34] "Miata" "stihl" "copypasta" "UnsolvedMysteries" ...
.. ..$ author : chr [1:34] "The_Wombles" "The_Wombles" "The_Wombles" "The_Wombles" ...
.. ..$ title : chr [1:34] "A Diamond in the Dirt" "This Farmboss is almost 30 years old and starts fist pull." "I stopped smoking weed recently and Guy Fieri has been attacking me in my dreams since" "In February 2018, a Toronto firefighter enjoying the last day of his ski trip in New York with friends and coll"| __truncated__ ...
.. ..$ text : chr [1:34] "" "" "I stopped smoking weed recently and Guy Fieri has been attacking me in my dreams since\n\nI wasn\031t sure wher"| __truncated__ "" ...
.. ..$ golds : num [1:34] 0 0 0 0 0 0 0 0 0 0 ...
.. ..$ score : num [1:34] 22 29 7 1 3 ...
.. ..$ ups : num [1:34] 22 29 7 1 3 ...
.. ..$ downs : num [1:34] 0 0 0 0 0 0 0 0 0 0 ...
$ Europa_Teles_BTR:List of 2
..$ comments:'data.frame': 1000 obs. of 12 variables:
.. ..$ url : chr [1:1000] "https://www.reddit.com/r/Warthunder/comments/grilnr/ww2_german_uboat_submarine_development_19331945/" "https://www.reddit.com/r/HistoriaEmPortugues/comments/grhtvq/sabia_que_madrid_chegou_a_ser_ocupada_pelos/" "https://www.reddit.com/r/Warthunder/comments/grilnr/ww2_german_uboat_submarine_development_19331945/" "https://www.reddit.com/r/Warthunder/comments/gr4zmg/ww2_history_luftwaffe_pilots_studying_the/" ...
.. ..$ date_utc : chr [1:1000] "2020-05-27" "2020-05-27" "2020-05-27" "2020-05-26" ...
.. ..$ timestamp : num [1:1000] 1.59e+09 1.59e+09 1.59e+09 1.59e+09 1.59e+09 ...
.. ..$ subreddit : chr [1:1000] "Warthunder" "HistoriaEmPortugues" "Warthunder" "Warthunder" ...
.. ..$ thread_author : chr [1:1000] "[deleted]" "fan_of_the_pikachu" "[deleted]" "Europa_Teles_BTR" ...
.. ..$ comment_author: chr [1:1000] "Europa_Teles_BTR" "Europa_Teles_BTR" "Europa_Teles_BTR" "Europa_Teles_BTR" ...
.. ..$ thread_title : chr [1:1000] "WW2 German U-boat (submarine) development, 1933-1945" "Sabia que Madrid chegou a ser ocupada pelos portugueses?" "WW2 German U-boat (submarine) development, 1933-1945" "[WW2 History] Luftwaffe pilots studying the defensive angles of hostile bombers" ...
.. ..$ comment : chr [1:1000] "In reply of u/RhodieRanger ; u/aintme_mustbeyou ; u/quietbob515\n\n&#x200B;\n\nTake in mind this post was m"| __truncated__ "Muito interessante, obrigado pela partilha!\n\n&#x200B;\n\ndevia ser partilhado com a malta do r/PORTUGALCARALHO ahah" "In the ['STARFIGHTERS UPDATE / WAR THUNDER'](https://youtu.be/_4s6xOWwXQM?t=349) video by the official War Thun"| __truncated__ "Thank you man =)" ...
.. ..$ score : num [1:1000] -1 4 -1 5 62 10 6 1 8 1 ...
.. ..$ up : num [1:1000] -1 4 -1 5 62 10 6 1 8 1 ...
.. ..$ downs : num [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
.. ..$ golds : num [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
..$ threads :'data.frame': 699 obs. of 11 variables:
.. ..$ url : chr [1:699] "https://www.reddit.com/r/quotes/comments/9ppwro/faith_is_the_sword_forged_against_fate_europa/" "https://www.reddit.com/r/EuropaTelesBTR/comments/9ppgge/5_reasons_why_you_must_quit_gaming_today/" "https://www.reddit.com/r/EuropaTelesBTR/comments/9ppgge/5_reasons_why_you_must_quit_gaming_today/" "https://www.youtube.com/watch?v=m5zN5niz7X0" ...
.. ..$ date_utc : chr [1:699] "2018-10-20" "2018-10-20" "2018-10-20" "2018-10-19" ...
.. ..$ timestamp: num [1:699] 1.54e+09 1.54e+09 1.54e+09 1.54e+09 1.54e+09 ...
.. ..$ subreddit: chr [1:699] "quotes" "StopGaming" "EuropaTelesBTR" "portugal" ...
.. ..$ author : chr [1:699] "Europa_Teles_BTR" "Europa_Teles_BTR" "Europa_Teles_BTR" "Europa_Teles_BTR" ...
.. ..$ title : chr [1:699] "\"Faith is the sword forged against fate\" - Europa_Teles_BTR" "5 REASONS WHY you must quit gaming TODAY (x-post from r/EuropaTelesBTR)" "5 REASONS WHY you must quit gaming TODAY" "Enfermeiras Pára-quedistas | Guerra colonial Portuguesa [Vídeo] (x-post de r/HistoriaEmPortugues)" ...
.. ..$ text : chr [1:699] "" "" "Playing videogames can be seen as a hobby, but it can easily build up into something very destructive and addic"| __truncated__ "" ...
.. ..$ golds : num [1:699] 0 0 0 0 0 0 0 0 0 0 ...
.. ..$ score : num [1:699] 3 12 9 14 9 70 26 1 7 17 ...
.. ..$ ups : num [1:699] 3 12 9 14 9 70 26 1 7 17 ...
.. ..$ downs : num [1:699] 0 0 0 0 0 0 0 0 0 0 ...
CodePudding user response:
Since you're already using {purrr}, you could apply one of its map variants.
Example data structure (you forgot to dput one):
all_authors <- list(The_Wombles = list(about = list(), comments = structure(list(
url = c("url1", "url2"), date_utc = c("2022-05-24",
"2022-05-25")), class = "data.frame", row.names = c(NA, -2L
))), Talking_Foo = list(about = list(), comments = structure(list(
url = c("url3", "url4"), date_utc = c("2022-05-24",
"2022-05-25")), class = "data.frame", row.names = c(NA, -2L
))))
Use map:
library(purrr) ## provides `map`
library(dplyr) ## provides `select`
all_authors |>
  map(~ .x$comments |> select(url, date_utc))
(where ~ is shorthand for the function applied to each list element and .x is the list item currently passed to it: ~ .x is equivalent to function(x) {x})
output:
$The_Wombles
   url   date_utc
1 url1 2022-05-24
2 url2 2022-05-25

$Talking_Foo
   url   date_utc
1 url3 2022-05-24
2 url4 2022-05-25
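To apply the same idea to the structure in the question, where each redditor holds both a comments and a threads data frame, the mapping has to go one level deeper. The original attempt most likely returned NULL because `[` was applied at the redditor level, whose only names are comments and threads, so "subreddit" and "date_utc" match nothing there. Below is a minimal sketch of that reasoning. The toy data is made up for illustration and only assumes that every data frame has date_utc and subreddit columns, as the str() output suggests.

```r
library(purrr)

# Hypothetical toy data mirroring the structure shown by str() in the question
df_raw_comments_threads <- list(
  The_Wombles = list(
    comments = data.frame(url = c("url1", "url2"),
                          date_utc = c("2022-05-24", "2022-05-25"),
                          subreddit = c("nursing", "LifeProTips")),
    threads = data.frame(url = "url3",
                         date_utc = "2022-05-24",
                         subreddit = "Miata")
  ),
  Talking_Foo = list(
    comments = data.frame(url = "url4",
                          date_utc = "2020-05-27",
                          subreddit = "Warthunder"),
    threads = data.frame(url = c("url5", "url6"),
                         date_utc = c("2018-10-20", "2018-10-19"),
                         subreddit = c("quotes", "portugal"))
  )
)

# The failing attempt: `[` looks for "subreddit" and "date_utc" among the
# names of each redditor (comments, threads), finds neither, and yields NULLs
failed <- lapply(df_raw_comments_threads, `[`, c("subreddit", "date_utc"))

# Outer map walks the redditors, inner map walks comments and threads,
# and `[` then selects columns of each data frame
wanted <- c("date_utc", "subreddit")
selected <- map(df_raw_comments_threads,
                \(author) map(author, \(df) df[wanted]))

str(selected, max.level = 2)
```

The same result should be obtainable without purrr via lapply(df_raw_comments_threads, lapply, `[`, wanted), since the extra arguments are passed on to the inner lapply.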
Or use imap_dfr (dfr = data frame, bound by row) to obtain a single data frame:
all_authors |>
  imap_dfr(~ list(author = .y,
                  comments = .x$comments |> select(url, date_utc)))
(where .y is the name and .x the content of the list item currently being mapped)
output:
# A tibble: 4 x 2
  author      comments$url $date_utc
  <chr>       <chr>        <chr>
1 The_Wombles url1         2022-05-24
2 The_Wombles url2         2022-05-25
3 Talking_Foo url3         2022-05-24
4 Talking_Foo url4         2022-05-25
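If the goal is one flat data frame covering both comments and threads for every redditor, with date_utc and subreddit as ordinary columns, a possible variation is to bind the rows in two steps with dplyr's bind_rows and its .id argument. This is only a sketch: the toy data and the column names author and source are assumptions chosen for illustration, not something the question specifies. It also avoids the *_dfr helpers, which are marked as superseded in purrr 1.0.0 and later.

```r
library(purrr)
library(dplyr)

# Hypothetical toy data with the same nesting as in the question
df_raw_comments_threads <- list(
  The_Wombles = list(
    comments = data.frame(date_utc = c("2022-05-24", "2022-05-25"),
                          subreddit = c("nursing", "LifeProTips"),
                          score = c(1, 3)),
    threads = data.frame(date_utc = "2022-05-24",
                         subreddit = "Miata",
                         score = 22)
  ),
  Talking_Foo = list(
    comments = data.frame(date_utc = "2020-05-27",
                          subreddit = "Warthunder",
                          score = -1),
    threads = data.frame(date_utc = c("2018-10-20", "2018-10-19"),
                         subreddit = c("quotes", "portugal"),
                         score = c(3, 14))
  )
)

wanted <- c("date_utc", "subreddit")

long <- df_raw_comments_threads |>
  # per redditor: keep the wanted columns, then stack comments and threads,
  # recording which of the two each row came from in `source`
  map(\(author) bind_rows(map(author, \(df) df[wanted]), .id = "source")) |>
  # stack all redditors, recording the list name in `author`
  bind_rows(.id = "author")

long
```

From here, something like count(long, author, source, subreddit) would summarise activity per redditor and subreddit.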