Using sample in aggregate

I have a list of lists. Each element of the main list is a list that corresponds to a note, and every note has an integer field called fold_id. I want to sample one note from each folder ID. I am currently doing this:

folder_ID<-function(lYSt){
  L<-lYSt
  f91<-cbind(sapply(L, `[[`, "fold_id"),
             seq(1,length(L)),
             seq(1,length(L)))
  colnames(f91)<-c("Folder ID","Note #",
                   "#Of notes in folder")
  f91<-as.data.frame(f91)
  f81<-table(sapply(L, `[[`, "fold_id"))
  for(i in 1:length(f91[,1])){
    fgd3<-as.numeric(f91[i,1])
    fgd3<-f81[as.numeric(names(f81))==fgd3]
    f91[i,3]<-fgd3
  }
  f92<-aggregate(f91$`Note #`,
                 by = list(f91$`Folder ID`,f91$`#Of notes in folder`),
                 function(x) sample(x,size = 1))
  return(f92)
}

However, to check that each sampled element really belongs to the corresponding folder ID, I did this:

eg<-folder_ID(LIST)

for(i in 1:length(eg[,2])){
  print(LIST[[eg[i,3]]]$fold_id)
  print(eg[i,1])
  print("________________________________")
}
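The print loop above can also be written as a single vectorized check. This is a minimal sketch, assuming (as in the question) that the first column of eg is the folder ID and the third column is the sampled note index; the small LIST and eg below are hypothetical stand-ins for the real data.

```r
# Hypothetical stand-ins for the question's LIST and the output of folder_ID()
LIST <- list(list(fold_id = 1, content = "a"),
             list(fold_id = 2, content = "b"),
             list(fold_id = 2, content = "c"))
eg <- data.frame(Group.1 = c(1, 2), Group.2 = c(1, 2), x = c(1, 3))

# Folder ID of each sampled note, looked up in the original list
sampled_fold <- sapply(LIST[eg[, 3]], `[[`, "fold_id")

# TRUE only if every sampled note comes from the folder it is reported under
ok <- all(sampled_fold == eg[, 1])
ok

# Rows that fail the check (none in this toy example)
eg[sampled_fold != eg[, 1], ]
```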

However, to my surprise, not every sampled element corresponded to its folder ID. I want this part

    f92<-aggregate(f91$`Note #`,
                   by = list(f91$`Folder ID`,f91$`#Of notes in folder`),
                   function(x) sample(x,size = 1))

to sample exclusively within each folder ID. Strangely, it mostly samples from the correct folder ID, but not always. I also want the output to keep the number-of-notes-in-folder column.

EDIT: this is an example of the list:

[[1]]
[[1]]$fold_id
[1] 1
[[1]]$content
[1] "whats written in the note"
[[2]]
[[2]]$fold_id
[1] 2
[[2]]$content
[1] "whats written in the second note"
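The "mostly, but not always" symptom matches a documented behaviour of R's sample(): when x is a single number n >= 1, sample(x, size = 1) draws from 1:n instead of returning x. A folder containing exactly one note therefore receives a random note index. Below is a minimal sketch of the pitfall and of a safe helper (resample, the pattern shown in the examples of ?sample), applied to a hypothetical toy list with simplified column names in place of the question's f91 columns.

```r
# Pitfall: for a single number x >= 1, sample(x, 1) draws from 1:x
set.seed(42)
table(replicate(100, sample(4, size = 1)))  # draws spread over 1:4, not always 4

# Safe helper: always index into x, so sample() never reinterprets it
resample <- function(x, ...) x[sample.int(length(x), ...)]

# Hypothetical toy list in the question's format
LIST <- list(list(fold_id = 1, content = "a"),
             list(fold_id = 2, content = "b"),
             list(fold_id = 2, content = "c"),
             list(fold_id = 3, content = "d"))

fold <- sapply(LIST, `[[`, "fold_id")
f91 <- data.frame(folder  = fold,
                  note    = seq_along(LIST),
                  n_notes = as.vector(table(fold)[as.character(fold)]))

# Same aggregate() call as in the question, with resample() instead of sample()
f92 <- aggregate(f91$note,
                 by  = list(folder = f91$folder, n_notes = f91$n_notes),
                 FUN = function(x) resample(x, size = 1))
f92
```

In this sketch only the function passed to aggregate() changes; the grouping by folder and by number of notes is kept, so the note count per folder survives in the output.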

Answer:

You could try it this way. Let's assume your list of lists is called mylist and has the structure you show above.

First, push your folder IDs and notes into a table:

library(data.table)
dat = data.table(fold_id = sapply(mylist, \(x) x[["fold_id"]]),
                 content = sapply(mylist, \(x) x[["content"]]))

Now, sample one note from each folder:

dat[, .SD[sample(1:.N, 1)], by = fold_id]
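Here sample(1:.N, 1) draws a row position within each group, so a folder with a single note still yields that note. If data.table is not available, the same one-row-per-folder idea can be sketched in base R with split(); the small data frame below is hypothetical and simply has the same columns as dat.

```r
# Hypothetical data with the same columns as dat
dat_df <- data.frame(fold_id = c(4, 7, 7, 5),
                     content = c("n1", "n2", "n3", "n4"))

# Split rows by folder, keep one randomly chosen row per folder, and stack them
one_per_folder <- do.call(rbind, lapply(split(dat_df, dat_df$fold_id), function(g) {
  g[sample.int(nrow(g), 1), , drop = FALSE]
}))
one_per_folder
```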

I made a fake list-of-lists like this:

# fake list of lists
mylist = lapply(1:500, \(x) list(fold_id = sample(1:10,1),content=paste0(sample(letters,25), collapse="-")))

It looks like this:

> mylist[1:3]
[[1]]
[[1]]$fold_id
[1] 4

[[1]]$content
[1] "y-e-m-a-n-r-s-q-g-i-o-d-w-p-h-l-f-b-c-j-t-v-z-x-u"


[[2]]
[[2]]$fold_id
[1] 7

[[2]]$content
[1] "m-q-f-k-g-z-u-x-i-b-t-e-j-y-n-d-s-w-c-v-h-o-a-l-p"


[[3]]
[[3]]$fold_id
[1] 7

[[3]]$content
[1] "t-w-q-n-x-b-p-j-e-s-a-h-r-u-v-f-z-i-k-c-g-y-l-d-o"

Applied to this list, the code above returns one note from each of the ten folder IDs:

    fold_id                                           content
 1:       4 k-x-h-n-g-e-p-f-z-w-a-j-r-i-o-c-m-d-q-b-v-l-y-u-s
 2:       7 q-v-n-l-d-k-a-u-h-x-w-e-f-r-c-y-p-b-z-j-m-g-s-o-i
 3:       5 k-q-m-v-p-g-b-f-t-l-r-i-u-c-x-a-y-n-o-s-e-w-d-j-z
 4:       1 w-r-t-f-a-j-b-n-q-v-u-d-i-e-s-c-l-k-m-z-h-p-g-x-o
 5:       6 p-a-v-f-d-z-c-n-x-j-m-b-s-l-w-o-h-e-y-t-i-u-r-g-q
 6:       9 b-y-v-o-j-i-g-m-q-f-t-e-u-d-a-z-c-k-p-x-h-r-n-w-l
 7:       3 h-u-s-a-o-t-b-p-r-k-j-x-q-z-e-m-y-v-d-n-i-f-l-w-c
 8:       8 v-f-m-u-c-d-o-t-h-x-l-p-r-j-g-a-s-y-w-e-n-i-z-q-k
 9:       2 j-o-r-x-g-p-t-v-z-a-n-l-y-e-f-w-s-h-k-q-m-i-u-b-c
10:      10 p-n-z-c-k-a-l-o-s-g-j-f-i-b-w-d-q-y-u-v-t-r-m-x-e

Here is another way, if you don't want to use the data.table sample-by-group approach.

Get the unique folder IDs:

folders = unique(sapply(mylist, \(x) x[["fold_id"]]))

Loop over each folder, selecting one of its notes at random:

lapply(folders, \(f) {
  notes = unlist(lapply(mylist, \(x) if(x[["fold_id"]] == f) x[["content"]]))
  sample(notes, 1)
})

Output:

[[1]]
[1] "w-c-u-t-l-m-n-a-g-x-f-p-i-k-y-h-d-z-o-e-v-r-b-s-q"

[[2]]
[1] "v-p-a-t-c-u-e-h-q-i-o-g-j-l-s-y-k-r-x-w-b-d-z-n-f"

[[3]]
[1] "i-a-n-c-j-s-z-q-u-o-d-p-w-l-e-t-g-b-k-f-x-v-h-m-y"

[[4]]
[1] "f-n-w-l-b-t-m-e-a-v-i-d-x-o-k-y-h-g-u-r-c-q-s-j-z"

[[5]]
[1] "x-q-a-g-j-r-k-u-y-l-p-i-w-d-h-m-v-o-e-t-n-c-z-f-b"

[[6]]
[1] "b-y-v-o-j-i-g-m-q-f-t-e-u-d-a-z-c-k-p-x-h-r-n-w-l"

[[7]]
[1] "g-a-t-u-p-o-s-l-h-r-d-f-v-m-q-x-k-z-c-i-y-b-w-e-n"

[[8]]
[1] "b-f-r-o-x-q-l-m-a-j-n-t-w-p-g-c-e-u-z-v-k-y-d-s-i"

[[9]]
[1] "s-a-m-j-c-q-t-u-w-d-y-l-e-g-k-v-b-z-n-x-o-r-p-i-h"

[[10]]
[1] "x-y-u-e-q-h-f-d-a-r-o-n-k-w-t-p-b-g-v-l-m-i-j-z-s"
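The list returned above is unnamed, so it does not show which folder each note came from, and each folder needs a full pass over mylist. A minimal sketch that groups once with split() and also keeps the number of notes per folder, as the question asked; it assumes mylist has the fold_id/content structure shown above, and the three-note list below is a hypothetical example.

```r
# Hypothetical small list with the same structure as mylist
mylist <- list(list(fold_id = 2, content = "a"),
               list(fold_id = 1, content = "b"),
               list(fold_id = 2, content = "c"))

fold    <- sapply(mylist, `[[`, "fold_id")
content <- sapply(mylist, `[[`, "content")

# Group note contents by folder in a single pass
by_folder <- split(content, fold)

# One row per folder: ID, number of notes, and one randomly chosen note.
# Indexing with sample.int() avoids sample()'s length-one behaviour.
res <- data.frame(
  fold_id = as.numeric(names(by_folder)),
  n_notes = lengths(by_folder),
  content = vapply(by_folder, function(v) v[sample.int(length(v), 1)], character(1))
)
res
```

In the loop version, sample(notes, 1) happens to be safe because notes is a character vector; with a numeric vector of length one it would draw from 1:n, which is the same issue that affects the question's aggregate() call.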