I have a list of lists. Every element of the main list is a list that corresponds to a note, and every note has an integer called fold_id. I want to sample one note from each folder ID. I am currently doing this:
folder_ID<-function(lYSt){
  L<-lYSt
  f91<-cbind(sapply(L, `[[`, "fold_id"),
             seq(1,length(L)),
             seq(1,length(L)))
  colnames(f91)<-c("Folder ID","Note #",
                   "#Of notes in folder")
  f91<-as.data.frame(f91)
  f81<-table(sapply(L, `[[`, "fold_id"))
  for(i in 1:length(f91[,1])){
    fgd3<-as.numeric(f91[i,1])
    fgd3<-f81[as.numeric(names(f81))==fgd3]
    f91[i,3]<-fgd3
  }
  f92<-aggregate(f91$`Note #`,
                 by = list(f91$`Folder ID`,f91$`#Of notes in folder`),
                 function(x) sample(x,size = 1))
  return(f92)
}
However, to test that each sampled element does belong to the corresponding folder ID, I did this:
eg<-folder_ID(LIST)
for(i in 1:length(eg[,2])){
  print(LIST[[eg[i,3]]]$fold_id)
  print(eg[i,1])
  print("________________________________")
}
To my surprise, not every sampled element corresponded to its folder ID. I want this part
f92<-aggregate(f91$`Note #`,
by = list(f91$`Folder ID`,f91$`#Of notes in folder`),
function(x) sample(x,size = 1))
to sample exclusively from within each folder ID. Strangely, it mostly samples from the right folder ID, but not always. I also want the output to keep the number of notes in each folder.
EDIT: this is an example of the list:
[[1]]
[[1]]$fold_id
[1] 1
[[1]]$content
[1] "whats written in the note"
[[2]]
[[2]]$fold_id
[1] 2
[[2]]$content
[1] "whats written in the second note"
Answer:
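A likely cause of the occasional mismatches in the question is a documented quirk of sample(): when x is a single number of at least 1, sample(x, size = 1) draws from 1:x instead of returning x. In the aggregate() call, any folder that holds exactly one note passes a length-one vector to sample(), so the result can be a note number from a different folder. The sketch below only illustrates that behaviour; sample_one is a hypothetical helper name, not something from the question.

```r
# Hypothetical helper: draw one element of x by position,
# so a length-one x is returned as-is.
sample_one <- function(x) x[sample.int(length(x), 1)]

set.seed(1)

# Pretend a folder contains only note number 7.
draws_plain <- replicate(200, sample(7, size = 1))  # draws from 1:7
draws_safe  <- replicate(200, sample_one(7))        # always note 7

table(draws_plain)   # several different note numbers appear
unique(draws_safe)   # 7
```

Sampling a position with sample.int() and then indexing avoids the special case, whatever the length of x.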
You could try it this way. Let's assume your list of lists is called mylist and has the structure you show above.
First, put your folder IDs and notes into a table:
library(data.table)
# \(x) is the shorthand for function(x), available from R 4.1
dat = data.table(fold_id = sapply(mylist, \(x) x[["fold_id"]]),
                 content = sapply(mylist, \(x) x[["content"]]))
Now sample one note from each folder:
dat[, .SD[sample(1:.N,1)], by=fold_id]
I made a fake list-of-lists like this:
# fake list of lists
mylist = lapply(1:500, \(x) list(fold_id = sample(1:10,1),content=paste0(sample(letters,25), collapse="-")))
It looks like this:
> mylist[1:3]
[[1]]
[[1]]$fold_id
[1] 4
[[1]]$content
[1] "y-e-m-a-n-r-s-q-g-i-o-d-w-p-h-l-f-b-c-j-t-v-z-x-u"
[[2]]
[[2]]$fold_id
[1] 7
[[2]]$content
[1] "m-q-f-k-g-z-u-x-i-b-t-e-j-y-n-d-s-w-c-v-h-o-a-l-p"
[[3]]
[[3]]$fold_id
[1] 7
[[3]]$content
[1] "t-w-q-n-x-b-p-j-e-s-a-h-r-u-v-f-z-i-k-c-g-y-l-d-o"
The result of the above manipulation returns one note from each of the ten folder IDs:
    fold_id                                           content
 1:       4 k-x-h-n-g-e-p-f-z-w-a-j-r-i-o-c-m-d-q-b-v-l-y-u-s
 2:       7 q-v-n-l-d-k-a-u-h-x-w-e-f-r-c-y-p-b-z-j-m-g-s-o-i
 3:       5 k-q-m-v-p-g-b-f-t-l-r-i-u-c-x-a-y-n-o-s-e-w-d-j-z
 4:       1 w-r-t-f-a-j-b-n-q-v-u-d-i-e-s-c-l-k-m-z-h-p-g-x-o
 5:       6 p-a-v-f-d-z-c-n-x-j-m-b-s-l-w-o-h-e-y-t-i-u-r-g-q
 6:       9 b-y-v-o-j-i-g-m-q-f-t-e-u-d-a-z-c-k-p-x-h-r-n-w-l
 7:       3 h-u-s-a-o-t-b-p-r-k-j-x-q-z-e-m-y-v-d-n-i-f-l-w-c
 8:       8 v-f-m-u-c-d-o-t-h-x-l-p-r-j-g-a-s-y-w-e-n-i-z-q-k
 9:       2 j-o-r-x-g-p-t-v-z-a-n-l-y-e-f-w-s-h-k-q-m-i-u-b-c
10:      10 p-n-z-c-k-a-l-o-s-g-j-f-i-b-w-d-q-y-u-v-t-r-m-x-e
Here is another way, if you don't want to use the table-based, sample-by-group approach:
- get unique folders
folders = unique(sapply(mylist, \(x) x[["fold_id"]]))
- loop over each folder, selecting one of the notes at random
lapply(folders, \(f) {
  notes = unlist(lapply(mylist, \(x) if(x[["fold_id"]] == f) x[["content"]]))
  sample(notes, 1)
})
Output:
[[1]]
[1] "w-c-u-t-l-m-n-a-g-x-f-p-i-k-y-h-d-z-o-e-v-r-b-s-q"
[[2]]
[1] "v-p-a-t-c-u-e-h-q-i-o-g-j-l-s-y-k-r-x-w-b-d-z-n-f"
[[3]]
[1] "i-a-n-c-j-s-z-q-u-o-d-p-w-l-e-t-g-b-k-f-x-v-h-m-y"
[[4]]
[1] "f-n-w-l-b-t-m-e-a-v-i-d-x-o-k-y-h-g-u-r-c-q-s-j-z"
[[5]]
[1] "x-q-a-g-j-r-k-u-y-l-p-i-w-d-h-m-v-o-e-t-n-c-z-f-b"
[[6]]
[1] "b-y-v-o-j-i-g-m-q-f-t-e-u-d-a-z-c-k-p-x-h-r-n-w-l"
[[7]]
[1] "g-a-t-u-p-o-s-l-h-r-d-f-v-m-q-x-k-z-c-i-y-b-w-e-n"
[[8]]
[1] "b-f-r-o-x-q-l-m-a-j-n-t-w-p-g-c-e-u-z-v-k-y-d-s-i"
[[9]]
[1] "s-a-m-j-c-q-t-u-w-d-y-l-e-g-k-v-b-z-n-x-o-r-p-i-h"
[[10]]
[1] "x-y-u-e-q-h-f-d-a-r-o-n-k-w-t-p-b-g-v-l-m-i-j-z-s"
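The question also asks to keep the number of notes in each folder alongside the sampled note. Neither approach above returns that, so here is a minimal base R sketch of one way it could be done, assuming every note has a single numeric fold_id. The names sample_per_folder, sample_one, note and n_in_folder are made up for illustration.

```r
# Hypothetical helper: pick one element by position, safe for length-one x.
sample_one <- function(x) x[sample.int(length(x), 1)]

# Hypothetical function: one sampled note index per folder, plus folder size.
sample_per_folder <- function(notes) {
  ids <- vapply(notes, function(n) as.numeric(n[["fold_id"]]), numeric(1))
  # Group the note positions (1, 2, 3, ...) by folder ID.
  idx_by_folder <- split(seq_along(notes), ids)
  data.frame(
    fold_id     = as.numeric(names(idx_by_folder)),
    note        = unname(vapply(idx_by_folder, sample_one, integer(1))),
    n_in_folder = unname(lengths(idx_by_folder))
  )
}

# Small example in the shape shown in the question.
mylist <- list(
  list(fold_id = 1, content = "a"),
  list(fold_id = 2, content = "b"),
  list(fold_id = 2, content = "c"),
  list(fold_id = 5, content = "d")
)
res <- sample_per_folder(mylist)

# Check that every sampled note really belongs to its folder.
ok <- vapply(seq_len(nrow(res)),
             function(i) mylist[[res$note[i]]]$fold_id == res$fold_id[i],
             logical(1))
all(ok)
```

Because split() groups the note positions by folder before anything is sampled, each draw can only come from its own folder, and lengths() gives the folder sizes without a separate loop.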