Select entries from data.frame, based on column names and values stored in a list

Time:11-10

I have a data.frame similar to this:

mydf=data.frame(LETTERS=LETTERS, rev_letters=rev(letters), var1=c(rep('a',10),rep('b',10),rep('c',6)), value=1:26)

> head(mydf)
  LETTERS rev_letters var1 value
1       A           z    a     1
2       B           y    a     2
3       C           x    a     3
4       D           w    a     4
5       E           v    a     5
6       F           u    a     6

I want to select the indexes of the rows whose values, in the columns named by a list, match the values stored in that list, like this one:

mylist=list(LETTERS=c('A','M','X'), var1='b')

> mylist
$LETTERS
[1] "A" "M" "X"

$var1
[1] "b"

I would like to do something like the following, but for all columns and values at once:

> which(mydf[,names(mylist)[1]] %in% mylist[[1]])
[1]  1 13 24

... or, even better, as a TRUE/FALSE vector:

> mydf[,names(mylist)[1]] %in% mylist[[1]]
 [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[25] FALSE FALSE

The idea is to end up with a single vector of all the row indexes that match at least one of the column/value pairs in the list (the union of the individual matches). In the example above, the result would be:

> indexes
 [1]  1 11 12 13 14 15 16 17 18 19 20 24

... or the TRUE/FALSE counterpart:

> indexes
 [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
[13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE
[25] FALSE FALSE
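Either form can be derived from the other, so it is enough to compute one of them. In the small sketch below, the name flags is only illustrative. which() turns the logical vector into indexes, and seq_len(nrow(mydf)) %in% indexes turns indexes back into a logical vector:

```r
# Recreate the example data from the question
mydf <- data.frame(LETTERS = LETTERS, rev_letters = rev(letters),
                   var1 = c(rep('a', 10), rep('b', 10), rep('c', 6)), value = 1:26)

# Expected row indexes from above, as an integer vector
indexes <- c(1L, 11:20, 24L)

# Indexes -> logical: TRUE for every row number that appears in `indexes`
flags <- seq_len(nrow(mydf)) %in% indexes

# Logical -> indexes: which() returns the positions of the TRUE values
which(flags)
# [1]  1 11 12 13 14 15 16 17 18 19 20 24
```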

Thanks!

Answer:

Use %in% inside sapply, then rowSums to keep the rows where at least one condition is TRUE:

mydf=data.frame(LETTERS=LETTERS, rev_letters=rev(letters), var1=c(rep('a',10),rep('b',10),rep('c',6)), value=1:26)
mylist = list(LETTERS = c('A','M','X'), var1 = 'b')

rowSums(sapply(names(mylist), function(x) mydf[[x]] %in% mylist[[x]])) != 0
# [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[11]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
#[21] FALSE FALSE FALSE  TRUE FALSE FALSE

which(rowSums(sapply(names(mylist), function(x) mydf[[x]] %in% mylist[[x]])) != 0)
#[1]  1 11 12 13 14 15 16 17 18 19 20 24
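Because rowSums(...) != 0 is TRUE exactly when at least one condition holds for a row, the same result can also be sketched as an element-wise OR over a list of logical vectors with Reduce(). This is an alternative, not part of the answer above. The names hits, any_hit and all_hit are illustrative. Swapping | for & gives the rows that satisfy every condition instead, in case that is ever needed:

```r
mydf <- data.frame(LETTERS = LETTERS, rev_letters = rev(letters),
                   var1 = c(rep('a', 10), rep('b', 10), rep('c', 6)), value = 1:26)
mylist <- list(LETTERS = c('A', 'M', 'X'), var1 = 'b')

# One logical vector per condition: is the row's value in the allowed set?
hits <- lapply(names(mylist), function(col) mydf[[col]] %in% mylist[[col]])

# Element-wise OR across conditions: TRUE where at least one condition holds
any_hit <- Reduce(`|`, hits)
which(any_hit)
# [1]  1 11 12 13 14 15 16 17 18 19 20 24

# Element-wise AND instead: TRUE only where every condition holds
all_hit <- Reduce(`&`, hits)
which(all_hit)
# [1] 13
```

One possible advantage of lapply() here: when mydf has a single row, sapply() simplifies to a plain vector rather than a matrix, and rowSums() then fails because it needs at least two dimensions.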

Answer:

Loop through the names, call which on each column, then combine the unique indexes:

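sort(unique(unlist(lapply(names(mylist), function(i) {
  # lapply (rather than sapply) always returns a list, so unlist() flattens it
  # even when every condition matches the same number of rows
  which(mydf[[i]] %in% mylist[[i]])
}))))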
# [1]  1 11 12 13 14 15 16 17 18 19 20 24
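A caveat if sapply() is used in this loop-and-which approach: when every condition happens to match the same number of rows, sapply() simplifies the results into a matrix, unlist() returns a matrix unchanged, and unique() then removes duplicate matrix rows rather than duplicate values. The sketch below uses a hypothetical list, mylist2, chosen so that both conditions match two rows and row 1 matches both. lapply() always returns a list, so unlist() flattens it as intended:

```r
mydf <- data.frame(LETTERS = LETTERS, rev_letters = rev(letters),
                   var1 = c(rep('a', 10), rep('b', 10), rep('c', 6)), value = 1:26)

# Hypothetical conditions: each matches exactly two rows, and row 1 matches both
mylist2 <- list(LETTERS = c('A', 'B'), rev_letters = c('z', 'a'))

# sapply() simplifies two equal-length results into a 2 x 2 matrix
per_col_s <- sapply(names(mylist2), function(i) which(mydf[, i] %in% mylist2[[i]]))
is.matrix(per_col_s)
# [1] TRUE
sort(unique(unlist(per_col_s)))   # unique() works on matrix rows here
# [1]  1  1  2 26

# lapply() keeps a list, so unlist() gives a plain vector and unique() works per value
per_col_l <- lapply(names(mylist2), function(i) which(mydf[, i] %in% mylist2[[i]]))
sort(unique(unlist(per_col_l)))
# [1]  1  2 26
```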