I have a data frame s1:
s1=data.frame(no=c(1,2,3,4,5,6,7),col=c("red","green","blue","yellow","blue","black","white"),car_mod=c("car2","car4","car1","car5","car7","car3","car1"))
no col car_mod
1 1 red car2
2 2 green car4
3 3 blue car1
4 4 yellow car5
5 5 blue car7
6 6 black car3
7 7 white car1
and a list l
l=list(list(c("green","blue","red"),c("car1","car2","car5")))
[[1]]
[[1]][[1]]
[1] "green" "blue" "red"
[[1]][[2]]
[1] "car1" "car2" "car5"
I want to create a function that selects only the rows where the value in column col and the value in column car_mod are both present in the list (the col value should be in l[[1]][[1]], and the car_mod value should be in l[[1]][[2]]).
The output should look like this:
s_new=data.frame(no=c(1,3),col=c("red","blue"),car_mod=c("car2","car1"))
no col car_mod
1 1 red car2
2 3 blue car1
Note that the actual data frame and list are very large. I tried something like this:
for (i in l[1]) {
  for (j in l[2]) {
    if (i %in% s1$col & j %in% s1$car_mod) {
      select()
    }
  }
}
But I'm not sure how to proceed, or whether a loop is the best approach given the size of the data frame.
Answer:
You can use subset() (or dplyr::filter()):
> subset(s1, col %in% l[[1]][[1]] & car_mod %in% l[[1]][[2]])
no col car_mod
1 1 red car2
3 3 blue car1
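Since the question asks for a function, here is a minimal sketch that wraps the same test in a reusable helper. The name filter_by_list is hypothetical, and it assumes the criteria always arrive as a two-element list (colours first, car models second). It uses [ rather than subset() because R's own documentation describes subset() as a convenience for interactive use and recommends standard subsetting inside functions.

```r
# Example data from the question
s1 <- data.frame(
  no = c(1, 2, 3, 4, 5, 6, 7),
  col = c("red", "green", "blue", "yellow", "blue", "black", "white"),
  car_mod = c("car2", "car4", "car1", "car5", "car7", "car3", "car1")
)
l <- list(list(c("green", "blue", "red"), c("car1", "car2", "car5")))

# Hypothetical helper: keep rows whose col is in criteria[[1]]
# AND whose car_mod is in criteria[[2]]
filter_by_list <- function(df, criteria) {
  keep <- df$col %in% criteria[[1]] & df$car_mod %in% criteria[[2]]
  df[keep, , drop = FALSE]  # drop = FALSE keeps a data frame even for one row
}

s_new <- filter_by_list(s1, l[[1]])
s_new
```

Passing l[[1]] keeps the helper independent of how deeply the criteria are nested in the outer list.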
Answer:
A possible solution with dplyr's filter():
library(dplyr)
s_new <- s1 %>% filter(col %in% l[[1]][[1]] & car_mod %in% l[[1]][[2]])
s_new
no col car_mod
1 1 red car2
2 3 blue car1
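None of these answers needs a loop, because %in% is vectorised: it returns one TRUE/FALSE per element of its left-hand side, and & then combines the two masks element by element. The base R sketch below spells out the intermediate logical vectors; on a large data frame this should be much faster than iterating over rows in R.

```r
# Example data from the question
s1 <- data.frame(
  no = c(1, 2, 3, 4, 5, 6, 7),
  col = c("red", "green", "blue", "yellow", "blue", "black", "white"),
  car_mod = c("car2", "car4", "car1", "car5", "car7", "car3", "car1")
)
l <- list(list(c("green", "blue", "red"), c("car1", "car2", "car5")))

# One vectorised membership test per column, no explicit loop
in_col <- s1$col %in% l[[1]][[1]]      # TRUE TRUE TRUE FALSE TRUE FALSE FALSE
in_mod <- s1$car_mod %in% l[[1]][[2]]  # TRUE FALSE TRUE TRUE FALSE FALSE TRUE

# & combines the masks element-wise; only rows 1 and 3 pass both tests
keep <- in_col & in_mod
s_new <- s1[keep, ]
s_new
```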
Answer:
To get rid of the [[ you can also use pluck() from the purrr package:
library(tidyverse)
s_new <- s1 %>% filter(col %in% pluck(l, 1, 1) & car_mod %in% pluck(l, 1, 2))
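The outer list in the question is itself a list of (colours, models) pairs. If it may hold more than one pair (an assumption, the question only shows one), a row could be kept when it matches any pair. A minimal base R sketch with lapply() and Reduce() follows; l_multi and filter_any_pair are hypothetical names, and purrr's map() and reduce() could be swapped in for the same effect.

```r
# Example data from the question
s1 <- data.frame(
  no = c(1, 2, 3, 4, 5, 6, 7),
  col = c("red", "green", "blue", "yellow", "blue", "black", "white"),
  car_mod = c("car2", "car4", "car1", "car5", "car7", "car3", "car1")
)

# Hypothetical: a list holding more than one (colours, models) pair
l_multi <- list(
  list(c("green", "blue", "red"), c("car1", "car2", "car5")),
  list("black", "car3")
)

# Hypothetical helper: keep a row if it satisfies at least one pair
filter_any_pair <- function(df, criteria) {
  # One logical mask per pair, built with the same vectorised %in% test
  masks <- lapply(criteria, function(pair) {
    df$col %in% pair[[1]] & df$car_mod %in% pair[[2]]
  })
  # OR the masks together element-wise
  keep <- Reduce(`|`, masks)
  df[keep, , drop = FALSE]
}

s_multi <- filter_any_pair(s1, l_multi)
s_multi
```

Combining the masks with | before subsetting means a row that matches several pairs still appears only once.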