R: Deleting rows from a data frame based on values of another vector

I have a data frame containing baskets of products purchased by individuals. Each row represents the basket of one individual. I want to remove all the rows (baskets) that contain a product (expressed as an integer) that is listed in a vector named products.to.delete. Here is a small image of what the data set looks like.

In addition, I have a vector containing a large number of product numbers that must be deleted. I would like to delete every row that contains a value from this vector.

Here is some code to make it reproducible:

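    # 20 baskets (rows) with 50 product numbers each (columns)
    dataframe <- as.data.frame(matrix(data = sample(10000, 1000, replace = TRUE), nrow = 20, ncol = 50))
    # 200 distinct product numbers that should trigger deletion of a basket
    products.to.delete <- sample(10000, 200, replace = FALSE)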

Thank you in advance for helping me out!

CodePudding user response：

If your data is data, and your vector of target values is vals, you could do this:

    data[apply(data, 1, \(r) !any(r %in% vals)), ]

That is, within each row of data (i.e. apply(data, 1, ...)), you check whether any of the values are in vals. Negating the result with ! gives a logical vector with one entry per row, which selects the rows that remain. Note that the \(r) shorthand for function(r) requires R 4.1 or later.
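As a minimal sketch, here is the same idea applied to the objects from the question (dataframe and products.to.delete). The set.seed call and the names keep and kept are additions for illustration only, and function(r) is used instead of \(r) so the code also runs on older R versions:

```r
set.seed(1)  # assumption: fixed seed so the run is repeatable
dataframe <- as.data.frame(matrix(sample(10000, 1000, replace = TRUE), 20, 50))
products.to.delete <- sample(10000, 200, replace = FALSE)

# TRUE for each row (basket) that contains none of the products to delete
keep <- apply(dataframe, 1, function(r) !any(r %in% products.to.delete))

# drop = FALSE keeps the result a data frame even when only one column exists
kept <- dataframe[keep, , drop = FALSE]
```

Storing the logical vector in keep first makes it easy to check how many baskets survive (sum(keep)) before subsetting.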

CodePudding user response：

For future questions, please include a small reproducible example such as the one below.

What you're after is called filtering, and it can be done in base R as follows.

First, create an object, called myfilter for example, which is a logical vector with the same length as the number of rows in your data.frame.

    mydat <- data.frame("col1" = 1:5, "col2" = letters[1:5])
    mydat
    #   col1 col2
    # 1    1    a
    # 2    2    b
    # 3    3    c
    # 4    4    d
    # 5    5    e

    myfilter <- mydat$col2 %in% c("a", "c")
    myfilter
    # [1]  TRUE FALSE  TRUE FALSE FALSE

    mydat[myfilter, ]
    #   col1 col2
    # 1    1    a
    # 3    3    c

Then simply place this object inside the brackets [], before the comma. R keeps the rows for which the value is TRUE.
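The question asks to remove the matching rows rather than keep them, which can be done by negating the filter with !. A minimal sketch that rebuilds mydat from the example above; the names to_drop and remaining are hypothetical:

```r
mydat <- data.frame(col1 = 1:5, col2 = letters[1:5])

# TRUE for the rows whose col2 value should be removed
to_drop <- mydat$col2 %in% c("a", "c")

# Negating the filter keeps only the rows that were not flagged
remaining <- mydat[!to_drop, ]
```

Here remaining holds the rows with col2 equal to "b", "d" and "e".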