I have a data frame of product baskets purchased by individuals. Each row represents the basket of one individual. I want to remove every row (basket) that contains a product (expressed as an integer) listed in a vector named products.to.delete. Here is a small image of what the data set looks like.
I also have a vector containing a large number of product numbers that must be deleted. I would like to delete all the rows that contain any value from this vector.
Here is some code to make it reproducible:
dataframe <- as.data.frame( matrix(data = sample(10000,1000,replace = TRUE),20,50))
products.to.delete <- sample(10000,200,replace = FALSE)
Thank you in advance for helping me out!
CodePudding user response:
If your data is data, and your vector of target values is vals, you could do this:
data[apply(data,1,\(r) !any(r %in% vals)),]
That is, within each row of data (i.e. apply(data, 1, ...)), you can check whether any of the values are in vals. Negate the result with ! to get a logical vector, one element per row, that selects the rows to keep.
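As a minimal sketch, here is the same idea applied to the data from the question. The fixed seed and the names keep and filtered are assumptions added for illustration; they are not part of the original post.

```r
set.seed(42)  # assumption: fixed seed so the example is repeatable

# data from the question: 20 baskets (rows) of 50 products (columns)
dataframe <- as.data.frame(matrix(sample(10000, 1000, replace = TRUE), 20, 50))
products.to.delete <- sample(10000, 200, replace = FALSE)

# TRUE for every row that contains none of the products to delete
keep <- apply(dataframe, 1, function(r) !any(r %in% products.to.delete))

# keep only those rows; drop = FALSE preserves the data frame structure
filtered <- dataframe[keep, , drop = FALSE]

# sanity check: no remaining value is in the delete list
stopifnot(!any(unlist(filtered) %in% products.to.delete))
nrow(filtered)
```

Note that apply() converts the data frame to a matrix first, so this behaves most predictably when all columns have the same type, as they do in the question.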
CodePudding user response:
For your next questions, please create a reproducible example such as the one below.
What you're after is called filtering, and it can be done in base R as follows.
First, create an object, called for example myfilter, which is a logical vector with the same length as the number of rows in your data.frame.
mydat <- data.frame("col1"=1:5, "col2"=letters[1:5])
mydat
#   col1 col2
# 1    1    a
# 2    2    b
# 3    3    c
# 4    4    d
# 5    5    e
myfilter <- mydat$col2 %in% c("a", "c")
myfilter
# [1]  TRUE FALSE  TRUE FALSE FALSE
mydat[myfilter,]
#   col1 col2
# 1    1    a
# 3    3    c
Then simply put this object inside the brackets []. R will keep the rows where the value is TRUE.
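The example above filters on a single column, whereas the question needs to check every column of each row. One possible way to build such a filter is sketched below. The seed and the name hits are assumptions added for illustration, and this is only one of several ways to do it.

```r
set.seed(1)  # assumption: fixed seed so the example is repeatable

# data from the question
dataframe <- as.data.frame(matrix(sample(10000, 1000, replace = TRUE), 20, 50))
products.to.delete <- sample(10000, 200, replace = FALSE)

# logical matrix (rows x columns): TRUE where a cell holds a product to delete
hits <- sapply(dataframe, function(col) col %in% products.to.delete)

# a row is kept only if it has no hits in any column
myfilter <- rowSums(hits) == 0

# same bracket filtering as above, now based on all columns
result <- dataframe[myfilter, ]
```

Here myfilter is again a logical vector with one element per row, so it is used in the brackets exactly as in the single-column case.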