Obtaining a vector with sapply and use it to remove rows from dataframes in a list with lapply

Time:01-14

I have a list with dataframes:

df1 <- data.frame(id = seq(1:10), name = LETTERS[1:10])
df2 <- data.frame(id = seq(11:20), name = LETTERS[11:20])
mylist <- list(df1, df2)

I want to remove rows from each dataframe in the list based on a condition (in this case, the value stored in column id). I create an empty vector where I will store the ids:

ids_to_remove <- c() 

Then I apply my function:

sapply(mylist, function(df) {
  
  rows_above_th <- df[(df$id > 8),] # select the rows from each df above a threshold
  a <- rows_above_th$id # obtain the ids of the rows above the threshold 
  ids_to_remove <- append(ids_to_remove, a) # append each id to the vector
  
},

simplify = T

) 

However, with or without simplify = T, this returns a matrix, while my desired output (ids_to_remove) would be a vector containing the ids, like this:

ids_to_remove <- c(9,10,9,10)

Ultimately, I would use it like this on a single dataframe:

for(i in 1:length(ids_to_remove)){

                  mylist[[1]] <- mylist[[1]] %>%
                    filter(!id == ids_to_remove[i])

                }

And like this on the whole list (which is not working, and I don't get why):

i = 1
lapply(mylist, 
       function(df) {
         
                for(i in 1:length(ids_to_remove)){
                  df <- df %>%
                    filter(!id == ids_to_remove[i])
                           
                  i = i + 1
         
                }
} )
      

I suspect the errors are in the append part of the sapply and maybe in the indexing of the lapply. I played around a bit but still couldn't find the errors (or a better way to do this).

CodePudding user response:

If you are using sapply/lapply, avoid trying to change the values of global variables from inside the function: the assignment only creates a local copy that is discarded when the function returns. Instead, return the values you want. For example, generate the vector of IDs to remove for each item in the list, collected as a list:

ids_to_remove <- lapply(mylist, function(df) {
  rows_above_th <- df[(df$id > 8),] # select the rows from each df above a threshold
  rows_above_th$id # obtain the ids of the rows above the threshold
}) 
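To see why the original sapply gave a matrix and left the global vector empty, here is a small sketch in base R. The names `collected`, `res` and `ids_flat` are only illustrative and do not appear in the question; `collected` stands in for the question's empty vector.

```r
# Data from the question. Note that seq(11:20) returns 1..10, not 11..20,
# which is why both data frames contain the ids 9 and 10.
df1 <- data.frame(id = seq(1:10), name = LETTERS[1:10])
df2 <- data.frame(id = seq(11:20), name = LETTERS[11:20])
mylist <- list(df1, df2)

# The question's approach: append to a vector defined outside the function.
collected <- c()
res <- sapply(mylist, function(df) {
  # This creates a LOCAL variable called `collected`; the outer one is untouched.
  collected <- append(collected, df$id[df$id > 8])
})

is.matrix(res)      # TRUE: two results of equal length are simplified to a matrix
is.null(collected)  # TRUE: the outer vector is still empty

# Returning the values instead, as in the answer above.
ids_to_remove <- lapply(mylist, function(df) df$id[df$id > 8])

# If a single flat vector is really wanted, unlist() collapses the list.
ids_flat <- unlist(ids_to_remove)
```

The per-data-frame list is what the next step needs; the flat vector is shown only because the question asked for one.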

And then you can use that list together with your data list and mapply to iterate over the two lists in parallel:

mapply(function(data, ids) {
  data %>% dplyr::filter(!id %in% ids)
}, mylist, ids_to_remove, SIMPLIFY=FALSE)
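As for the loop in the question: a for loop evaluates to NULL, and since it is the last expression in the function, lapply receives NULL for every element. A minimal sketch of the difference, assuming a flat vector of ids and using base R indexing in place of dplyr::filter (the names `broken` and `fixed` are illustrative only):

```r
df1 <- data.frame(id = seq(1:10), name = LETTERS[1:10])
df2 <- data.frame(id = seq(11:20), name = LETTERS[11:20])
mylist <- list(df1, df2)
ids_to_remove <- c(9, 10)  # assumed flat vector, as in the question

# Same shape as the question's code: the for loop is the last expression,
# so the function returns NULL.
broken <- lapply(mylist, function(df) {
  for (i in seq_along(ids_to_remove)) {
    df <- df[df$id != ids_to_remove[i], ]
  }
})

# Returning the modified data frame explicitly fixes it.
fixed <- lapply(mylist, function(df) {
  for (i in seq_along(ids_to_remove)) {
    df <- df[df$id != ids_to_remove[i], ]
  }
  df  # the value handed back to lapply
})
```

The manual `i = i + 1` is not needed either, because for already advances `i` on each iteration.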

CodePudding user response:

Using base R (the `\(x, y)` lambda shorthand requires R 4.1 or later):

 Map(\(x, y) subset(x, !id %in% y), mylist, ids_to_remove)
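For completeness, a self-contained sketch of the Map approach, written with `function(x, y)` so it also runs on older R versions. Since the ids here are derived from the same data frame they are removed from, the two steps could arguably be collapsed into one; that shortcut assumes the threshold is the only criterion. `cleaned` and `cleaned_direct` are illustrative names.

```r
df1 <- data.frame(id = seq(1:10), name = LETTERS[1:10])
df2 <- data.frame(id = seq(11:20), name = LETTERS[11:20])
mylist <- list(df1, df2)

# Step 1: ids above the threshold, one vector per data frame.
ids_to_remove <- lapply(mylist, function(df) df$id[df$id > 8])

# Step 2: walk both lists in parallel and drop the matching rows.
cleaned <- Map(function(x, y) subset(x, !id %in% y), mylist, ids_to_remove)

# One-step alternative (assumption: the threshold is all that matters).
cleaned_direct <- lapply(mylist, function(df) df[df$id <= 8, ])

# Both routes keep the same ids.
identical(lapply(cleaned, `[[`, "id"), lapply(cleaned_direct, `[[`, "id"))
```

Keeping the two steps separate is still the better fit when the ids to remove come from somewhere other than the data frame itself.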