Remove the last x% of rows in each df, in a list of dfs, dependent upon a numeric vector

Time:10-01

I have a list of dataframes where each dataframe contains the same number of rows. I have a numeric vector (removal) that specifies how much I want to remove from each dataframe (e.g., remove the last 25% from the first df, df1).

I can keep the first x% of rows from each df by creating a function and using lapply, but I don't know how to incorporate that code into a loop that would loop through removal. Any help is appreciated.

library(dplyr)
df1 <- data.frame(var = sample(1:5, 100, replace = TRUE))
df2 <- data.frame(var = sample(1:5, 100, replace = TRUE))
df3 <- data.frame(var = sample(1:5, 100, replace = TRUE))
lst <- list(df1, df2, df3)

head(df1)
#>   var
#> 1   3
#> 2   2
#> 3   2
#> 4   4
#> 5   5
#> 6   2

removal <- c(.25, .30, .50)
# I want to remove the last 25% of rows from df1,
# the last 30% of rows from df2,
# and the last 50% of rows from df3

# So far I can only keep a fixed percentage of rows
# from the top of each df,
# but I don't know how to combine this code with a loop
# that iterates over `removal`
fake_removal <- .75

fake_remv <- function(x){
  x <- x %>% filter(row_number() < nrow(x) * fake_removal) 
  return(x)
  }

badlst <- lapply(lst, fake_remv)
print(nrow(badlst[[1]]))
#> [1] 74
head(badlst[[1]])
#>   var
#> 1   3
#> 2   2
#> 3   2
#> 4   4
#> 5   5
#> 6   2

CodePudding user response:

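Use Map/mapply in base R, or map2 from purrr (tidyverse), to iterate over lst and removal in parallel. Since removal holds the fraction to drop from the end, the fraction to keep from the top is 1 - removal.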

library(dplyr)
library(purrr)
# .y is the fraction to drop, so keep the first (1 - .y) share of rows
map2(lst, removal, ~ .x %>% 
        filter(row_number() <= nrow(.x) * (1 - .y)))

The equivalent Map option would be

# rml is the fraction to drop, so keep the first (1 - rml) share of rows
Map(function(dat, rml) subset(dat, seq_len(nrow(dat)) <= nrow(dat) * (1 - rml)),
         lst, removal)
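
As a quick sanity check, here is a minimal base R sketch of the same idea using head(). The helper name drop_tail is hypothetical (it is not part of the question or the answer above), and rounding the number of rows to keep with round() is an assumption about how fractional row counts should be handled; floor() or ceiling() could be swapped in.

```r
# Hypothetical helper (not from the original post): drop the last `frac` of rows.
drop_tail <- function(dat, frac) {
  # Number of rows to keep; round() is an assumption for fractional counts.
  n_keep <- round(nrow(dat) * (1 - frac))
  head(dat, n_keep)
}

# Rebuild example data shaped like the question's: three 100-row data frames.
set.seed(1)
lst <- lapply(1:3, function(i) data.frame(var = sample(1:5, 100, replace = TRUE)))
removal <- c(.25, .30, .50)

# Pair each data frame with its own removal fraction.
trimmed <- Map(drop_tail, lst, removal)

sapply(trimmed, nrow)
#> [1] 75 70 50
```

With 100 rows per data frame, the remaining row counts should match 100 * (1 - removal), which makes it easy to confirm that the last rows, not the first, were dropped.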