I have a dataset with more than 200k rows that is updated constantly, and I only need the last 3000 rows. Reading all 200k rows and then filtering the data frame down to 3000 is time-consuming. Instead, I want to read the last 3000 rows directly. Is there a way to achieve this?
Thanks in advance.
CodePudding user response:
This would be my approach. First, read in only the first column of each file and store its row count (nrow) in a list called no_rows. Then, using map2, read in each file while using its entry in no_rows to compute how many rows to skip.
library(readxl)
library(purrr)

# list.files() takes a regex, not a glob, so escape the dot and anchor the end
files <- list.files(pattern = "\\.xlsx$")

# count the data rows in each file by reading only the first column
no_rows <- map(files, ~nrow(readxl::read_excel(.x, range = cellranger::cell_cols(1))))

# read in the last three thousand rows of each file;
# skip counts the header row too, so skip = .y - 2999 leaves exactly 3000 rows,
# and col_names = FALSE stops the first remaining data row from being used as the header
map2(files, no_rows, ~readxl::read_excel(.x, skip = .y - 2999, col_names = FALSE))
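One caveat: if any file holds fewer than 3000 data rows, the computed skip goes negative and read_excel will error. A small guard avoids that; this is a sketch assuming the same files and no_rows objects as above (the helper name read_last is just illustrative):

read_last <- function(path, n_data_rows, n = 3000) {
  # skip everything except the last n data rows, but never skip a
  # negative count; the + 1 accounts for the header row, which
  # skip also counts
  skip_rows <- max(n_data_rows - n + 1, 0)
  readxl::read_excel(path, skip = skip_rows, col_names = FALSE)
}

map2(files, no_rows, read_last)

For files shorter than n, this falls back to reading the whole sheet (including the header row as data, since col_names = FALSE), which you may want to handle separately.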