Home > Back-end >  Iteratively skip the last rows in CSV files when using read_csv
Iteratively skip the last rows in CSV files when using read_csv

Time:12-16

I have a number of CSV files exported from our database, say site1_observations.csv, site2_observations.csv, site3_observations.csv etc. Each CSV looks like below (site1 for example):

Column A Column B Column C
# Team: all teams
# Observation type: xyz
Site ID Reason Quantity
a xyz 1
b abc 2
Total quantity 3

We need to skip the top 2 rows and the last 1 row from each CSV before combining them as a whole master dataset for further analysis. I know I can use the skip = argument to skip the first few lines of CSV, but read_csv() doesn't seem to have simple argument to skip the last lines and I have been using n_max = as a workaround. The data import has been done in manual way. I want to shift the manual process to programmatic manner using purrr::map(), but just couldn't work out how to efficiently skip the last few lines here.

library(tidyverse)

observations_skip_head <- 2

# Approach 1: manual ----
site1_rawdata <- read_csv("/data/site1_observations.csv", 
                          skip = observations_skip_head, 
                          n_max = nrow(read_csv("/data/site1_observations.csv", 
                                                skip = observations_skip_head))-1)

# site2_rawdata
# site3_rawdata
# [etc]

# all_sites_rawdata <- bind_rows(site1_rawdata, site2_rawdata, site3_rawdata, [etc])

I have tried to use purrr::map() and I believe I am almost there, except the n_max = part which I am not sure how/what to do this in map() (or any other effective way to get rid of the last line in each CSV). How to do this with purrr?

observations_csv_paths_chr <- paste0("data/site", 1:3,"_observations.csv")

# Approach 2: programmatically import csv files with purrr ----

all_sites_rawdata <- observations_csv_paths_chr %>% 
  
  map(~ read_csv(., skip = observations_skip_head, 
                 n_max = nrow(read_csv("/data/site1_observations.csv", 
                                       skip = observations_skip_head))-1)) %>% 
  set_names(observations_csv_paths_chr)

I know this post uses a custom function and fread. But for my education I want to understand how to achieve this goal using the purrr approach (if it's doable).

CodePudding user response:

You could try something like this?

library(tidyverse)

csv_files <- paste0("data/site", 1:3, "_observations.csv")

csv_files |>
  map(
    ~ .x |> 
      read_lines() |> 
      tail(-3) |>   # skip first 3
      head(-2) |>   # ..and last 2
      read_csv()
  )

CodePudding user response:

manual_csv<-function(x) { 
  txt<-readLines(x)
  txt<-txt[-c(2,3,length(txt))] # insert the row you want to delete
  result<-read.csv(text=paste0(txt, collapse="\n"))
}
test<-manual_csv('D:/jaechang/pool/final.csv')
  • Related