I have a number of CSV files exported from our database, say `site1_observations.csv`, `site2_observations.csv`, `site3_observations.csv`, etc. Each CSV looks like the one below (site1, for example):
| Column A | Column B | Column C |
|---|---|---|
| # Team: all teams | | |
| # Observation type: xyz | | |
| Site ID | Reason | Quantity |
| a | xyz | 1 |
| b | abc | 2 |
| Total quantity | 3 | |
We need to skip the top 2 rows and the last row of each CSV before combining them into one master dataset for further analysis. I know I can use the `skip =` argument to skip the first few lines of a CSV, but `read_csv()` doesn't seem to have a simple argument for skipping the last lines, so I have been using `n_max =` as a workaround. So far the import has been done manually. I want to replace the manual process with a programmatic one using `purrr::map()`, but I just can't work out how to skip the last few lines efficiently.
```r
library(tidyverse)

observations_skip_head <- 2

# Approach 1: manual ----
site1_rawdata <- read_csv("/data/site1_observations.csv",
                          skip = observations_skip_head,
                          n_max = nrow(read_csv("/data/site1_observations.csv",
                                                skip = observations_skip_head)) - 1)
# site2_rawdata
# site3_rawdata
# [etc]
# all_sites_rawdata <- bind_rows(site1_rawdata, site2_rawdata, site3_rawdata, [etc])
```
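The manual approach parses each file twice as CSV just to learn its row count. A minimal sketch that counts the raw lines once with `readr::read_lines()` instead, assuming every file has exactly the layout shown above (two `#` lines, one header line, one trailing total line). The helper `read_trimmed_csv()` and the sample file are hypothetical:

```r
library(readr)

# Hypothetical helper: drop `skip_head` lines at the top and `skip_tail`
# lines at the bottom, assuming exactly one header line in between.
read_trimmed_csv <- function(path, skip_head = 2, skip_tail = 1) {
  n_lines <- length(read_lines(path))             # raw line count, read once
  read_csv(
    path,
    skip = skip_head,                             # skip the "#" lines
    n_max = n_lines - skip_head - 1 - skip_tail,  # data rows only (minus header)
    show_col_types = FALSE
  )
}

# Hypothetical sample file with the layout from the question
sample_path <- file.path(tempdir(), "site1_observations.csv")
writeLines(c(
  "# Team: all teams",
  "# Observation type: xyz",
  "Site ID,Reason,Quantity",
  "a,xyz,1",
  "b,abc,2",
  "Total quantity,3"
), sample_path)

site1_rawdata <- read_trimmed_csv(sample_path)
```

Because `n_max` stops before the total line, that short row is never parsed.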
I have tried to use `purrr::map()` and I believe I am almost there, except for the `n_max =` part, which I am not sure how to handle inside `map()` (or is there another effective way to get rid of the last line in each CSV?). How can I do this with purrr?
```r
observations_csv_paths_chr <- paste0("data/site", 1:3, "_observations.csv")

# Approach 2: programmatically import csv files with purrr ----
all_sites_rawdata <- observations_csv_paths_chr %>%
  map(~ read_csv(., skip = observations_skip_head,
                 n_max = nrow(read_csv("/data/site1_observations.csv",
                                       skip = observations_skip_head)) - 1)) %>%
  set_names(observations_csv_paths_chr)
```
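In the attempt above, the `n_max` expression always reads the hard-coded `site1` file, so every file gets site1's row count. A minimal sketch of the same idea with the current path (`.x`) used in both places and the results stacked with `dplyr::bind_rows()`. The sample files written to `tempdir()` are hypothetical stand-ins for the real exports:

```r
library(readr)
library(purrr)
library(dplyr)

observations_skip_head <- 2

# Hypothetical stand-ins for data/site1..3_observations.csv
demo_dir <- file.path(tempdir(), "observations_demo")
dir.create(demo_dir, showWarnings = FALSE)
observations_csv_paths_chr <- file.path(demo_dir, paste0("site", 1:3, "_observations.csv"))
walk(observations_csv_paths_chr, ~ writeLines(c(
  "# Team: all teams",
  "# Observation type: xyz",
  "Site ID,Reason,Quantity",
  "a,xyz,1",
  "b,abc,2",
  "Total quantity,3"
), .x))

all_sites_rawdata <- observations_csv_paths_chr %>%
  set_names(basename(.)) %>%
  map(~ read_csv(.x,
    skip = observations_skip_head,
    # count the rows of *this* file (.x), then drop the trailing total row;
    # note that this still parses every file twice
    n_max = nrow(read_csv(.x, skip = observations_skip_head,
                          show_col_types = FALSE)) - 1,
    show_col_types = FALSE
  )) %>%
  bind_rows(.id = "file")  # list names become a "file" column
```

The inner `read_csv()` may warn about parsing issues because the total row is shorter than the header; the outer call never reaches that row.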
I know this post uses a custom function and `fread()`. But for my own education, I want to understand how to achieve this with the purrr approach (if it's doable).
Answer:
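You could try something like this: read each file as raw lines, drop the unwanted lines at the top and bottom, then parse what is left as CSV.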
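```r
library(tidyverse)

csv_files <- paste0("data/site", 1:3, "_observations.csv")

csv_files |>
  map(
    ~ .x |>
      read_lines() |>
      tail(-2) |>                  # drop the first 2 lines (the "#" lines)
      head(-1) |>                  # ...and the last line (the total row)
      paste(collapse = "\n") |>    # one string, so read_csv() treats it as literal data
      read_csv(show_col_types = FALSE)
  ) |>
  bind_rows()                      # stack into one master dataset
```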
Answer:
```r
# Read a file as raw lines, drop the unwanted lines, then parse the rest as CSV
manual_csv <- function(x) {
  txt <- readLines(x)
  txt <- txt[-c(1, 2, length(txt))]  # positions of the lines to delete
  read.csv(text = paste0(txt, collapse = "\n"))
}

test <- manual_csv("D:/jaechang/pool/final.csv")
```
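A different route that avoids counting lines at all: `read_csv()`'s `comment` argument drops the leading `#` lines, and the trailing total row can be filtered out by its label after reading. A sketch assuming the top lines always start with `#`, no data value contains `#`, and the last row's first field is always `Total quantity`. The helper `read_site()` and the sample file are hypothetical:

```r
library(readr)
library(dplyr)

# Hypothetical helper: `comment = "#"` skips the "# ..." lines, then the
# summary row is removed by its label instead of by position.
# Caveat: any "#" inside a data value would also start a comment.
read_site <- function(path) {
  read_csv(path, comment = "#", show_col_types = FALSE) |>
    filter(`Site ID` != "Total quantity")
}

# Hypothetical sample file with the layout from the question
sample_path <- file.path(tempdir(), "site1_observations.csv")
writeLines(c(
  "# Team: all teams",
  "# Observation type: xyz",
  "Site ID,Reason,Quantity",
  "a,xyz,1",
  "b,abc,2",
  "Total quantity,3"
), sample_path)

site1 <- read_site(sample_path)
```

The short total row may trigger a parsing-issues warning before it is filtered out. Because `read_site()` takes a single path, it can be passed directly to `purrr::map()` over the vector of file paths, with the results stacked by `dplyr::bind_rows()`.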