Merging CSV files of the same names from different folders into one file

Time:06-01

I have 14 years of precipitation data (2008-2021) for more than 400 meteo stations, organized in the following folder structure:

/2008/Meteo_2008-01/249180090.csv

/2008/Meteo_2008-02/249180090.csv

/2008/Meteo_2008-03/249180090.csv ... and so on for the rest of the months.

/2009/Meteo_2009-01/249180090.csv

/2009/Meteo_2009-02/249180090.csv

/2009/Meteo_2009-03/249180090.csv ... and so on for the rest of the months.

The structure continues like that until 2021. The file name (249180090.csv) is the station code; as I wrote above, I have more than 400 stations.

Each CSV file contains daily precipitation data for the given station.

I would like to create one CSV file for EVERY STATION for every year from 2008 to 2021, containing the merged precipitation data from January to December. The name of each CSV file should contain the station number.

Could someone help me do that in a loop? My goal is not to create a single file from all the data, but a separate CSV file for every meteo station. On the forum, I found a question that solved a relatively similar problem, but it merged all the data into one file without splitting it into separate files.

CodePudding user response:

The problem can be split into parts:

  1. Identify all files in all subfolders of the working directory by using list.files(..., recursive = TRUE).
  2. Keep only the csv files.
  3. Import them all into R, for example by mapping read.csv over all paths.
  4. Join everything into a single dataframe, for example with reduce and bind_rows (assuming that all csvs have the same structure).
  5. Split this single dataframe by station code, for example with group_split().
  6. Write the split dataframes to csv, for example by mapping write.csv over them.

This way you can avoid using for loops.

library(here)
library(stringr)
library(purrr)
library(dplyr)

# Identify all files
filenames <- list.files(here(), recursive = TRUE, full.names = TRUE)

# Limit to csv files (the pattern is a regex, so the dot is escaped and anchored)
joined <- filenames[str_detect(filenames, "\\.csv$")] |> 
  # Read them all in
  map(read.csv) |> 
  # Join them into a single dataframe
  reduce(bind_rows)

# Split into one dataframe per station
# (assumes every csv has a station_code column)
split_df <- joined |> group_split(station_code)

# Save each dataframe to a separate csv
# (walk() is map() for side effects, so nothing is printed)
walk(split_df, function(df) {
  write.csv(df,
            paste0(df$station_code[1], "_combined.csv"),
            row.names = FALSE)
})
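
If the files do not contain a station_code column, the station code could be taken from the file name instead, since the question says each file is named after its station. Below is a minimal base-R sketch under that assumption. The names merge_by_station, root, out_dir and per_year are hypothetical, the folder layout is assumed to be <root>/<year>/Meteo_<year>-<month>/<station>.csv, and all files are assumed to share the same columns.

```r
# Sketch: merge monthly csv files per station, using the file name as station code.
# Assumed layout: <root>/<year>/Meteo_<year>-<month>/<station>.csv
merge_by_station <- function(root, out_dir, per_year = FALSE) {
  # All csv files below root; sorting keeps zero-padded months in order
  paths <- sort(list.files(root, pattern = "\\.csv$",
                           recursive = TRUE, full.names = TRUE))

  # Station code = file name without extension; year = folder two levels up
  station <- tools::file_path_sans_ext(basename(paths))
  year <- basename(dirname(dirname(paths)))

  # Group either by station only, or by station and year
  key <- if (per_year) paste(station, year, sep = "_") else station

  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

  for (k in unique(key)) {
    # Read every file of this group and stack the rows
    parts <- lapply(paths[key == k], read.csv)
    merged <- do.call(rbind, parts)
    write.csv(merged,
              file.path(out_dir, paste0(k, "_combined.csv")),
              row.names = FALSE)
  }

  # Return the group keys invisibly so the caller can inspect them
  invisible(unique(key))
}
```

Calling merge_by_station(root, out_dir) would write one file per station, and per_year = TRUE would write one file per station and year. out_dir should lie outside root, so that a second run does not pick up the combined files as input.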