How to calculate the average of the same column (with same name) in 100s different csv files with pa-CodePudding

I have a bunch of csv files that are structured like this:

df <- data.frame (first_column  = c(3, 2, 6, 7),
                  second_column = c(7, 5, 1, 8))

All the csv files have a name like

"type1_1.csv"
"type1_2.csv"
...
"type2_1.csv"
"type2_2.csv"
...

Each of these csv's have first_column and second_column. What I want is to create a new dataframe that looks like this:

# name        meanofsecond_column
# type1_1     5.25
# ...

What I started doing already, was to individually write out each:

type1_1 <- read_csv("type1_1.csv")
type1_1mean <- mean(type1_1$second_column)
...
df <- data.frame (name  = c(type1_1, type1_2...),
                  meanofsecondcolumn = c(type1_1mean, type1_2mean...))

However, since there are more than 100 csv files, this method is not very efficient or clean. How can I make it more condensed?

CodePudding user response：

# path where your csv files are (here current working directory)
CSV_FOLDER <- "."

# list all csv files in given directory
# second parameter is a regex meaning ends with .csv
# third parameter make function return file names with path
csv_files <- list.files(CSV_FOLDER, "\\.csv$", full.names=TRUE)

# apply given function on each file and collect results in a list
res <- lapply(csv_files, function(csv_file) {
  # read current file
  tmp <- read.csv(csv_file)

  # build a data.frame from filename (without path) and mean of second column
  return(data.frame(
    name = basename(csv_file),
    meanofsecondcolumn = mean(tmp$second_column)
  ))
})

# rbind all single line data.frames in a single data.frame
res <- do.call(rbind(res))

CodePudding user response：

Using map

library(purrr)
library(dplyr)
library(stringr)
files <- list.files(path = "/path/to/folder", pattern = "\\.csv$", 
      full.names = TRUE)
nm <- str_remove(basename(files), "\\.csv$")
map_dfr(setNames(files, nm), ~ read.csv(.x) %>%
          summarise(secondcolumn = mean(second_column,
             na.rm = TRUE)), .id = "name")