I have a bunch of csv files that are structured like this:
df <- data.frame (first_column = c(3, 2, 6, 7),
second_column = c(7, 5, 1, 8))
All the csv files have a name like
"type1_1.csv"
"type1_2.csv"
...
"type2_1.csv"
"type2_2.csv"
...
Each of these csv's have first_column
and second_column
. What I want is to create a new dataframe that looks like this:
# name meanofsecond_column
# type1_1 5.25
# ...
What I started doing already, was to individually write out each:
type1_1 <- read_csv("type1_1.csv")
type1_1mean <- mean(type1_1$second_column)
...
df <- data.frame (name = c(type1_1, type1_2...),
meanofsecondcolumn = c(type1_1mean, type1_2mean...))
However, since there are more than 100 csv files, this method is not very efficient or clean. How can I make it more condensed?
CodePudding user response:
# path where your csv files are (here current working directory)
CSV_FOLDER <- "."
# list all csv files in given directory
# second parameter is a regex meaning ends with .csv
# third parameter make function return file names with path
csv_files <- list.files(CSV_FOLDER, "\\.csv$", full.names=TRUE)
# apply given function on each file and collect results in a list
res <- lapply(csv_files, function(csv_file) {
# read current file
tmp <- read.csv(csv_file)
# build a data.frame from filename (without path) and mean of second column
return(data.frame(
name = basename(csv_file),
meanofsecondcolumn = mean(tmp$second_column)
))
})
# rbind all single line data.frames in a single data.frame
res <- do.call(rbind(res))
CodePudding user response:
Using map
library(purrr)
library(dplyr)
library(stringr)
files <- list.files(path = "/path/to/folder", pattern = "\\.csv$",
full.names = TRUE)
nm <- str_remove(basename(files), "\\.csv$")
map_dfr(setNames(files, nm), ~ read.csv(.x) %>%
summarise(secondcolumn = mean(second_column,
na.rm = TRUE)), .id = "name")