Get the number of rows and columns of a multiple CSV file-CodePudding

Is there any way to get information about the number of rows and columns of a multiple CSV file in R and save it in a CSV file? Here is my R code:

#Library
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("fs")) install.packages("fs")

#Mentioning Files Location
file_paths <- fs::dir_ls("C:\\Users\\Desktop\\FileCount\\Test")
file_paths[[2]]


#Reading Multiple CSV Files
file_paths %>%
  map(function(path)
  {
     read_csv(path,col_names = FALSE)
  })

#Counting Number of Rows
lapply(X = file_paths, FUN = function(x) {
  length(count.fields(x))
})

#Counting Number of Columns
lapply(X = file_paths, FUN = function(x) {
  length(ncol(x))
})

#Saving CSV File
write.csv(file_paths,"C:\\Users\\Desktop\\FileCount\\Test\\FileName.csv", row.names = FALSE)

Couple of things are not working:

Number of Columns of a multiple CSV file
When I am saving the file, I want to save Filename, number of rows and number of columns. See attached image.

How the output looks like:

How the output looks like

Attached some CSV Files for testing: Here

Any help appreciated.

CodePudding user response：

Welcome on SO! Using the tidyverse and data.table, here's a way to do it:

Note: All the .csv files are in my TestStack directory, but you can change it with your own directory (C:/Users/Desktop/FileCount/Test).

Code:

library(tidyverse)
csv.file <- list.files("TestStack") # Directory with your .csv files
data.frame.output <- data.frame(number_of_cols = NA,
                                number_of_rows = NA,
                                name_of_csv = NA) #The df to be written


MyF <- function(x){
  
  csv.read.file <- data.table::fread(
    paste("TestStack", x, sep = "/")
    )
  
  number.of.cols <- ncol(csv.read.file)
  
  number.of.rows <- nrow(csv.read.file)
  
  data.frame.output <<- add_row(data.frame.output,
                                number_of_cols = number.of.cols,
                                number_of_rows = number.of.rows,
                                name_of_csv = str_remove_all(x,".csv")) %>% 
    filter(!is.na(name_of_csv))
  
}

map(csv.file, MyF)

Output:

  number_of_cols number_of_rows name_of_csv
1              3           2150      CH_com
2              2          34968 epci_com_20
3              3            732        g1g4
4              7         161905          RP

I have this output because my TestStack had 4 files named CH_com.csv, epci_com_20.csv,...

You can then write the object data.frame.output to a .csv as you wanted: data.table::fwrite(data.frame.output, file = "Output.csv")

CodePudding user response：

files_map <- "test"
files <- list.files(files_map)

library(data.table)

output <- data.table::rbindlist(
  lapply(files, function(file) {
    dt <- data.table::fread(paste(files_map, file, sep = "/"))
    list("number_of_cols" = ncol(dt), "number_of_rows" = nrow(dt), "name_of_csv" = file)
  })
)

data.table::fwrite(output, file = "Filename.csv")