full join of multiple csv files in R-CodePudding

i know similar questions have been asked multiple times, but somehow I am stuck very near the solution. Doing some research did not bring me closer. It is possible that, as a beginner in R, i'm asking the wrong questions/searching the wrong keywords. In that case, you are welcome to help.

I have a folder containing multiple .csv files with ";" as delimiter. These files have two columns forming the id. All tables have a different size, e.g. the number of rows varies as does the number of columns.

As an example the data looks like this:

name	extension	data1	data2	data n
paul	ch	7	9	n1
sue	nc	0	8	n2
sue	ch	6	8	n3
etc.

The data has the following logic: data1 includes all data from 1 sample (sample name: data1) and includes the number of occurencies of names, either nc or ch.

My aim is now to bring together all these tables into a combined data frame.

First I was trying iterating with a for function, but this gave poor results. Later I came accross a tidyverse approach as follows:

files <- dir(path = "~/Documents/data", pattern = "*.csv")

data <- files %>%
  map(read_csv2(paste("~/Documents/data", files, sep = "/"))) %>%
  reduce(full_join)
data

This starts working, but fails with the error:

0sError: Files must all have 31 columns:                                             0s
* File 2 has 34 columns

Here is where I'm stuck. I think the problem occurs when the dataframes are compiled into a list. there it seems to me, that it is a necessity for the dataframes to be of the same extensions. In the examples I found on stack and elsewhere, the example data has always the same length.

Can anybody help me find the mistake?

Best wishes

CodePudding user response：

map syntax is map(list, function). read_csv2(paste("~/Documents/data", files, sep = "/") isn't a function, it's trying to run read_csv2 on all the files, but read_csv2 isn't vecrorized. Change to purrr style lambda syntax as below. I'd also strongly recommend specifying the by columns in your join, especially a full join, to make sure you're getting what you think you are and the result doesn't blow up in size.

files %>% 
  map(~read_csv2(paste("~Documents/data", ., sep = "/")) %>%
  reduce(full_join, by = c("name", "extension"))

If you need to debug more, give yourself a small example and do it one line at a time. Say, start with files2 <- files[1:2] and just work on reading in the first two files. When that works, move on to the join.