i know similar questions have been asked multiple times, but somehow I am stuck very near the solution. Doing some research did not bring me closer. It is possible that, as a beginner in R, i'm asking the wrong questions/searching the wrong keywords. In that case, you are welcome to help.
I have a folder containing multiple .csv files with ";" as delimiter. These files have two columns forming the id. All tables have a different size, e.g. the number of rows varies as does the number of columns.
As an example the data looks like this:
name | extension | data1 | data2 | data n |
---|---|---|---|---|
paul | ch | 7 | 9 | n1 |
sue | nc | 0 | 8 | n2 |
sue | ch | 6 | 8 | n3 |
etc. |
The data has the following logic: data1 includes all data from 1 sample (sample name: data1) and includes the number of occurencies of names, either nc or ch.
My aim is now to bring together all these tables into a combined data frame.
First I was trying iterating with a for function, but this gave poor results. Later I came accross a tidyverse approach as follows:
files <- dir(path = "~/Documents/data", pattern = "*.csv")
data <- files %>%
map(read_csv2(paste("~/Documents/data", files, sep = "/"))) %>%
reduce(full_join)
data
This starts working, but fails with the error:
0sError: Files must all have 31 columns: 0s
* File 2 has 34 columns
Here is where I'm stuck. I think the problem occurs when the dataframes are compiled into a list. there it seems to me, that it is a necessity for the dataframes to be of the same extensions. In the examples I found on stack and elsewhere, the example data has always the same length.
Can anybody help me find the mistake?
Best wishes
CodePudding user response:
map
syntax is map(list, function)
. read_csv2(paste("~/Documents/data", files, sep = "/")
isn't a function, it's trying to run read_csv2
on all the files, but read_csv2
isn't vecrorized. Change to purrr
style lambda syntax as below. I'd also strongly recommend specifying the by
columns in your join, especially a full join, to make sure you're getting what you think you are and the result doesn't blow up in size.
files %>%
map(~read_csv2(paste("~Documents/data", ., sep = "/")) %>%
reduce(full_join, by = c("name", "extension"))
If you need to debug more, give yourself a small example and do it one line at a time. Say, start with files2 <- files[1:2]
and just work on reading in the first two files. When that works, move on to the join.