I am trying to read multiple text files with read_delim. However, these text files differ in how many columns they have. I am only interested in some of the columns, which are common to all of the text files.
However, when I try to specify the columns with col_select, it still throws an error saying that the number of columns differs. Here is a minimal example:
> df = read_delim(c('file1.txt', 'file2.txt'), col_select = 1)
Error: Files must all have 3 columns:
* File 2 has 2 columns
However, this works and only reads in the first column:
> df = read_delim('file1.txt', col_select = 1)
New names:
• `test2` -> `test2...2`
• `test2` -> `test2...3`
Rows: 1 Columns: 1
── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
dbl (1): test1
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Content of file1.txt:
test1 test2 test3
1 2 3
Content of file2.txt:
test1 test2
1 2
Does anyone know how to read text files that differ in the number of columns they have?
CodePudding user response:
read_delim seems to check that all files have the same number of columns, and it raises the error before column selection happens. So you likely need to read each file separately and then bind the results:
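library(readr)
library(purrr)  # list_rbind() needs purrr >= 1.0.0
# Name each path after itself so the names can become the file_id column
set_names(c('file1.txt', 'file2.txt')) %>%
# Read every file on its own, keeping only the first column
map(read_delim, col_select = 1, show_col_types = FALSE) %>%
# Stack the per-file tibbles and record which file each row came from
list_rbind(names_to = "file_id")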
# A tibble: 2 × 2
file_id test1
<chr> <dbl>
1 file1.txt 1
2 file2.txt 1
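If you want to pick the shared columns by name rather than by position, the same read-then-bind pattern should carry over. The following is only a sketch under a few assumptions: it recreates the two example files in a temporary directory as tab-delimited text (matching the Delimiter: "\t" reported in the question), and it treats test1 and test2 as the common columns of interest, which is a hypothetical choice you would adapt to your own data.

```r
library(readr)
library(purrr)  # list_rbind() needs purrr >= 1.0.0

# Recreate the example files from the question in a temporary directory.
# Assumption: the real files are tab-delimited.
demo_dir <- tempfile("read_delim_demo")
dir.create(demo_dir)
files <- file.path(demo_dir, c("file1.txt", "file2.txt"))
writeLines(c("test1\ttest2\ttest3", "1\t2\t3"), files[1])
writeLines(c("test1\ttest2", "1\t2"), files[2])

# Read each file on its own so the differing column counts never meet
# inside a single read_delim() call, keeping only the shared columns.
# test1 and test2 are hypothetical picks for the columns of interest.
read_common <- function(path) {
  read_delim(path, delim = "\t",
             col_select = c(test1, test2),
             show_col_types = FALSE)
}

df <- files %>%
  set_names(basename) %>%            # name each path by its file name
  map(read_common) %>%               # one tibble per file
  list_rbind(names_to = "file_id")   # stack them, recording the source file

df
```

Selecting by name instead of position also means a file whose columns come in a different order is still handled correctly, as long as the names match.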