Add file names in a column when using fread

Time:10-04

I have several .txt files that I am importing with the following code:

files = list.files(pattern="*.txt")%>% 
     map_df(~fread(.))

Each file has several rows and columns, and I want to add an ID column containing the file name. So if the files were called A-ML201.txt, A-YH248.txt, etc., I would get the following. The file names need to repeat since each file has multiple rows:

ID         col1    col2 
A-ML201     2       67
A-ML201     4       29
A-ML201     1       90
A-YH248     23      2
A-YH248     12      17
A-YH248     8       57

I have tried a few solutions from this thread: How to import multiple .csv files at once?, but I keep getting errors, maybe because these are .txt files? I tried replacing read.csv with read.table. I am new to this kind of thing, so any help is greatly appreciated!

CodePudding user response:

We could pass a named vector of file paths and then use .id; map_dfr fills the ID column with the name of each element.

library(purrr)
library(dplyr)
library(stringr)
library(data.table)
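files <- list.files(path = "path/to/your/folder", 
    pattern = "\\.txt$", full.names = TRUE)
# name each path by its file name without the extension
names(files) <- str_remove(basename(files), "\\.txt$")
# the names of files become the values of the ID column
map_dfr(files, fread, .id = 'ID')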
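To try the named-vector approach end to end without touching real data, here is a minimal sketch. The scratch folder under tempdir() and the two sample files are assumptions made up for illustration, and map_dfr needs dplyr to be installed.

```r
library(purrr)
library(data.table)

# Hypothetical scratch folder holding two small sample files
demo_dir <- file.path(tempdir(), "fread-id-demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("col1 col2", "2 67", "4 29", "1 90"),
           file.path(demo_dir, "A-ML201.txt"))
writeLines(c("col1 col2", "23 2", "12 17", "8 57"),
           file.path(demo_dir, "A-YH248.txt"))

# Full paths, so fread can find the files from any working directory
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Name each path by its file name without the extension
files <- set_names(files, tools::file_path_sans_ext(basename(files)))

# The names of files are copied into the ID column, once per row
out <- map_dfr(files, fread, .id = "ID")
print(out)
```

If the vector had no names, .id would record the position of each file (1, 2, ...) instead of its name, which is why the naming step matters.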

CodePudding user response:

Suppose we have the reproducible input files generated in the Note at the end. Get the list of files using Sys.glob (if these are the only .txt files, "*.txt" alone is enough), run fread over each using Map, and bind the results together with rbindlist. Map names each result after its file, and rbindlist copies those names into the ID column. The |> pipe requires R 4.1 or later.

library(data.table)
library(tools) # this comes with R

"A-*.txt" |>
  Sys.glob() |>
  Map(f = fread) |>
  rbindlist(idcol = "ID") |>
  transform(ID = file_path_sans_ext(ID))

giving:

        ID col1 col2
1: A-ML201    2   67
2: A-ML201    4   29
3: A-ML201    1   90
4: A-YH248   23    2
5: A-YH248   12   17
6: A-YH248    8   57
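The ID column works here because of how the names travel through the pipeline. A small sketch of that mechanism, using a hypothetical stand-in for fread (fake_read) so that no files are needed:

```r
library(data.table)

# Illustrative file names only; nothing is read from disk
paths <- c("A-ML201.txt", "A-YH248.txt")

# Hypothetical stand-in for fread(): one row recording the name length
fake_read <- function(path) data.table(n_chars = nchar(path))

# Map() names its result after a character first argument
tables <- Map(f = fake_read, paths)
print(names(tables))

# rbindlist() copies the list names into the column given by idcol
combined <- rbindlist(tables, idcol = "ID")

# Drop the extension, as transform() does in the pipeline above
combined[, ID := tools::file_path_sans_ext(ID)]
print(combined)
```

Had the list been unnamed, rbindlist would have filled the ID column with integer positions instead.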

The above code seems preferable since it uses only data.table plus tools, which already comes with R, but if you are OK with a mix of several packages, this is how you would fix up the code in the question. Note that the pattern there was incorrect: list.files expects a regular expression, whereas "*.txt" is a glob.

library(data.table)
library(purrr)
library(tools)

"^A-.*\\.txt$" %>%
  list.files(pattern = .) %>% 
  set_names(., file_path_sans_ext(.)) %>%
  map_dfr(fread, .id = "ID")
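On the pattern itself: list.files treats pattern as a regular expression, not a shell glob, so an unescaped dot matches any character and an unanchored pattern can match in the middle of a name. A short sketch of the difference, using made-up file names and glob2rx from utils to translate a glob into a regular expression:

```r
# Hypothetical file names, used only to compare patterns
candidates <- c("A-ML201.txt", "A-YH248.txt", "A-notes.txt.bak", "B-XX001.txt")

# Unanchored pattern with an unescaped dot: also picks up the .bak file
loose <- grep("A-.*.txt", candidates, value = TRUE)
print(loose)

# glob2rx() turns a glob into an anchored, escaped regular expression
rx <- utils::glob2rx("A-*.txt")
print(rx)

# Only names that start with A- and end in .txt are kept
matched <- grep(rx, candidates, value = TRUE)
print(matched)
```

The same rx can be passed straight to list.files(pattern = rx) if writing globs feels more natural than writing regular expressions.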

  

Note

Lines1 <- "col1 col2
2   67
4   29
1   90"
Lines2 <- "col1 col2
23    2
12   17
8   57"
cat(Lines1, file = "A-ML201.txt")
cat(Lines2, file = "A-YH248.txt")