How can I run my script on several data directories

Time:11-03

I have created an R script that analyses and manipulates data files with two different extensions. For example, one task is to extract certain values from the data and export them as a .txt file. Here is part of my script and the data files that I use:

setwd('C:\\Users\\Zack\\Documents\\RScripts\\data\\data1')
heat_data="data1.heat"
time ="data1.timestamp"
ts_heat = read.table(heat_data)
ts_heat = ts_heat[-1,]
rownames(ts_heat) <- NULL
ts_time = read.table(time)
back_heat = subset(ts_heat, V3 == 'H')
back_time = ts_time$V1
library(data.table)
library(readr)  # provides write_tsv(), used at the end of the script
datDT[, newcol := fcoalesce(
nafill(fifelse(track == "H", back_time, NA_real_), type = "locf"),
0)]
last_heat = subset(ts_heat, V3 == 'H')
last_time = last_heat$newcol
x = back_time - last_heat
dataestimation = data.frame(back_time , x)
write_tsv(dataestimation, file="dataestimation.txt")

What I am looking for is a way to run my code on all my data folders. For example, here I am working in the path "C:\Users\Zack\Documents\RScripts\data\blabladata1", where blabladata1 contains a .heat and a .timestamp file. I want to run my script on blabladata2 (which also contains a .heat and a .timestamp file), blabladata3, blabladata4, and so on. So each folder inside the "data" folder contains the two files (.heat and .timestamp) that I will use to export my dataestimation.txt. In the end, each blabladata** folder should contain **.heat, **.timestamp, and a dataestimation.txt that is filtered and calculated from the **.heat and **.timestamp files.

I don't know whether this can be done within R, or whether I should change my script to take arguments and run it from the command line on 'path'/data/*/.heat and 'path'/data/*/.timestamp.

CodePudding user response:

Maybe something like the following will solve the problem.
Not tested at all, given the comment.

library(data.table)
library(readr)  # provides write_tsv()

processFile <- function(path, pattern, outfile, verbose = FALSE){
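  # full.names = TRUE so read.table() receives complete paths, not bare file names
  fnames <- list.files(path = path, pattern = pattern, full.names = TRUE)
  # escape the dot and anchor at the end so only the real extensions match
  heat_data <- grep("\\.heat$", fnames, value = TRUE)
  time <- grep("\\.timestamp$", fnames, value = TRUE)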
  #
  ts_heat <- read.table(heat_data)
  ts_heat <- ts_heat[-1, ]
  rownames(ts_heat) <- NULL
  ts_time <- read.table(time)
  back_heat <- subset(ts_heat, V3 == 'H')
  back_time <- ts_time$V1
  datDT[, newcol := fcoalesce(
    nafill(fifelse(track == "H", back_time, NA_real_), type = "locf"),
    0)]
  last_heat <- subset(ts_heat, V3 == 'H')
  last_time <- last_heat$newcol
  x <- back_time - last_heat
  dataestimation <- data.frame(back_time , x)
  out_filename <- file.path(path, outfile)
  write_tsv(dataestimation, file = out_filename)
}

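processPath <- function(path, pattern = "data", outfile = "dataestimation.txt", verbose = FALSE){
  # recursive = FALSE returns only the immediate subfolders, not `path` itself
  d <- list.dirs(path = path, full.names = TRUE, recursive = FALSE)
  # match the pattern against the folder name rather than the whole path
  fl <- d[grepl(pattern = pattern, x = basename(d))]
  lapply(fl, processFile, pattern = pattern, outfile = outfile)
}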

x <- 'C:\\Users\\Zack\\Documents\\RScripts\\data\\data1'
x <- chartr("\\", "/", x)
processPath(dirname(x))
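
Before running processPath() on real data, it can help to check which folders a pattern would select. The following is only a minimal sketch under assumptions: the folder names (data1, data2, notes) and the throwaway location under tempdir() are made up for illustration and are not from the question. It shows that list.dirs() with its default recursive = TRUE also returns the top folder itself, while recursive = FALSE returns only the subfolders.

```r
# Hypothetical layout under a temporary directory (names are assumptions)
root <- file.path(tempdir(), "data")
for (name in c("data1", "data2", "notes")) {
  dir.create(file.path(root, name), recursive = TRUE, showWarnings = FALSE)
}

# Default recursive = TRUE: the result also contains `root` itself
all_dirs <- list.dirs(root, full.names = TRUE)

# recursive = FALSE: only the immediate subfolders
sub_dirs <- list.dirs(root, full.names = TRUE, recursive = FALSE)

# Filter on the folder name, not on the full path
selected <- sub_dirs[grepl("^data", basename(sub_dirs))]
print(basename(selected))
```

Filtering on basename() avoids accidental matches when the pattern also appears higher up in the path, as "data" does in the path from the question.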

CodePudding user response:

You can try

dirs <- list.dirs("/your/work/directory/", full.names = TRUE, recursive = FALSE)
# probably: 
# list.dirs("C:\\Users\\Zack\\Documents\\RScripts\\data\\", full.names = TRUE, recursive = FALSE)

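for (i in seq_along(dirs)) {
  # pattern is a regular expression; full.names = TRUE returns paths read.table() can open
  heat_data <- list.files(dirs[i], pattern = "\\.heat$", full.names = TRUE)
  time <- list.files(dirs[i], pattern = "\\.timestamp$", full.names = TRUE)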
  
  ##
  ## your code
  ##

  write_tsv(dataestimation, file = file.path(dirs[i], "dataestimation.txt"))
}

This code goes through each folder in the path you pass it, so make sure that path contains only your data folders:

data_1
data_2
...
data_n

Inside each folder, it finds the files with the .heat and .timestamp extensions. Your code then processes them, and finally write_tsv writes a dataestimation.txt file into that folder.

So you will have

work_dir
├── data_1
│   ├── data1.heat
│   ├── data1.timestamp
│   └── dataestimation.txt
├── data_2
│   ├── data2.heat
│   ├── data2.timestamp
│   └── dataestimation.txt
└── data_n
    ├── datan.heat
    ├── datan.timestamp
    └── dataestimation.txt
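
To make the loop concrete, here is a self-contained sketch that builds a small version of this layout under tempdir() and writes one dataestimation.txt per folder. Several things in it are assumptions and not the asker's method: process_dir() is a hypothetical helper, the sample file contents are invented, the "computation" is a placeholder that just copies the timestamps, and base write.table() stands in for write_tsv() so that the sketch runs without extra packages. The patterns are written as regular expressions ("\\.heat$") because that is what list.files() expects.

```r
# Hypothetical helper: replace the placeholder step with your own analysis
process_dir <- function(dir) {
  heat_file <- list.files(dir, pattern = "\\.heat$", full.names = TRUE)
  time_file <- list.files(dir, pattern = "\\.timestamp$", full.names = TRUE)
  # Skip folders that do not hold exactly one file of each kind
  if (length(heat_file) != 1 || length(time_file) != 1) {
    return(invisible(NULL))
  }
  ts_heat <- read.table(heat_file)
  ts_time <- read.table(time_file)
  # Placeholder computation (assumption): keep the timestamps unchanged
  dataestimation <- data.frame(back_time = ts_time$V1)
  out_file <- file.path(dir, "dataestimation.txt")
  write.table(dataestimation, file = out_file, sep = "\t",
              quote = FALSE, row.names = FALSE)
  invisible(out_file)
}

# Build a throwaway work_dir with two data folders (invented sample data)
work_dir <- file.path(tempdir(), "work_dir")
for (name in c("data_1", "data_2")) {
  d <- file.path(work_dir, name)
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
  write.table(data.frame(1:3, c("H", "C", "H")),
              file.path(d, paste0(name, ".heat")),
              col.names = FALSE, row.names = FALSE)
  write.table(data.frame(c(10, 20, 30)),
              file.path(d, paste0(name, ".timestamp")),
              col.names = FALSE, row.names = FALSE)
}

# Apply the helper to every immediate subfolder of work_dir
dirs <- list.dirs(work_dir, full.names = TRUE, recursive = FALSE)
results <- lapply(dirs, process_dir)
```

Wrapping the per-folder work in a function keeps each folder's variables separate and lets a folder with missing files be skipped instead of stopping the whole run.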

I hope I understand what you wanted this time!
