Home > Enterprise >  In R - one row of data to be added to multiple files, iteratively
In R - one row of data to be added to multiple files, iteratively

Time:09-22

I have a folder called "HUCs" with 2100 files, each named with an acronym and a number. There are three columns of data in each file, identical in length and with identical headers. I have another file called "means" with 2100 rows. Each row corresponds to one file in the HUCs folder, and is named accordingly.

For example, row A2 in "means" contains "1010001" (its name), and B2 contains the means data. The "1010001" row data corresponds to a file in the HUCs folder called "mricgcm_1010001".

I need to somehow copy each row from the "means" file one at a time and place the data (three cells in the row) into its corresponding file in a new range of columns. So, the current data in the HUCs file are in columns A/B/C and the new "means" data can be placed into D/E/F (or some delimited equivalent).

If there were only the one folder of 2100 files, I'd consider doing it by hand. But I have several similar folders. I can't even wrap my head around how to go about this. Is this possible in R? I tried my best, but please ask if this needs further explanation.

This is the beginning of the list of HUC files:

mricgcm_1010002.csv
mricgcm_1010003.csv
mricgcm_1030001.csv
mricgcm_1040001.csv
mricgcm_1040002.csv
mricgcm_1050001.csv
mricgcm_1050002.csv
mricgcm_1070002.csv
mricgcm_1080107.csv
mricgcm_1080204.csv
mricgcm_1090003.csv
mricgcm_1100001.csv
mricgcm_1100002.csv
mricgcm_1100003.csv
mricgcm_1200001.csv

This is the worksheet in the "means" file:

HUC     Means
1010002 141.0727273
1010003 148.6072727
1030001 158.3327273
1040001 181.1127273
1040002 222.78
1050001 163.4818182
1050002 231.9272727
1070002 201.5018182
1080107 183.0545455
1080204 197.3836364
1090003 209.14
1100001 193.0472727
1100002 186.1963636
1100003 199.2854545
1200001 1.021818182

This is what the HUCs files contain now:

HUC             Year    Predicted
mricgcm_1010002 1961    10.8
mricgcm_1010002 1962    4.6
mricgcm_1010002 1963    0.8
mricgcm_1010002 1964    66.3
mricgcm_1010002 1965    20.4
mricgcm_1010002 1966    8.5
mricgcm_1010002 1967    13.5
mricgcm_1010002 1968    3.7
mricgcm_1010002 1969    59.2
mricgcm_1010002 1970    13.8
mricgcm_1010002 1971    5.5
mricgcm_1010002 1972    1.9
mricgcm_1010002 1973    9.9
mricgcm_1010002 1974    11.7
mricgcm_1010002 1975    52.4

And this is and example what I would like as a result, for each file:

HUC             Year    Predicted HUC       Mean
mricgcm_1010002 1961    10.8      1010002   141.0727273
mricgcm_1010002 1962    4.6                 141.0727273
mricgcm_1010002 1963    0.8                 141.0727273
mricgcm_1010002 1964    66.3                141.0727273
mricgcm_1010002 1965    20.4                141.0727273
mricgcm_1010002 1966    8.5                 141.0727273
mricgcm_1010002 1967    13.5                141.0727273
mricgcm_1010002 1968    3.7                 141.0727273
mricgcm_1010002 1969    59.2                141.0727273
mricgcm_1010002 1970    13.8                141.0727273
mricgcm_1010002 1971    5.5                 141.0727273
mricgcm_1010002 1972    1.9                 141.0727273
mricgcm_1010002 1973    9.9                 141.0727273
mricgcm_1010002 1974    11.7                141.0727273
mricgcm_1010002 1975    52.4                141.0727273

Note, FTR: The mean is NOT the mean of the column of predicted data or else I could just run a mean formula on that column. It is the mean of the historic data, gotten from elsewhere.

This is what I am getting now:

Designation HUC YEAR    RO_MM   HIST    HUC Historic Mean   D
mricgcm 1010004 1961    71.8                
mricgcm 1010004 1962    119             
mricgcm 1010004 1963    177.7               
mricgcm 1010004 1964    166             
mricgcm 1010004 1965    159             
mricgcm 1010004 1966    180.3

So, something is happening, but no data.

CodePudding user response:

Here is my solution. The structure of directory is

.
├── copy_means_to_FUCs.R
├── HUCs
│  ├── mricgcm_1010001.csv
│  └── mricgcm_1010002.csv
└── means.csv

Then the code is organized as

library(data.table)
# install.packages("qpcR") 

means <- fread("means.csv")
means$HUC <- as.character(means$HUC)

# get all file's name
fn <- list.files("./HUCs",pattern="*.csv",full.names = TRUE)

f_paste <- function(x){
  csv <- fread(x)
  num <- gsub(".*_(.*).csv","\\1",x)
  tmp <- cbind(csv,means[num == HUC,]) # match row by name then cbind them
  #tmp <- qpcR:::cbind.na(csv, means[num == HUC,])
  fwrite(tmp,x)
  invisible(return(NULL)) 
}

invisible(lapply(fn, f_paste)) # Attention: run once is enough

The imperfect place is that the row from means is filled in all rows due to cbind. I think it's not the core problem to be solved. ( You can replce cbind with qpcR:::cbind.na(csv, means[num == HUC,]) to solve this problem)

  • Related