I have a folder called "HUCs" with 2100 files, each named with an acronym and a number. There are three columns of data in each file, identical in length and with identical headers. I have another file called "means" with 2100 rows
. Each row
corresponds to one file
in the HUCs folder, and is named accordingly.
For example, row A2 in "means" contains "1010001" (its name), and B2 contains the means data. The "1010001" row data corresponds to a file in the HUCs folder called "mricgcm_1010001".
I need to somehow copy each row from the "means" file one at a time and place the data (three cells in the row) into its corresponding file in a new range of columns. So, the current data in the HUCs file are in columns A/B/C and the new "means" data can be placed into D/E/F (or some delimited equivalent).
If there were only the one folder of 2100 files, I'd consider doing it by hand. But I have several similar folders. I can't even wrap my head around how to go about this. Is this possible in R? I tried my best, but please ask if this needs further explanation.
This is the beginning of the list of HUC files:
mricgcm_1010002.csv
mricgcm_1010003.csv
mricgcm_1030001.csv
mricgcm_1040001.csv
mricgcm_1040002.csv
mricgcm_1050001.csv
mricgcm_1050002.csv
mricgcm_1070002.csv
mricgcm_1080107.csv
mricgcm_1080204.csv
mricgcm_1090003.csv
mricgcm_1100001.csv
mricgcm_1100002.csv
mricgcm_1100003.csv
mricgcm_1200001.csv
This is the worksheet in the "means" file:
HUC Means
1010002 141.0727273
1010003 148.6072727
1030001 158.3327273
1040001 181.1127273
1040002 222.78
1050001 163.4818182
1050002 231.9272727
1070002 201.5018182
1080107 183.0545455
1080204 197.3836364
1090003 209.14
1100001 193.0472727
1100002 186.1963636
1100003 199.2854545
1200001 1.021818182
This is what the HUCs files contain now:
HUC Year Predicted
mricgcm_1010002 1961 10.8
mricgcm_1010002 1962 4.6
mricgcm_1010002 1963 0.8
mricgcm_1010002 1964 66.3
mricgcm_1010002 1965 20.4
mricgcm_1010002 1966 8.5
mricgcm_1010002 1967 13.5
mricgcm_1010002 1968 3.7
mricgcm_1010002 1969 59.2
mricgcm_1010002 1970 13.8
mricgcm_1010002 1971 5.5
mricgcm_1010002 1972 1.9
mricgcm_1010002 1973 9.9
mricgcm_1010002 1974 11.7
mricgcm_1010002 1975 52.4
And this is and example what I would like as a result, for each file:
HUC Year Predicted HUC Mean
mricgcm_1010002 1961 10.8 1010002 141.0727273
mricgcm_1010002 1962 4.6 141.0727273
mricgcm_1010002 1963 0.8 141.0727273
mricgcm_1010002 1964 66.3 141.0727273
mricgcm_1010002 1965 20.4 141.0727273
mricgcm_1010002 1966 8.5 141.0727273
mricgcm_1010002 1967 13.5 141.0727273
mricgcm_1010002 1968 3.7 141.0727273
mricgcm_1010002 1969 59.2 141.0727273
mricgcm_1010002 1970 13.8 141.0727273
mricgcm_1010002 1971 5.5 141.0727273
mricgcm_1010002 1972 1.9 141.0727273
mricgcm_1010002 1973 9.9 141.0727273
mricgcm_1010002 1974 11.7 141.0727273
mricgcm_1010002 1975 52.4 141.0727273
Note, FTR: The mean is NOT the mean of the column of predicted data or else I could just run a mean formula on that column. It is the mean of the historic data, gotten from elsewhere.
This is what I am getting now:
Designation HUC YEAR RO_MM HIST HUC Historic Mean D
mricgcm 1010004 1961 71.8
mricgcm 1010004 1962 119
mricgcm 1010004 1963 177.7
mricgcm 1010004 1964 166
mricgcm 1010004 1965 159
mricgcm 1010004 1966 180.3
So, something is happening, but no data.
CodePudding user response:
Here is my solution. The structure of directory is
.
├── copy_means_to_FUCs.R
├── HUCs
│ ├── mricgcm_1010001.csv
│ └── mricgcm_1010002.csv
└── means.csv
Then the code is organized as
library(data.table)
# install.packages("qpcR")
means <- fread("means.csv")
means$HUC <- as.character(means$HUC)
# get all file's name
fn <- list.files("./HUCs",pattern="*.csv",full.names = TRUE)
f_paste <- function(x){
csv <- fread(x)
num <- gsub(".*_(.*).csv","\\1",x)
tmp <- cbind(csv,means[num == HUC,]) # match row by name then cbind them
#tmp <- qpcR:::cbind.na(csv, means[num == HUC,])
fwrite(tmp,x)
invisible(return(NULL))
}
invisible(lapply(fn, f_paste)) # Attention: run once is enough
The imperfect place is that the row from means
is filled in all rows due to cbind
. I think it's not the core problem to be solved.
( You can replce cbind
with qpcR:::cbind.na(csv, means[num == HUC,])
to solve this problem)