Home > Enterprise >  How can I concatenate the writing of 2 lists (same length) in R to a single csv file such that it ha
How can I concatenate the writing of 2 lists (same length) in R to a single csv file such that it ha

Time:08-27

I have a file folder with 47,000 csv files containing datasets, but for this question, I'll use my practice code & the results from a different folder I created by copying & pasting just the top 15 csv file formatted datasets into a new folder. I am running a LASSO Regression on each dataset and my desired final result is a single new csv file

After installing/loading in the necessary libraries, my first lines of code are the following:

filepath_list <- list.files(path = "~/DAEN_698/other datasets/sample_obs2", 
                            full.names = FALSE, recursive = TRUE)
print(filepath_list)
> print(filepath_list)
 [1] "0.4-3-1-1.csv"  "0.4-3-1-10.csv" "0.4-3-1-11.csv" "0.4-3-1-12.csv"
 [5] "0.4-3-1-13.csv" "0.4-3-1-14.csv" "0.4-3-1-15.csv" "0.4-3-1-2.csv" 
 [9] "0.4-3-1-3.csv"  "0.4-3-1-4.csv"  "0.4-3-1-5.csv"  "0.4-3-1-6.csv" 
[13] "0.4-3-1-7.csv"  "0.4-3-1-8.csv"  "0.4-3-1-9.csv" 

The order of these is wrong, but I will ask about how to resort them correctly in a separate question post here later. From the above, I wrote the following commands with the following results:

# reformat the names of each of the csv file formatted datasets
DS_names_list <- basename(filepath_list)
DS_names_list <- tools::file_path_sans_ext(DS_names_list)
head(DS_names_list, n = 5)
> head(DS_names_list, n = 5)
[1] "0.4-3-1-1"  "0.4-3-1-10" "0.4-3-1-11" "0.4-3-1-12" "0.4-3-1-13"
tail(DS_name_list, n = 5)
> tail(DS_name_list, n = 5)
[1] "0.4-3-1-5" "0.4-3-1-6" "0.4-3-1-7" "0.4-3-1-8" "0.4-3-1-9"

From there, I also wrote lines which ran LASSO Regressions on each dataset and I'll include that code as well at the very bottom in the "p.s." section, but the only thing about that relevant here is that they work and what their names are. I called part of LASSO_fits & the extraction of just their coefficient estimates LASSO_Coeffs. I used the following line to extract just the names which of the 30 candidate Independent Variables the LASSO selected for each dataset:

IVs_selected_by_LASSO <- lapply(LASSO_Coeffs, function(i) names(i[i > 0]))
head(IVs_selected_by_LASSO, n = 2)
> head(IVs_selected_by_LASSO, n = 2)
[[1]]
[1] "X2" "X3"

[[2]]
[1] "X2" "X3"
IVs_selected_by_LASSO[[2]]
> IVs_selected_by_LASSO[[2]]
[1] "X2" "X3"

And finally, the last step I have to perform is writing both DS_names_list & IVs_selected_by_LASSO to the same new csv file in such a way that the it has the same number of rows/lines in it as both of them and are formatted in one of the 4 following ways: 1st option: n-n-n-n, Xn, Xn, Xn, etc. so two lines would be: 0.4-3-1-1, X1, X2 0.4-3-1-2, X2, X3, X5 2nd option: n-n-n-n, n, n, n, n, etc. so the same 2 example lines would now be: 0.4-3-1-1, 1, 2 0.4-3-1-2, 2, 3, 5 3rd option: n-n-n-n; Xn, Xn, Xn (same as 1st but with a semicolon) 4th option: n-n-n-n; n, n, n, n (same as 2nd but with a semicolon)

I have already managed to be able to create two individual csv files with these desired formats, using the following two commands:

write.csv(DS_names_list, file = 'list_of_dataset_names.csv')

cat(sapply(IVs_selected_by_LASSO, toString), 
    sep ="\n", file = 'IVs_selected_by_LASSO.csv')

Both of them do their job, but what I really need is a way to concatenate or merge the contents of each line/row of each into one csv of the same length. Solutions or pointers would be greatly appreciated!

p.s. All of the code included here and the output running them generates can be found in the "Estimated-Exhaustive-Regression-Project" repository on my Github. Just in case these details may also be helpful, below are the code I used to create

LASSO_fits & LASSO_Coeffs:
set.seed(11)     # to ensure replicability
LASSO_fits <- lapply(datasets, function(i) 
  enet(x = as.matrix(select(i, starts_with("X"))), y = i$Y, 
       lambda = 0, normalize = FALSE))

set.seed(11)     # to ensure replicability
LASSO_Coeffs <- lapply(LASSO_fits, function(i) 
  predict(i, x = as.matrix(select(i, starts_with("X"))), s = 0.1, 
          mode = "fraction", type = "coefficients")[["coefficients"]])

CodePudding user response:

df <- data.frame(nam= DS_names_list, vars = sapply(IVs_selected_by_LASSO, toString))
write.csv2(df, file = "results.csv", row.names =FALSE)
  • Related