Correcting the output of a custom function which measures the True Positive Rate of a variable selec

Time:01-04

NOTE: This is the 3rd and final question in a series that works through the same complicated problem in steps. The previous question is here and the initial question is here. All of the code in this question comes from the "LASSO_code(10)" R script, which only uses data found in the folder called "ten" in my GitHub repository for this Statistical Learning research project.

The main improvement over where I left off at the end of the previous question is that now, after loading the list of dataframes into the workspace and assigning it to datasets using:

datasets <- lapply(filepaths_list, read.csv)

I also figured out (finally!) how to correct the column names for each of the dataframes in that list by executing:

# change the column names of every dataframe in the list 'datasets'
datasets <- lapply(datasets, function(dataset_i) { 
  colnames(dataset_i) <- c("Y", "X1", "X2", "X3", "X4", "X5", "X6", "X7", 
                           "X8", "X9", "X10", "X11", "X12", "X13", "X14", 
                           "X15", "X16", "X17", "X18", "X19", "X20", "X21", 
                           "X22", "X23", "X24", "X25", "X26", "X27", "X28", 
                           "X29", "X30")
  dataset_i })
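
The thirty predictor names can also be generated rather than typed out. The following is only a minimal sketch, assuming every dataframe has exactly 31 columns (the response followed by 30 predictors); `toy_datasets` and `new_names` are made-up names and stand-in data for illustration, not the project's files:

```r
# Hypothetical stand-in for the real list of dataframes: two dataframes,
# each with 31 columns (1 response + 30 predictors).
toy_datasets <- list(
  as.data.frame(matrix(0, nrow = 2, ncol = 31)),
  as.data.frame(matrix(1, nrow = 2, ncol = 31))
)

# Build c("Y", "X1", ..., "X30") programmatically.
new_names <- c("Y", paste0("X", 1:30))

# setNames() returns each dataframe with its column names replaced.
toy_datasets <- lapply(toy_datasets, setNames, nm = new_names)

colnames(toy_datasets[[1]])[1:4]
```

Generating the names with `paste0()` avoids typos in the long literal vector, but it only makes sense if the column count is the same in every file.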

Beyond that, in order to re-create the multiple dataset/dataframe version of the 'True_IVs' object I created back during the original version of this question, I have also added in the following two commands:

Structural_IVs_chr <- lapply(datasets, function(j) {j[1, -1]})    
> Structural_IVs_chr[[1]]
  X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20
1  0  0  0  0  0  0  0  0  0   0   1   0   1   0   0   1   0   0   0   0
  X21 X22 X23 X24 X25 X26 X27 X28 X29 X30
1   0   0   0   0   0   0   0   0   0   0
Structural_IVs_num <- lapply(Structural_IVs_chr, \(X) { lapply(X, as.numeric) })

Plus, I no longer need the True_IVs object used in the version of my code from the last post, because I can derive the true regressors from my new Structural_IVs_chr object as follows:

True_Regressors <- lapply(Structural_IVs_chr, function(i) {
  names(i)[i == 1] })
> head(True_Regressors, n = 3)
[[1]]
[1] "X11" "X13" "X16"    
[[2]]
[1] "X6"  "X7"  "X20"
[[3]]
[1] "X9"  "X10" "X20"
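
To make the extraction step concrete, here is a small sketch on invented data. It assumes, as the output above suggests, that the first row of each dataframe holds 0/1 flags marking the structural regressors; `toy_datasets` and `toy_true_regressors` are hypothetical names used only for this illustration:

```r
# Hypothetical toy data: row 1 holds 0/1 flags marking the structural
# regressors (an assumption about the file layout); row 2 is an observation.
toy_datasets <- list(
  data.frame(Y = c(0, 2.5), X1 = c(1, 0.3), X2 = c(0, 1.1), X3 = c(1, -0.7)),
  data.frame(Y = c(0, 1.4), X1 = c(0, 0.9), X2 = c(1, 0.2), X3 = c(0, 0.5))
)

toy_true_regressors <- lapply(toy_datasets, function(d) {
  flags <- unlist(d[1, -1])   # named vector of flags, response column dropped
  names(flags)[flags == 1]    # keep only the names flagged with 1
})

toy_true_regressors
```

Flattening the one-row dataframe with `unlist()` first is a design choice: it makes the logical index an ordinary named vector instead of a one-row matrix.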

And once again, the same as in the previous posts, I am writing a function which counts how many of the true regressors above also appear among the variables selected by a LASSO Regression on those datasets, which are stored in the following format:

> head(IVs_Selected_by_LASSO, n = 3)
[[1]]
[1] "X11" "X16"    
[[2]]
[1] "X6"  "X7"  "X20"    
[[3]]
[1] "X9"  "X10" "X20"

And finally, here are my attempts to write the performance measuring function I am asking for help with here:

### Count up how many Variables Selected match the true 
### structural equation variables for that dataset in order
### to measure LASSO's performance.
Total_Positives <- lapply(True_Regressors, function(i) { length(i) })
> head(Total_Positives, n = 3)
[[1]]
[1] 3    
[[2]]
[1] 3    
[[3]]
[1] 3

True_Pos_list1 <- lapply(seq_along(datasets), \(i)
                        length(intersect(IVs_Selected_by_LASSO, 
                                         True_Regressors)) )
> head(True_Pos_list1, n = 3)
[[1]]
[1] 9
[[2]]
[1] 9
[[3]]
[1] 9

True_Pos_list2 <- lapply(seq_along(datasets), \(i)
                         sum(IVs_Selected_by_LASSO %in% True_Regressors))
> head(True_Pos_list2, n = 3)
[[1]]
[1] 9
[[2]]
[1] 9
[[3]]
[1] 9

As ought to be clear, the results should be:

> head(True_Pos_list, n = 3)
[[1]]
[1] 2
[[2]]
[1] 3
[[3]]
[1] 3

And furthermore, the result returned for any element in the True_Pos_list should never be larger than the corresponding element in the Total_Positives list!
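
That constraint can be written down as an explicit check, and the True Positive Rate then follows by division. This is only a sketch using the expected counts and totals quoted above for the first three datasets; `expected_true_pos`, `total_positives`, and `tpr` are illustrative names, not objects from the script:

```r
# Counts copied from the expected output and Total_Positives shown above
# (first three datasets only).
expected_true_pos <- list(2, 3, 3)
total_positives   <- list(3, 3, 3)

tp    <- unlist(expected_true_pos)
total <- unlist(total_positives)

# The invariant: true positives can never exceed total positives.
stopifnot(all(tp <= total))

# True Positive Rate per dataset = true positives / total positives.
tpr <- tp / total
tpr
```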

CodePudding user response:

Try running:

True_Pos_list <- lapply(seq_along(datasets), \(i)
                                    sum(IVs_Selected_by_LASSO[[i]] %in% 
                                          True_Regressors[[i]]))

instead and see how that works. I am not sure why you erroneously got 9 back for the number of True Positives, but indexing both lists with [[i]] should fix the output, and that seems to be your most urgent concern here.
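
One plausible explanation for the constant 9, offered as an assumption rather than something checked against the asker's data: neither attempt in the question uses the index `i`, so every iteration compares the two lists as wholes. On lists, `%in%` and `intersect()` match entire elements, so the count would be the number of datasets whose selected set is exactly identical to the true set, repeated for every `i`. A small sketch with the three example datasets from the question (`truth`, `selected`, `whole_list`, and `per_dataset` are illustrative names):

```r
# The first three entries shown in the question.
truth <- list(c("X11", "X13", "X16"), c("X6", "X7", "X20"), c("X9", "X10", "X20"))
selected <- list(c("X11", "X16"), c("X6", "X7", "X20"), c("X9", "X10", "X20"))

# Whole-list comparison: counts datasets whose selection matches the truth
# exactly. It does not depend on i, so every iteration returns the same value.
whole_list <- sum(selected %in% truth)
whole_list_intersect <- length(intersect(selected, truth))

# Element-wise comparison: one count of matching variables per dataset.
per_dataset <- mapply(function(sel, tru) sum(sel %in% tru), selected, truth)

whole_list
per_dataset
```

Here the whole-list count is 2 because only the second and third selections match their true sets exactly; that it equals the first per-dataset count is a coincidence of this small example.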
