Performing multiple operations on multiple data.tables

I have 30 tables I created. Their names are structured as follows:
mdl_(race)_(wage quartile).
(race) is one of the following: whites, blacks, hispanics, asians, others, or all.
(wage quartile) is one of the following: Q1, Q2, Q3, Q4, and allQ.
Since I have 6 race categories and 5 wage quartiles, I have 6*5 = 30 objects!

Ex: Linear model that includes only hispanics in the 1st quartile of wage distribution => mdl_hispanics_Q1
Ex: Linear model that includes all races and all wage quartiles => mdl_all_allQ

All tables are formatted identically, with different values of course:

          Variables     Estimate   Std. Error    t value      Pr(>|t|)
 1:       Intercept 37.231178895 9.486380e-02 392.469814  0.000000e+00
 2:         forborn -0.612941167 5.174224e-02 -11.846051  2.300944e-32
 3:          female -3.238655089 4.797890e-02 -67.501655  0.000000e+00
 4:        numchild  0.583390602 2.239027e-02  26.055543 1.841656e-149
 5: numchild_female  0.371351058 9.086739e-02   4.086736  4.376191e-05
 6:              hs  0.173864095 9.180975e-02   1.893743  5.826025e-02
 7:         somecol  0.595612050 9.407851e-02   6.331011  2.439689e-10
 8:         college  1.593917949 9.929766e-02  16.051918  5.923264e-58
 9:        advanced  0.171443556 1.983952e-03  86.415175  0.000000e+00
10:              rw -0.001207904 1.460021e-05 -82.731964  0.000000e+00
11:      rw_squared -0.954029880 3.252520e-02 -29.332024 8.456547e-189

What I want is a numeric vector with 30 values, where each value is the estimate for the variable "forborn" if it is statistically significant (Pr(>|t|) < 0.1) and zero otherwise. I am a beginner in R and only know how to do this table by table, which is painfully tedious and takes a lot of code. Is there a way to take advantage of the fact that the tables are named similarly and loop this operation in one sweep?

CodePudding user response：

You can use mget to collect the data frames by name, then extract the value from each one with sapply.

EDIT: changed the data frame names to match your description.

ls()
#[1] "mdl_hispanics_..."  "mdl_blacks_..." etc.

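# Names of the model tables; (a|b) is alternation, [a|b] would be a character class
tbl_names <- grep("^mdl_(whites|blacks|hispanics|asians|others|all)_",
                  ls(), value = TRUE)

as.vector(sapply(mget(tbl_names), function(x) {
  row <- x$Variables == "forborn"
  # Return the Estimate (not the p-value) when significant, else 0;
  # [[ ]] works for both data.frame and data.table
  ifelse(x[["Pr(>|t|)"]][row] < 0.1, x[["Estimate"]][row], 0)
}))
# one value per table: the forborn Estimate if p < 0.1, otherwise 0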
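
Because the names follow the mdl_(race)_(wage quartile) scheme, another option is to build the names explicitly instead of pattern-matching on ls(). This fixes the order of the result and keeps the table names on the vector. A minimal sketch, assuming two small made-up tables in place of the real 30 (make_tbl, tbl_names and forborn_est are illustrative names, not from the question):

```r
# Hypothetical stand-ins for the real tables: same column layout as in the
# question, values made up for illustration.
make_tbl <- function(est, p) {
  data.frame(Variables = c("Intercept", "forborn"),
             Estimate = c(1, est),
             `Pr(>|t|)` = c(0, p),
             check.names = FALSE)
}
mdl_whites_Q1 <- make_tbl(-0.61, 2.3e-32)  # forborn significant
mdl_blacks_Q1 <- make_tbl(0.25, 0.40)      # forborn not significant

races <- c("whites", "blacks")  # extend to all six race categories
quartiles <- c("Q1")            # extend to Q1, Q2, Q3, Q4, allQ

# Every race x quartile combination, e.g. "mdl_whites_Q1"
grid <- expand.grid(race = races, quartile = quartiles,
                    stringsAsFactors = FALSE)
tbl_names <- paste("mdl", grid$race, grid$quartile, sep = "_")

# Look the tables up by name in the current environment and pull one
# number out of each; vapply guarantees a numeric vector.
forborn_est <- vapply(mget(tbl_names, envir = environment()), function(x) {
  row <- x$Variables == "forborn"
  if (x[["Pr(>|t|)"]][row] < 0.1) x[["Estimate"]][row] else 0
}, numeric(1))

forborn_est
```

The result is a named numeric vector, so each value can be traced back to its table with names(forborn_est).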

CodePudding user response：

Write a function that extracts the Estimate conditional on the p-value, then apply it to each table in the list with sapply.

library(data.table)

fextrac <- function(x){
  # Compute in j without := so the input table is not modified by reference
  x[Variables == "forborn", ifelse(`Pr(>|t|)` < 0.1, Estimate, 0)]
}

Estimates_list <- sapply(dt_list, fextrac)
Estimates_list
#[1] -0.6129412 -0.6129412

Test data

dt1 <- read.table(text = "
         Variables     Estimate   'Std. Error'    't value'      'Pr(>|t|)'
 1:       Intercept 37.231178895 9.486380e-02 392.469814  0.000000e+00
 2:         forborn -0.612941167 5.174224e-02 -11.846051  2.300944e-32
 3:          female -3.238655089 4.797890e-02 -67.501655  0.000000e+00
 4:        numchild  0.583390602 2.239027e-02  26.055543 1.841656e-149
 5: numchild_female  0.371351058 9.086739e-02   4.086736  4.376191e-05
 6:              hs  0.173864095 9.180975e-02   1.893743  5.826025e-02
 7:         somecol  0.595612050 9.407851e-02   6.331011  2.439689e-10
 8:         college  1.593917949 9.929766e-02  16.051918  5.923264e-58
 9:        advanced  0.171443556 1.983952e-03  86.415175  0.000000e+00
10:              rw -0.001207904 1.460021e-05 -82.731964  0.000000e+00
11:      rw_squared -0.954029880 3.252520e-02 -29.332024 8.456547e-189
", header = TRUE, check.names = FALSE)

set.seed(2021)
dt2 <- dt1
dt2$`Pr(>|t|)`[sample(nrow(dt2), nrow(dt2)/3)] <- 0.1

setDT(dt1)
setDT(dt2)
dt_list <- list(dt1, dt2)

CodePudding user response：

A more compact alternative is to bind all the tables together and filter once. This returns a vector with the Estimate for forborn where the p-value is below 0.1, and 0 otherwise (not the p-value itself).

rbindlist(lapply(ls(pattern="mdl_"),get))[
  Variables=="forborn",fifelse(`Pr(>|t|)`<0.1,Estimate,0)
  ]

Note: tighten the pattern argument in ls() (for example, anchor it as "^mdl_") if other objects in your workspace could also match.
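
If you also need to know which table each value came from, rbindlist can record the source name in an extra column through its idcol argument when it is given a named list. A minimal sketch, assuming two small made-up tables and that nothing else in the workspace starts with mdl_ (the column name "model" and the object names all_tbls and res are illustrative):

```r
library(data.table)

# Hypothetical miniature tables with the same columns as in the question
mdl_whites_Q1 <- data.table(Variables = c("Intercept", "forborn"),
                            Estimate = c(37.2, -0.61),
                            `Pr(>|t|)` = c(0, 2.3e-32))
mdl_blacks_Q1 <- data.table(Variables = c("Intercept", "forborn"),
                            Estimate = c(35.9, 0.25),
                            `Pr(>|t|)` = c(0, 0.40))

# mget() returns a named list, so idcol can store each table's name
all_tbls <- rbindlist(mget(ls(pattern = "^mdl_")), idcol = "model")

# One row per table: the forborn Estimate if p < 0.1, otherwise 0
res <- all_tbls[Variables == "forborn",
                .(model, forborn = fifelse(`Pr(>|t|)` < 0.1, Estimate, 0))]
res

# Named numeric vector, if that is the shape you want
forborn_vec <- setNames(res$forborn, res$model)
```

Keeping the model name next to each value makes it easy to split it back into race and wage quartile later, for example with tstrsplit(model, "_").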