2 consecutive nested for-loops - Is it possible to optimize with data.table or apply? Added data.tab

Time:11-20

I have a data.table made of data.tables as per the dput at the end of this question. I manipulate this data.table of data.tables using the following nested for-loops:

test_E2 <- list()
for (i in unique(lst_512_32_E2$ID)){
     test_E2[[i]] <- list()
     for (j in 1:length(lst_512_32_E2$V1[[i]])){
          test_E2[[i]][[j]] <- sapply(lst_512_32_E2[ID==i]$V1, '[[', j)
     }
}

t_test_E2 <- list()
for (i in 1:length(test_E2)){
     t_test_E2[[i]] <- list()
     for (j in 1:length(test_E2[[i]])){
          t_test_E2[[i]][[j]] <- t(test_E2[[i]][[j]])
     }
}

Is there any chance these for-loops could be rewritten or optimized while staying in the data.table world? Failing that, what about an apply/mapply function as a second alternative? Note that I want the final output as a matrix.

Edit (revised code):

#This function will be used to delete null list elements
delete.NULLs  <-  function(x.list){
     x.list[unlist(lapply(x.list, length) != 0)]
}

#I revised the nested loop to accommodate more than one variable, i.e., V1, V2, V3, etc.
test_E2 <- list()
for (i in unique(lst_512_32_E2$ID)){
     test_E2[[i]] <- list()
     for (j in 2:length(lst_512_32_E2)){
          test_E2[[i]][[j]] <- (sapply(lst_512_32_E2[ID==i,..j][[1]], '[[',1))
     }
}

#This is where the NULL list elements are deleted.
test_E2 <- lapply(test_E2, delete.NULLs)

#This is the same. Could be eliminated using Karl's answer though
t_test_E2 <- list()
for (i in 1:length(test_E2)){
     t_test_E2[[i]] <- list()
     for (j in 1:length(test_E2[[i]])){
          t_test_E2[[i]][[j]] <- t(test_E2[[i]][[j]])
     }
}

I refer you to this question which was of help before. Maybe it brings up some ideas: Optimizing a foreach with an embeded lapply loop - Is it possible to optimize code?

Edit 2: Trying to play with data.table syntax

#Test_E22 and Test_E222 could be chained but they're kept separate for readability.
test_E22 <- lst_512_32_E2[,.(.(lapply(.SD, function(x) .(sapply(x, '[[',1))))),by = ID]
test_E222 <- test_E22[,.(lapply(.SD, function(x) matrix(unlist(x),nrow = window_size))), by=ID]

#Turning the data.table into a list of data.tables
abc <- lapply(unique(test_E222$ID), function(x) test_E222[ID==x][,c(.SD), by=ID])

#Combine values into a vector or list. Can the V1 inside the c() be replaced with a function and still return a vector or list? What if I want to return more than one vector or list when I have V1, V2, etc.?
abc2 <- lapply(abc, function(x) x[,c(V1)])

#This is a first attempt at it to keep the community engaged with the question. Hopefully, someone is able to optimize it further and can help resolve the c() issue.

dput:

print(dput(lst_512_32_E2[1:2]))
structure(list(ID = c(1L, 1L), gl = structure(1:2, levels = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", 
"36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", 
"47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57", 
"58", "59", "60", "61", "62", "63", "64", "65", "66", "67", "68", 
"69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79", 
"80", "81", "82", "83", "84", "85", "86", "87", "88", "89", "90", 
"91", "92", "93", "94", "95", "96", "97", "98", "99", "100", 
"101", "102", "103", "104", "105", "106", "107", "108", "109", 
"110", "111", "112", "113", "114", "115", "116", "117", "118", 
"119", "120", "121", "122", "123", "124", "125", "126", "127", 
"128", "129", "130", "131", "132", "133", "134", "135", "136", 
"137", "138", "139", "140", "141", "142", "143", "144", "145", 
"146", "147", "148", "149", "150", "151", "152", "153", "154", 
"155", "156", "157", "158", "159", "160", "161", "162", "163", 
"164", "165", "166", "167", "168", "169", "170", "171", "172", 
"173", "174", "175", "176", "177", "178", "179", "180", "181", 
"182", "183", "184", "185", "186", "187", "188", "189", "190", 
"191", "192", "193", "194", "195", "196", "197", "198", "199", 
"200", "201", "202", "203", "204", "205", "206", "207", "208", 
"209", "210", "211", "212", "213", "214", "215", "216", "217", 
"218", "219", "220", "221", "222", "223", "224", "225", "226", 
"227", "228", "229", "230", "231", "232", "233", "234", "235", 
"236", "237", "238", "239", "240", "241", "242", "243", "244", 
"245", "246", "247", "248", "249", "250", "251", "252", "253", 
"254", "255", "256", "257", "258", "259", "260", "261", "262", 
"263", "264", "265", "266", "267", "268", "269", "270", "271", 
"272", "273", "274", "275", "276", "277", "278", "279", "280", 
"281", "282", "283", "284", "285", "286", "287", "288", "289", 
"290", "291", "292", "293", "294", "295", "296", "297", "298", 
"299", "300", "301", "302", "303", "304", "305", "306", "307", 
"308", "309", "310", "311", "312", "313", "314", "315", "316", 
"317", "318", "319", "320", "321", "322", "323", "324", "325", 
"326", "327", "328", "329", "330", "331", "332", "333", "334", 
"335", "336", "337", "338", "339", "340", "341", "342", "343", 
"344", "345", "346", "347", "348", "349", "350", "351", "352", 
"353", "354", "355", "356", "357", "358", "359", "360", "361", 
"362", "363", "364", "365", "366", "367", "368", "369", "370", 
"371", "372", "373", "374", "375", "376", "377", "378", "379", 
"380", "381", "382", "383", "384", "385", "386", "387", "388", 
"389", "390", "391", "392", "393", "394", "395", "396", "397", 
"398", "399", "400", "401", "402", "403", "404", "405", "406", 
"407", "408", "409", "410", "411", "412", "413", "414", "415", 
"416", "417", "418", "419", "420", "421", "422", "423", "424", 
"425", "426", "427", "428", "429", "430", "431", "432", "433", 
"434", "435", "436", "437", "438", "439", "440", "441", "442", 
"443", "444", "445", "446", "447", "448", "449", "450", "451", 
"452", "453", "454", "455", "456", "457", "458", "459", "460", 
"461", "462", "463", "464", "465", "466", "467", "468", "469", 
"470", "471", "472", "473", "474", "475", "476", "477", "478", 
"479", "480", "481", "482", "483", "484", "485", "486", "487", 
"488", "489", "490", "491", "492", "493", "494", "495", "496", 
"497", "498", "499", "500", "501", "502", "503", "504", "505", 
"506", "507", "508", "509", "510", "511", "512", "513", "514", 
"515", "516", "517", "518", "519", "520", "521", "522", "523", 
"524", "525", "526", "527", "528", "529", "530", "531", "532", 
"533", "534", "535", "536", "537", "538", "539", "540", "541", 
"542", "543", "544", "545", "546", "547", "548", "549", "550", 
"551", "552", "553", "554", "555", "556", "557", "558", "559", 
"560", "561", "562", "563", "564", "565", "566", "567", "568", 
"569", "570", "571", "572", "573", "574", "575", "576", "577", 
"578", "579", "580", "581", "582", "583", "584", "585", "586", 
"587", "588", "589", "590", "591", "592", "593", "594", "595", 
"596", "597", "598", "599", "600", "601", "602", "603", "604", 
"605", "606", "607", "608", "609", "610", "611", "612", "613", 
"614", "615", "616", "617", "618", "619", "620", "621", "622", 
"623", "624", "625", "626", "627", "628", "629", "630", "631", 
"632", "633", "634", "635", "636", "637", "638", "639", "640"
), class = "factor"), V1 = list(structure(list(V1 = c(-0.049, 
-0.042, 0.015, -0.051, -0.107, -0.078, -0.02, -0.046, -0.063, 
0.068, 0.095, -0.007, -0.046, 0.044, 0.137, 0.098, 0.081, -0.073, 
-0.037, 0.012, -0.037, -0.044, 0.015, 0.044, -0.029, -0.09, -0.061, 
-0.042, -0.002, 0.007, 0.024, -0.005, -0.11, -0.076, 0.032, 0.088, 
-0.005, -0.105, -0.117, -0.071, -0.002, -0.017, -0.034, -0.098, 
-0.071, -0.056, -0.083, -0.093, -0.012, 0.002, 0.042, -0.056, 
-0.017, 0.007, -0.015, 0.02, 0.015, 0.007, 0.029, 0.054, 0.01, 
-0.007, -0.056, -0.049, -0.034, 0.002, 0.017, -0.071, -0.103, 
-0.093, -0.051, -0.01, -0.107, -0.063, 0.054, 0.007, 0.037, 0.071, 
0.107, -0.02, -0.056, -0.078, 0.027, 0.063, -0.051, -0.115, -0.068, 
-0.059, -0.024, -0.044, 0.027, -0.012, -0.054, -0.02, 0.022, 
-0.066, -0.037, 0.117, 0.071, 0.029, 0.015, -0.032, 0.027, -0.044, 
-0.22, -0.2, -0.024, 0.007, -0.129, -0.068, 0.044, 0.059, 0.012, 
0.002, -0.068, 0.029, 0.117, 0.039, 0.005, 0.088, 0.032, -0.095, 
-0.076, -0.032, -0.059, -0.142, -0.164, -0.071, -0.02, -0.032, 
-0.088, -0.022, 0.032, 0.032, 0.007, -0.022, -0.042, 0.024, 0.042, 
-0.017, -0.034, 0.01, 0.002, -0.076, -0.078, -0.054, -0.095, 
-0.073, -0.034, -0.103, -0.081, -0.088, -0.017, -0.049, 0.012, 
-0.09, -0.122, 0.01, 0.022, 0.122, 0.107, 0.012, -0.017, -0.107, 
-0.107, 0.034, -0.034, -0.044, -0.061, -0.115, -0.132, -0.193, 
-0.029, 0.078, 0.093, 0.1, 0.049, -0.037, 0.029, -0.027, 0.002, 
0.081, -0.024, -0.083, -0.046, -0.002, -0.037, -0.149, -0.02, 
0.01, -0.049, -0.105, -0.051, 0.078, 0.071, 0.007, -0.081, 0.054, 
0.164, 0.042, 0.073, -0.02, -0.032, 0.015, 0.002, -0.081, 0.042, 
0.024, -0.132, -0.063, 0.051, 0.02, 0, 0.02, -0.01, -0.005, 0.071, 
0.01, -0.005, 0.088, 0.037, -0.015, -0.042, -0.024, -0.012, 0.071, 
-0.022, -0.1, -0.115, -0.029, -0.01, -0.002, -0.051, -0.081, 
0.027, 0.11, 0.022, -0.061, 0.061, 0.01, -0.012, -0.02, -0.049, 
0.029, 0.01, -0.029, -0.032, 0.01, 0.042, -0.01, 0.042, 0.034, 
-0.088, -0.083, -0.09, 0.037, -0.002, 0.056, 0.024, 0.044, 0.154, 
0.088, 0.027, 0.034, 0.105, 0.081, -0.02, -0.083, -0.068, -0.017, 
0.034, 0.042, -0.073, -0.112, -0.015, 0.088, 0.071, -0.066, -0.085, 
0.083, 0.156, 0.105, -0.073, -0.071, 0.09, 0.078, -0.051, -0.142, 
-0.076, 0.005, -0.01, -0.093, -0.076, -0.049, 0.056, 0.01, -0.046, 
0.042, 0.132, 0.049, -0.029, 0.044, 0.107, 0.122, 0.068, -0.002, 
-0.078, -0.012, -0.037, -0.105, -0.115, 0.017, 0.042, 0.015, 
0.032, 0.054, 0.024, -0.002, 0.083, 0.061, -0.007, 0.056, 0.046, 
-0.01, 0.049, 0.022, -0.024, -0.024, -0.022, -0.127, -0.176, 
-0.081, -0.068, 0, 0.015, -0.029, -0.017, -0.027, -0.002, 0.054, 
0.005, -0.022, -0.027, -0.007, 0.095, 0.029, -0.085, -0.059, 
-0.063, 0.024, 0.029, -0.063, -0.078, -0.127, -0.068, -0.022, 
-0.029, 0.046, 0.029, 0.01, 0.039, 0.132, 0.068, 0.044, 0.012, 
-0.029, -0.015, 0.093, -0.01, -0.134, -0.115, -0.066, -0.032, 
0.002, -0.039, -0.134, -0.051, 0.034, 0.061, 0.066, 0.061, 0.066, 
0.01, 0.024, 0.093, 0.044, 0.037, 0.012, 0.002, -0.027, -0.11, 
-0.11, -0.073, -0.029, 0.032, 0.005, -0.066, -0.005, -0.02, -0.029, 
-0.068, -0.01, 0.071, 0.081, 0.034, -0.037, -0.032, -0.007, -0.012, 
-0.073, -0.088, -0.071, -0.049, -0.083, -0.044, -0.112, 0.015, 
-0.1, -0.154, 0.029, 0.073, 0.073, 0, -0.01, 0.005, -0.012, -0.103, 
-0.12, -0.093, -0.042, -0.024, -0.154, -0.073, -0.054, -0.1, 
-0.125, -0.117, -0.066, 0.034, 0.085, 0.012, 0.039, 0.085, 0.005, 
-0.022, -0.017, 0.02, 0.039, -0.046, -0.007, 0.012, -0.012, -0.063, 
-0.054, 0.007, -0.056, -0.107, 0.037, 0.093, 0.046, -0.061, -0.015, 
0.039, 0.024, 0.068, 0.007, -0.027, 0.051, -0.134, -0.11, 0.007, 
-0.093, -0.105, -0.056, -0.076, 0.012, -0.071, -0.056, -0.117, 
-0.073, 0.002, 0.054, 0.078, 0.09, 0.11, 0.09, -0.022, -0.044, 
0.042, 0.073, -0.005, 0.015, 0.017, -0.085, -0.1, -0.085, -0.059, 
-0.103, -0.071, -0.056, -0.034, 0.032, 0.039, -0.007, -0.007, 
0.068, 0.027, -0.054, -0.078, -0.061, -0.059, -0.024)), row.names = c(NA, 
-512L), class = c("data.table", "data.frame")), structure(list(
    V1 = c(-0.11, -0.076, 0.032, 0.088, -0.005, -0.105, -0.117, 
    -0.071, -0.002, -0.017, -0.034, -0.098, -0.071, -0.056, -0.083, 
    -0.093, -0.012, 0.002, 0.042, -0.056, -0.017, 0.007, -0.015, 
    0.02, 0.015, 0.007, 0.029, 0.054, 0.01, -0.007, -0.056, -0.049, 
    -0.034, 0.002, 0.017, -0.071, -0.103, -0.093, -0.051, -0.01, 
    -0.107, -0.063, 0.054, 0.007, 0.037, 0.071, 0.107, -0.02, 
    -0.056, -0.078, 0.027, 0.063, -0.051, -0.115, -0.068, -0.059, 
    -0.024, -0.044, 0.027, -0.012, -0.054, -0.02, 0.022, -0.066, 
    -0.037, 0.117, 0.071, 0.029, 0.015, -0.032, 0.027, -0.044, 
    -0.22, -0.2, -0.024, 0.007, -0.129, -0.068, 0.044, 0.059, 
    0.012, 0.002, -0.068, 0.029, 0.117, 0.039, 0.005, 0.088, 
    0.032, -0.095, -0.076, -0.032, -0.059, -0.142, -0.164, -0.071, 
    -0.02, -0.032, -0.088, -0.022, 0.032, 0.032, 0.007, -0.022, 
    -0.042, 0.024, 0.042, -0.017, -0.034, 0.01, 0.002, -0.076, 
    -0.078, -0.054, -0.095, -0.073, -0.034, -0.103, -0.081, -0.088, 
    -0.017, -0.049, 0.012, -0.09, -0.122, 0.01, 0.022, 0.122, 
    0.107, 0.012, -0.017, -0.107, -0.107, 0.034, -0.034, -0.044, 
    -0.061, -0.115, -0.132, -0.193, -0.029, 0.078, 0.093, 0.1, 
    0.049, -0.037, 0.029, -0.027, 0.002, 0.081, -0.024, -0.083, 
    -0.046, -0.002, -0.037, -0.149, -0.02, 0.01, -0.049, -0.105, 
    -0.051, 0.078, 0.071, 0.007, -0.081, 0.054, 0.164, 0.042, 
    0.073, -0.02, -0.032, 0.015, 0.002, -0.081, 0.042, 0.024, 
    -0.132, -0.063, 0.051, 0.02, 0, 0.02, -0.01, -0.005, 0.071, 
    0.01, -0.005, 0.088, 0.037, -0.015, -0.042, -0.024, -0.012, 
    0.071, -0.022, -0.1, -0.115, -0.029, -0.01, -0.002, -0.051, 
    -0.081, 0.027, 0.11, 0.022, -0.061, 0.061, 0.01, -0.012, 
    -0.02, -0.049, 0.029, 0.01, -0.029, -0.032, 0.01, 0.042, 
    -0.01, 0.042, 0.034, -0.088, -0.083, -0.09, 0.037, -0.002, 
    0.056, 0.024, 0.044, 0.154, 0.088, 0.027, 0.034, 0.105, 0.081, 
    -0.02, -0.083, -0.068, -0.017, 0.034, 0.042, -0.073, -0.112, 
    -0.015, 0.088, 0.071, -0.066, -0.085, 0.083, 0.156, 0.105, 
    -0.073, -0.071, 0.09, 0.078, -0.051, -0.142, -0.076, 0.005, 
    -0.01, -0.093, -0.076, -0.049, 0.056, 0.01, -0.046, 0.042, 
    0.132, 0.049, -0.029, 0.044, 0.107, 0.122, 0.068, -0.002, 
    -0.078, -0.012, -0.037, -0.105, -0.115, 0.017, 0.042, 0.015, 
    0.032, 0.054, 0.024, -0.002, 0.083, 0.061, -0.007, 0.056, 
    0.046, -0.01, 0.049, 0.022, -0.024, -0.024, -0.022, -0.127, 
    -0.176, -0.081, -0.068, 0, 0.015, -0.029, -0.017, -0.027, 
    -0.002, 0.054, 0.005, -0.022, -0.027, -0.007, 0.095, 0.029, 
    -0.085, -0.059, -0.063, 0.024, 0.029, -0.063, -0.078, -0.127, 
    -0.068, -0.022, -0.029, 0.046, 0.029, 0.01, 0.039, 0.132, 
    0.068, 0.044, 0.012, -0.029, -0.015, 0.093, -0.01, -0.134, 
    -0.115, -0.066, -0.032, 0.002, -0.039, -0.134, -0.051, 0.034, 
    0.061, 0.066, 0.061, 0.066, 0.01, 0.024, 0.093, 0.044, 0.037, 
    0.012, 0.002, -0.027, -0.11, -0.11, -0.073, -0.029, 0.032, 
    0.005, -0.066, -0.005, -0.02, -0.029, -0.068, -0.01, 0.071, 
    0.081, 0.034, -0.037, -0.032, -0.007, -0.012, -0.073, -0.088, 
    -0.071, -0.049, -0.083, -0.044, -0.112, 0.015, -0.1, -0.154, 
    0.029, 0.073, 0.073, 0, -0.01, 0.005, -0.012, -0.103, -0.12, 
    -0.093, -0.042, -0.024, -0.154, -0.073, -0.054, -0.1, -0.125, 
    -0.117, -0.066, 0.034, 0.085, 0.012, 0.039, 0.085, 0.005, 
    -0.022, -0.017, 0.02, 0.039, -0.046, -0.007, 0.012, -0.012, 
    -0.063, -0.054, 0.007, -0.056, -0.107, 0.037, 0.093, 0.046, 
    -0.061, -0.015, 0.039, 0.024, 0.068, 0.007, -0.027, 0.051, 
    -0.134, -0.11, 0.007, -0.093, -0.105, -0.056, -0.076, 0.012, 
    -0.071, -0.056, -0.117, -0.073, 0.002, 0.054, 0.078, 0.09, 
    0.11, 0.09, -0.022, -0.044, 0.042, 0.073, -0.005, 0.015, 
    0.017, -0.085, -0.1, -0.085, -0.059, -0.103, -0.071, -0.056, 
    -0.034, 0.032, 0.039, -0.007, -0.007, 0.068, 0.027, -0.054, 
    -0.078, -0.061, -0.059, -0.024, 0.037, -0.007, -0.083, -0.032, 
    -0.061, -0.081, -0.093, -0.117, 0.034, 0.044, 0.037, 0.054, 
    0.083, 0.002, -0.103, 0.083, 0.115, -0.139, -0.046, 0.142, 
    0.032, -0.139, -0.151, 0.081, 0.107, -0.061, -0.076, 0.005, 
    0.176, 0.078, -0.061, 0.01)), row.names = c(NA, -512L), class = c("data.table", 
"data.frame")))), row.names = c(NA, -2L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x000002289534be80>)
   ID gl                  V1
1:  1  1 <data.table[512x1]>
2:  1  2 <data.table[512x1]>

CodePudding user response:

Thank you for the clarifications! It's much clearer now how to optimize.

Assumptions

  • test_E2 and t_test_E2 are each a list of lists

  • test_E2 gathers results by ID, since the data for a particular ID may be dispersed throughout the data.table

  • t_test_E2 transposes each element of test_E2

Optimization

  1. Instead of having two sets of nested loops, combine the processing into a single nested loop. In other words, do the transposition of each element as it is generated in test_E2 by changing

test_E2[[i]][[j]] <- sapply(lst_512_32_E2[ID==i]$V1, '[[', j)

to

test_E2[[i]][[j]] <- t( sapply(lst_512_32_E2[ID==i]$V1, '[[', j) )

  2. sapply is a thin wrapper around lapply, so there is little left to gain at the R level. The largest remaining performance gain would likely come from writing a C function that performs test_E2[[i]][[j]] <- t( sapply(lst_512_32_E2[ID==i]$V1, '[[', j) )

  3. Another way to speed things up would be to use parallel processing, i.e., spreading the work across multiple processors.
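
To make step 1 concrete, here is a minimal sketch of the single combined loop run on made-up data. The table toy and its values are assumptions that only mimic the shape of lst_512_32_E2 (an ID column plus a list column V1 of one-column data.tables); the names ids, blocks and t_res are hypothetical.

```r
library(data.table)

# Toy stand-in for lst_512_32_E2 (assumed shape, invented values)
toy <- data.table(
  ID = c(1L, 1L, 2L),
  V1 = list(
    data.table(V1 = c(1, 2, 3)),
    data.table(V1 = c(4, 5, 6)),
    data.table(V1 = c(7, 8, 9))
  )
)

ids <- unique(toy$ID)
t_res <- vector("list", length(ids))
for (k in seq_along(ids)) {
  # All nested data.tables that belong to this ID
  blocks <- toy[ID == ids[k]]$V1
  n_cols <- length(blocks[[1]])
  t_res[[k]] <- vector("list", n_cols)
  for (j in seq_len(n_cols)) {
    # Gather column j from every block and transpose in the same step
    t_res[[k]][[j]] <- t(sapply(blocks, `[[`, j))
  }
}
```

Each t_res[[k]][[j]] is then a matrix with one row per nested data.table, so the second pair of loops is no longer needed.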

I hope this helps!

------------------

Data.Table Syntax

The article at https://www.infoworld.com/article/3575086/the-ultimate-r-datatable-cheat-sheet.html shows data.table code and the equivalent tidyverse code (where applicable) for a variety of data wrangling and summarization tasks.

Of particular interest for your situation:

Alter data.table in place without making a copy

Use any function that starts with set, such as setkey(mydt, mycol), or use the := operator within brackets.
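
As a small illustration of modifying by reference, the sketch below uses an invented table; mydt, mycol, x and y are placeholder names, not columns from the question.

```r
library(data.table)

# Invented example table
mydt <- data.table(mycol = c(3L, 1L, 2L), x = c(10, 20, 30))

setkey(mydt, mycol)   # sorts mydt by mycol in place and marks it as the key
mydt[, y := x * 2]    # adds column y in place, without assigning back to mydt
```

Neither call needs mydt <- ... on the left-hand side, which is what distinguishes them from the copy-making idioms.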

Count number of rows by groups

mydt2 <- mydt[, .N, by = groupcol] # for one grouping column; for several, use:

mydt2 <- mydt[, .N, by = .(groupcol1, groupcol2)]
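
A quick sketch of both counting forms on invented data (the table contents and the result names one_group and two_groups are assumptions):

```r
library(data.table)

# Invented example table
mydt <- data.table(
  groupcol1 = c("a", "a", "b", "b", "b"),
  groupcol2 = c("x", "y", "y", "y", "x")
)

one_group  <- mydt[, .N, by = groupcol1]                 # rows per groupcol1 value
two_groups <- mydt[, .N, by = .(groupcol1, groupcol2)]   # rows per combination
```

Groups appear in order of first occurrence, and the count lands in a column named N.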

Summarize multiple columns and return results in multiple columns

mydt2 <- mydt[, lapply(.SD, myfun), .SDcols = c("colA", "colB")]

Summarize multiple columns by group and return results in multiple columns

mydt2 <- mydt[, lapply(.SD, myfun), .SDcols = c("colA", "colB"), by = groupcol]
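
The two lapply(.SD, ...) patterns can be tried on a small invented table. Here myfun is simply set to mean as a stand-in, and the column values are made up.

```r
library(data.table)

# Invented example table
mydt <- data.table(
  groupcol = c("a", "a", "b"),
  colA = c(1, 3, 5),
  colB = c(10, 20, 30)
)
myfun <- mean  # stand-in for whatever summary function is needed

# One row summarizing the whole table
overall <- mydt[, lapply(.SD, myfun), .SDcols = c("colA", "colB")]

# One row per group
by_group <- mydt[, lapply(.SD, myfun), .SDcols = c("colA", "colB"), by = groupcol]
```

.SD holds only the columns named in .SDcols, so the grouping column is not passed to myfun.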

CodePudding user response:

You can try split() together with lapply() to generate the list test_E2, and then apply t() to each entry of test_E2 in turn

test_E2 <- with(
  lst_512_32_E2,
  lapply(
    split(V1, ID),
    function(x) unname(as.matrix(do.call(cbind, x)))
  )
)

t_test_E2 <- lapply(test_E2, t)
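
To see what this produces, here is the same code run on a small made-up table. The table toy is an assumption that only mimics the shape of lst_512_32_E2 (an ID column plus a list column V1 of one-column data.tables).

```r
library(data.table)

# Toy stand-in for lst_512_32_E2 (assumed shape, invented values)
toy <- data.table(
  ID = c(1L, 1L, 2L),
  V1 = list(
    data.table(V1 = c(1, 2, 3)),
    data.table(V1 = c(4, 5, 6)),
    data.table(V1 = c(7, 8, 9))
  )
)

# Split the nested tables by ID, then bind each group's columns side by side
test_E2 <- with(
  toy,
  lapply(
    split(V1, ID),
    function(x) unname(as.matrix(do.call(cbind, x)))
  )
)

# Transpose every per-ID matrix
t_test_E2 <- lapply(test_E2, t)
```

Note that this returns one matrix per ID, named by the ID value, rather than the list of lists built by the original loops.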