for loop to unlist(), select vector element & convert class - loops, list, vector, selection-CodePudding

I'm having problems in generalising a method of dealing with some 'problem data' - vectorised elements of a list. (I'm not entirely sure if I'm describing this accurately so please forgive my ignorance).

Objective:

I'm trying to compile a function that deals with the following problem automatically as the intention is to run / utilise this methodology repeatedly.

Problem:

I've pulled some sensor data from a db by a unit name (equipment the sensor is from), usually the sensor data comes as a vector (as per the desired outcome - below) but for certain units (where the data collectors are configured differently) it returns a list of vectors.

The number of elements within the vector relate to the number of units on site (i.e. if there are two units on site, /01 and /02, then the vector will contain 2 elements....and so on)

I want only the sensor data that pertains to that unit to be kept in the relevant column.

Not all units are configured in the same fashion, so I was only intending on running a a function when the class of the column has been confirmed that it is a list.

The number of sensors and their tags will be different each time (input into the main function).

Intended methodology:

Perform following checks and corrective measures over each column
Check the column class - check to see if class is a 'list'
If FALSE, do nothing.
If TRUE, perform following over each row with 'unlist_func'(sub function):
Check the end of the text string on each row of 'unit' column and extract unit_no
Unlist the current element and select the number in the vector that corresponds with the unit_no in the unit column
Ammend the column class to

Reproducible problem:

The following tibble is an example of what I'm working with:

unit <- c('a2b7/01', 'a2b7/02', 'a2b7/03', 'a2b7/01', 'a2b7/02', 'a2b7/03', 'a2b7/01', 'a2b7/02', 'a2b7/03')
sen1 <- list(c(1,2,3), c(4,5,6), c(7,8,9), c(10,11,12), c(13,14,15), c(16,17,18), c(19,20,21), c(22,23,24), c(25,26,27))
sen2 <- list(1, 2, 3, 4, 5, 6, 7, 8, 9)

sensor_data <- tibble(unit, sen1, sen2)

Desired Outcome:

unit <- c('a2b7/01', 'a2b7/02', 'a2b7/03', 'a2b7/01', 'a2b7/02', 'a2b7/03', 'a2b7/01', 'a2b7/02', 'a2b7/03')
sen1 <- c(1,5,9,10,14,19,23,27)
sen2 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

sensor_data_new <- tibble(unit, sen1, sen2)

Attempt Thus far and Request:

So as not to turn up empty handed I've posted up a hasty version of my intended methodology but my apologies, it's a work in progress and I appreciate there are errors within, but I was hoping to appeal to those out there to help if possible or suggest a different approach.

All advice would be greatly appreciated.

function(unit, sensors, date_from, date_to){
  
  #  ------  QUERYING TASK  ------------------------
  
  sensor_data       <- 'pulls sensor info from unit names'
  
  #  ------  SUB FUNCTION - Unlist function  -------
  
  unlist_func <-  function(df = sensor_data, j){
  
                  sensor_data_distinct <- df %>% 
                                          mutate(unit_no = str_extract(unit, '\\d$')) %>%
                                          select(unit_no) %>% 
                                          distinct()

                  for (i in 1:nrow(df))
                  {
                    if(length(df[, j[[i]] ]) => 1 & str_ends(df$unit, sensor_data_distinct$unit_no)){
                        unlist(df[i, j])
                        df[sensor_data_distinct$unit_no]
                    } else(sensor_data[, j])
                  }

  #  -------  STAGE 1 CLEANSING FUNCTION  -------------
  # unit will always be in column 1 and datetime always in column 2
  stg1_cleanse    <- for(j in 3:ncol(sensor_data)) {       # for-loop over columns
                        if (is.list(sensor_data[, j]) == TRUE){
                         lapply(unlist_func.....)
                        }
                      }
                  }
  
  #  -------  STAGE 1 CLEANSING FUNCTION  -------------
                        
  stg2_cleanse    <- 'further cleansing routine'
  
}

CodePudding user response：

I think you can greatly simplify your function. This appears to produce the result you want. The custom function extract_func implements the rule in which each row is checked for length. If it is a single-element value, the single element is returned; if it is a vector, the element at the requested index is returned instead. We can then use dplyr's rowwise and across functions to apply this function to columns "sen1" and "sen2", passing "unit_num" as the index argument (when needed).

library(tidyverse)

unit <- c('a2b7/01', 'a2b7/02', 'a2b7/03', 'a2b7/01', 'a2b7/02', 'a2b7/03', 'a2b7/01', 'a2b7/02', 'a2b7/03')
sen1 <- list(c(1,2,3), c(4,5,6), c(7,8,9), c(10,11,12), c(13,14,15), c(16,17,18), c(19,20,21), c(22,23,24), c(25,26,27))
sen2 <- list(1, 2, 3, 4, 5, 6, 7, 8, 9)

sensor_data <- tibble(unit, sen1, sen2)

extract_func <- function(data, idx) {

  # check for NULL data and convert to NA if it is present
  if (is.null(data)) {
      data <- NA
  }
  
  if (length(data) == 1) {
    return(data[1])
  } else {
    return(data[idx])
  }
  
}

sensor_data_clean <- sensor_data %>% 
  rowwise() %>% 
  mutate(
    unit_num = as.numeric(str_extract(unit, '\\d$')),
    across(c(sen1, sen2), ~extract_func(., unit_num), .names = 'extract_{.col}')
  )

# A tibble: 9 × 6
# Rowwise: 
  unit    sen1      sen2      unit_num extract_sen1 extract_sen2
  <chr>   <list>    <list>       <dbl>        <dbl>        <dbl>
1 a2b7/01 <dbl [3]> <dbl [1]>        1            1            1
2 a2b7/02 <dbl [3]> <dbl [1]>        2            5            2
3 a2b7/03 <dbl [3]> <dbl [1]>        3            9            3
4 a2b7/01 <dbl [3]> <dbl [1]>        1           10            4
5 a2b7/02 <dbl [3]> <dbl [1]>        2           14            5
6 a2b7/03 <dbl [3]> <dbl [1]>        3           18            6
7 a2b7/01 <dbl [3]> <dbl [1]>        1           19            7
8 a2b7/02 <dbl [3]> <dbl [1]>        2           23            8
9 a2b7/03 <dbl [3]> <dbl [1]>        3           27            9

Since you're just checking the length of the vector and then extracting a single element, you could do the following inline within mutate (though a predefined custom function like the above gives you a little more flexibility for the future).

sensor_data_clean <- sensor_data %>% 
  rowwise() %>% 
  mutate(
    unit_num = as.numeric(str_extract(unit, '\\d$')),
    across(c(sen1, sen2), ~(.[min(length(.), unit_num)]), .names = 'extract_{.col}')
  )