How to handle a list of tibbles in R and their content?

Time:04-04

Consider the following R code, given purely as an example:

library(tibble)
X <- tibble(v0 = seq(as.Date("2011-07-01"), by = 1, len = 6),
            v1 = c(1, -1, 2, 1, 2, -1), v2 = replicate(6, 0),
            v3 = c(NA, NA, NA, NA, NA, NA), v4 = c(NA, NA, NA, NA, NA, NA), v5 = replicate(6, 0))
Y <- tibble(v0 = seq(as.Date("2011-07-01"), by = 1, len = 6),
            v1 = c(1, 1, 2, 1, 2, 1), v2 = c(1, NA, 2, 1, NA, 1),
            v3 = c(NA, NA, 3, NA, 5, NA), v4 = c(NA, NA, NA, NA, NA, NA), v5 = replicate(6, 0))
dataset <- list(X, Y)
names(dataset) <- c("X", "Y")
dataset

$X
# A tibble: 6 × 6
  v0            v1    v2 v3    v4       v5
  <date>     <dbl> <dbl> <lgl> <lgl> <dbl>
1 2011-07-01     1     0 NA    NA        0
2 2011-07-02    -1     0 NA    NA        0
3 2011-07-03     2     0 NA    NA        0
4 2011-07-04     1     0 NA    NA        0
5 2011-07-05     2     0 NA    NA        0
6 2011-07-06    -1     0 NA    NA        0

$Y
# A tibble: 6 × 6
  v0            v1    v2    v3 v4       v5
  <date>     <dbl> <dbl> <dbl> <lgl> <dbl>
1 2011-07-01     1     1    NA NA        0
2 2011-07-02     1    NA    NA NA        0
3 2011-07-03     2     2     3 NA        0
4 2011-07-04     1     1    NA NA        0
5 2011-07-05     2    NA     5 NA        0
6 2011-07-06     1     1    NA NA        0
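Step 1 below is needed because a column built only from NA values, such as c(NA, NA, NA), is logical in R, which is why v3 and v4 print as <lgl> in X. A minimal sketch for listing the class of every column in each tibble, assuming a cut-down copy of the data stored under the illustrative name demo_list (so the original X, Y and dataset are not overwritten):

```r
library(tibble)

# Cut-down copy of the example: an all-NA column defaults to logical
demo_list <- list(
  X = tibble(v1 = c(1, -1, 2), v3 = c(NA, NA, NA)),
  Y = tibble(v1 = c(1, 1, 2), v3 = c(NA, 3, NA))
)

# First class of every column, one named character vector per tibble
col_classes <- lapply(demo_list, function(d) {
  vapply(d, function(col) class(col)[1], character(1))
})
col_classes
```

Here v3 is "logical" in X but "numeric" in Y, so the same column name can carry different types across the list.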

I would like to:

  1. Convert all columns except the first one to numeric, in both X and Y;
  2. Set all negative values to zero (in both X and Y, even though Y has none).

For the first tibble (X), this should make elements [2, 2] and [6, 2] equal to zero.

  3. Exclude from the dataset the columns that are entirely NA or entirely zero, in three different cases:

a. dataset_a: only the variables that are not an all-NA or all-zero column in any of the tibbles (in the example, dataset_a contains the dates and v1);

b. dataset_b: the variables that are not entirely NA or zero across the dataset as a whole (in the example, dataset_b contains the dates and v1, v2, v3);
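Cases a and b differ in whether a column must be non-empty in every tibble (a) or in at least one tibble (b). A minimal sketch of that reading, assuming "empty" means entirely NA or entirely zero after steps 1 and 2, and that all tibbles share the same column names: collect the non-empty column names per tibble, then take their intersection for dataset_a and their union for dataset_b. The names is_empty_col, non_empty, cols_a and cols_b are hypothetical, chosen for illustration.

```r
library(dplyr)
library(tibble)

# The example data from the question, rebuilt so this block runs on its own
X <- tibble(v0 = seq(as.Date("2011-07-01"), by = 1, length.out = 6),
            v1 = c(1, -1, 2, 1, 2, -1), v2 = rep(0, 6),
            v3 = rep(NA, 6), v4 = rep(NA, 6), v5 = rep(0, 6))
Y <- tibble(v0 = seq(as.Date("2011-07-01"), by = 1, length.out = 6),
            v1 = c(1, 1, 2, 1, 2, 1), v2 = c(1, NA, 2, 1, NA, 1),
            v3 = c(NA, NA, 3, NA, 5, NA), v4 = rep(NA, 6), v5 = rep(0, 6))
dataset <- list(X = X, Y = Y)

# Steps 1 and 2: every column but the first to numeric, negatives to zero
cleaned <- lapply(dataset, function(d) {
  d %>% mutate(across(-1, ~ pmax(as.numeric(.x), 0)))
})

# TRUE for a column that is entirely NA or entirely zero (hypothetical helper)
is_empty_col <- function(x) all(is.na(x) | x == 0)

# Names of the non-empty columns, one character vector per tibble
non_empty <- lapply(cleaned, function(d) {
  names(d)[!vapply(d, is_empty_col, logical(1))]
})

cols_a <- Reduce(intersect, non_empty)  # case a: non-empty in every tibble
cols_b <- Reduce(union, non_empty)      # case b: non-empty in at least one tibble

dataset_a <- lapply(cleaned, function(d) select(d, all_of(cols_a)))
dataset_b <- lapply(cleaned, function(d) select(d, all_of(cols_b)))
```

Under this reading, dataset_a and dataset_b are still lists with one tibble per original element: every tibble in dataset_a keeps v0 and v1, and every tibble in dataset_b keeps v0 to v3.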

Answer:

library(tidyverse)

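new_dataset <- lapply(dataset, function(d) {

  d %>%
    mutate(
      across(-1, as.numeric),                 # condition 1: every column but the first to numeric
      across(where(is.numeric), ~pmax(., 0))  # condition 2: negative values to zero
    ) %>%
    select(where(~!all(is.na(.x) | .x == 0))) # condition 3: drop all-NA / all-zero columns

})

new_dataset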

$X
# A tibble: 6 × 2
  v0            v1
  <date>     <dbl>
1 2011-07-01     1
2 2011-07-02     0
3 2011-07-03     2
4 2011-07-04     1
5 2011-07-05     2
6 2011-07-06     0

$Y
# A tibble: 6 × 4
  v0            v1    v2    v3
  <date>     <dbl> <dbl> <dbl>
1 2011-07-01     1     1    NA
2 2011-07-02     1    NA    NA
3 2011-07-03     2     2     3
4 2011-07-04     1     1    NA
5 2011-07-05     2    NA     5
6 2011-07-06     1     1    NA
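Three base R details make these conditions behave as intended on this data. is.numeric() returns FALSE for Date vectors, so where(is.numeric) leaves v0 untouched. pmax() propagates NA by default, so missing values stay missing instead of becoming 0. In the condition-3 predicate, TRUE | NA is TRUE, so a column mixing NA and zeros counts as empty. A small check, assuming base R only; drop_col is a hypothetical name for the predicate used in select():

```r
d <- as.Date("2011-07-01")
is.numeric(d)           # FALSE: where(is.numeric) skips Date columns
is.numeric(unclass(d))  # TRUE: a Date is stored as a double underneath

# Negatives are clipped to 0, NA stays NA (na.rm = FALSE by default)
pmax(c(-1, NA, 2), 0)

# Condition 3: a column is dropped when every value is NA or zero
drop_col <- function(x) all(is.na(x) | x == 0)
drop_col(c(NA, 0, 0))   # TRUE: mixed NA and zeros count as empty
drop_col(c(NA, 3, NA))  # FALSE: at least one real value
```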

Answer:

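Here is another dplyr approach, which also uses lapply() to go through the list elements. Negative values are set to 0 and the remaining columns are converted to integer (as.integer() truncates any decimals, which does not matter for this example data); columns without at least one non-zero, non-missing value are then dropped.

The two resulting tibbles are renamed dataset_a (from X) and dataset_b (from Y) and assigned to the global environment with list2env().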

library(dplyr)

setNames(lapply(dataset, function(x)
  mutate(x, across(-1, ~ifelse(.x < 0, 0L, as.integer(.x)))) %>%  # negatives to 0, the rest to integer
    select(where(~ any(.x != 0, na.rm = TRUE)))),                 # keep columns with a non-zero value
  c("dataset_a", "dataset_b")) %>%
  list2env(envir = .GlobalEnv)

dataset_a
# A tibble: 6 × 2
  v0            v1
  <date>     <int>
1 2011-07-01     1
2 2011-07-02     0
3 2011-07-03     2
4 2011-07-04     1
5 2011-07-05     2
6 2011-07-06     0

dataset_b
# A tibble: 6 × 4
  v0            v1    v2    v3
  <date>     <int> <int> <int>
1 2011-07-01     1     1    NA
2 2011-07-02     1    NA    NA
3 2011-07-03     2     2     3
4 2011-07-04     1     1    NA
5 2011-07-05     2    NA     5
6 2011-07-06     1     1    NA
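The final list2env() call assigns each element of a named list to a variable with the same name in the target environment. A minimal sketch of that mechanism, assuming a fresh environment from new.env() is used in place of .GlobalEnv so that no existing objects are overwritten; results and target are illustrative names, and the small data frames stand in for the real output:

```r
# A named list standing in for the output of setNames(lapply(...), ...)
results <- list(dataset_a = data.frame(v1 = c(1, 0)),
                dataset_b = data.frame(v1 = c(1, 1), v2 = c(1, NA)))

# Assign each element to a variable named after it, inside a new environment
target <- new.env()
invisible(list2env(results, envir = target))

ls(target)                        # "dataset_a" "dataset_b"
get("dataset_b", envir = target)
```

Passing envir = .GlobalEnv instead, as in the answer above, creates the same variables at the top level of the session.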