Consider the following R code, used only as an example:
library(tibble)
X <- tibble(v0 = seq(as.Date("2011-07-01"), by = 1, length.out = 6),
            v1 = c(1, -1, 2, 1, 2, -1), v2 = rep(0, 6),
            v3 = rep(NA, 6), v4 = rep(NA, 6), v5 = rep(0, 6))
Y <- tibble(v0 = seq(as.Date("2011-07-01"), by = 1, length.out = 6),
            v1 = c(1, 1, 2, 1, 2, 1), v2 = c(1, NA, 2, 1, NA, 1),
            v3 = c(NA, NA, 3, NA, 5, NA), v4 = rep(NA, 6), v5 = rep(0, 6))
dataset <- list(X = X, Y = Y)
dataset
$X
# A tibble: 6 × 6
  v0            v1    v2 v3    v4       v5
  <date>     <dbl> <dbl> <lgl> <lgl> <dbl>
1 2011-07-01     1     0 NA    NA        0
2 2011-07-02    -1     0 NA    NA        0
3 2011-07-03     2     0 NA    NA        0
4 2011-07-04     1     0 NA    NA        0
5 2011-07-05     2     0 NA    NA        0
6 2011-07-06    -1     0 NA    NA        0

$Y
# A tibble: 6 × 6
  v0            v1    v2    v3 v4       v5
  <date>     <dbl> <dbl> <dbl> <lgl> <dbl>
1 2011-07-01     1     1    NA NA        0
2 2011-07-02     1    NA    NA NA        0
3 2011-07-03     2     2     3 NA        0
4 2011-07-04     1     1    NA NA        0
5 2011-07-05     2    NA     5 NA        0
6 2011-07-06     1     1    NA NA        0

I would like to:
- Convert every column except the first to numeric, in both X and Y;
- Set all negative values to zero (in both X and Y, even though Y has none). This should make element (2,2) and element (6,2) of the first tibble equal to zero;
- Drop from the dataset the columns that are entirely NA or entirely zero, in different cases:
a. dataset_a: the variables that have no completely missing or all-zero column (in the example, dataset_a contains the dates and v1);
b. dataset_b: the variables that are not all missing or all zero across the whole dataset (in the example, dataset_b contains the dates and v1, v2, v3).
CodePudding user response:
library(tidyverse)
new_dataset <- lapply(dataset, function(d) {
  d %>%
    mutate(
      across(-1, as.numeric),                  # condition 1
      across(where(is.numeric), ~pmax(., 0))   # condition 2
    ) %>%
    select(where(~!all(is.na(.x) | .x == 0)))  # condition 3
})
new_dataset
$X
# A tibble: 6 × 2
  v0            v1
  <date>     <dbl>
1 2011-07-01     1
2 2011-07-02     0
3 2011-07-03     2
4 2011-07-04     1
5 2011-07-05     2
6 2011-07-06     0

$Y
# A tibble: 6 × 4
  v0            v1    v2    v3
  <date>     <dbl> <dbl> <dbl>
1 2011-07-01     1     1    NA
2 2011-07-02     1    NA    NA
3 2011-07-03     2     2     3
4 2011-07-04     1     1    NA
5 2011-07-05     2    NA     5
6 2011-07-06     1     1    NA

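To see why the condition 3 predicate drops all-NA and all-zero columns but keeps partially filled ones, it can be tried on plain vectors. This is only a sketch in base R; the helper name is_empty_col is made up for illustration and is not part of dplyr.

```r
# Hypothetical helper: same test as the predicate used in condition 3.
# is.na(x) | x == 0 is TRUE wherever a value is missing or zero,
# because TRUE | NA evaluates to TRUE.
is_empty_col <- function(x) all(is.na(x) | x == 0)

all_na    <- c(NA, NA, NA)                 # like v3 and v4 in X
all_zero  <- c(0, 0, 0)                    # like v2 and v5
mixed     <- c(NA, 0, NA)                  # only NA and zero
some_data <- c(1, NA, 2)                   # like v2 in Y
dates     <- as.Date("2011-07-01") + 0:2   # like v0

res <- c(
  all_na    = is_empty_col(all_na),
  all_zero  = is_empty_col(all_zero),
  mixed     = is_empty_col(mixed),
  some_data = is_empty_col(some_data),
  dates     = is_empty_col(dates)
)
print(res)
```

Only the first three vectors count as empty, so a column with a single usable value survives, as does the date column.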
CodePudding user response:
Here is another dplyr approach, which also employs lapply to go through the list elements. Two data frames (dataset_a and dataset_b) are created in the global environment.
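library(dplyr)
# Clamp negatives to 0 and coerce to integer (note: as.integer() truncates
# non-integer values), then keep columns with at least one non-zero, non-NA value.
setNames(lapply(dataset, function(x)
  mutate(x, across(-1, ~ifelse(.x < 0, 0L, as.integer(.x)))) %>%
    select(where(~ any(. != 0, na.rm = TRUE)))), c("dataset_a", "dataset_b")) %>%
  list2env(envir = .GlobalEnv)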
dataset_a
# A tibble: 6 × 2
  v0            v1
  <date>     <int>
1 2011-07-01     1
2 2011-07-02     0
3 2011-07-03     2
4 2011-07-04     1
5 2011-07-05     2
6 2011-07-06     0

dataset_b
# A tibble: 6 × 4
  v0            v1    v2    v3
  <date>     <int> <int> <int>
1 2011-07-01     1     1    NA
2 2011-07-02     1    NA    NA
3 2011-07-03     2     2     3
4 2011-07-04     1     1    NA
5 2011-07-05     2    NA     5
6 2011-07-06     1     1    NA
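
Both approaches above filter each tibble on its own. If cases a and b in the question are instead read as one common set of columns applied to every tibble (a: columns that are empty in no tibble, b: columns that are not empty in every tibble), a base R sketch could look like the following. This reading is an assumption; the example is rebuilt as plain data frames so the snippet runs without extra packages, and is_empty_col, keep_a and keep_b are illustrative names.

```r
# Rebuild the example data (same values as in the question).
dates <- seq(as.Date("2011-07-01"), by = 1, length.out = 6)
X <- data.frame(v0 = dates, v1 = c(1, -1, 2, 1, 2, -1),
                v2 = 0, v3 = NA, v4 = NA, v5 = 0)
Y <- data.frame(v0 = dates, v1 = c(1, 1, 2, 1, 2, 1),
                v2 = c(1, NA, 2, 1, NA, 1),
                v3 = c(NA, NA, 3, NA, 5, NA), v4 = NA, v5 = 0)
dataset <- list(X = X, Y = Y)

# Conditions 1 and 2: every column but the first becomes numeric,
# and negative values are clamped to 0 (NA stays NA).
clean <- lapply(dataset, function(d) {
  d[-1] <- lapply(d[-1], function(x) pmax(as.numeric(x), 0))
  d
})

# TRUE when a column holds nothing but NA or zero.
is_empty_col <- function(x) all(is.na(x) | x == 0)

# Logical matrix: one row per list element, one column per variable.
empty <- t(sapply(clean, function(d) vapply(d, is_empty_col, logical(1))))

keep_a <- colnames(empty)[!apply(empty, 2, any)]  # empty in no element
keep_b <- colnames(empty)[!apply(empty, 2, all)]  # not empty in every element

dataset_a <- lapply(clean, function(d) d[keep_a])
dataset_b <- lapply(clean, function(d) d[keep_b])
```

With the example data, keep_a is v0 and v1 and keep_b is v0 to v3, which matches the column sets named in the question.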