Make a logical comparison of all elements of a vector in a function

Time:10-21

I want to create a function that allows me to input a data frame with a varying number of columns, and to create two new columns:

  1. one based on a logical comparison of all others and
  2. one based on a logical comparison of all others and the first new column.

A minimal example would be a data set with two variables:

V1 <- c(1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0)
V2 <- c(0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0)
Data <- data.frame(V1, V2)

I want to create the two new columns with a function looking like this:

my.spec.df <- function(data, variables, new.var.name){
   new.df <- data

   # First new column
   new.df[[new.var.name]] <- 0
   new.df[[new.var.name]][new.df$V1 == Lag(new.df$V1, 1) & new.df$V2 == Lag(new.df$V2, 1)] <- 1 # I want my logical comparison to be applicable to all variables listed in [[variables]], not just V1 and V2 used here as minimal example

   # Second new column
   new.df$Conj.Var.[[new.var.name]] <- 0 # I want this second new column to take the name "Conj.Var." the name of the first new variable, which I tried to achieve with the [[]] but it did not work (same in the next row)
   new.df$Conj.Var.[[new.var.name]][new.df$V1 == 1 & new.df$V2 == 1 & new.df[[new.var.name]] == 1] <- 1 # Again, I want the logical comparison to be applicable to all variables listed [[variables]] and the first newly created column

   return(new.df)
}

spec.df <- my.spec.df(Data,
                      variables=c("V1", "V2"),
                      new.var.name="NV1")

The new data frame should look like:

print(spec.df)
   V1 V2 NV1 Conj.Var.NV1
1   1  0   0            0
2   0  1   0            0
3   1  1   0            0
4   1  1   1            1
5   0  0   0            0
6   0  1   0            0
7   1  0   0            0
8   1  0   1            0
9   0  0   0            0
10  0  1   0            0
11  0  1   1            0
12  1  1   0            0
13  1  0   0            0
14  0  1   0            0
15  0  0   0            0

As commented in the code, I struggle with three things:

  1. apply the logical comparisons for the first new column to all variables listed (not just the two as in my minimal example) because the number could go from one variable listed to multiple ones,
  2. format the name of the second new column based on the name introduced for the first and
  3. apply the logical comparison for the second new column also to all variables listed.

Anyone that could help? Many thanks in advance!

CodePudding user response:

Here is a solution.
It uses an auxiliary function, all_one_by_row, to do the main work, and a temporary logical matrix, tmp, to record where each column listed in variables equals its lagged value.

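all_one_by_row <- function(data, cols) {
  # Returns 1 for rows where every (selected) column equals 1 or TRUE, else 0
  if(missing(cols))
     as.integer(rowSums(data) == ncol(data))
  else
     as.integer(rowSums(data[cols]) == length(cols))  # compare with the number of selected columns
}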

my.spec.df <- function(data, variables, new.var.name){
  new.df <- data

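  # First new column
  # Lag() is not base R; it is assumed here to come from the Hmisc package (library(Hmisc))
  tmp <- sapply(new.df[variables], \(x) x == Lag(x, 1))
  tmp[is.na(tmp)] <- FALSE  # the first row has no previous value to compare with
  new.df[[new.var.name]] <- all_one_by_row(tmp)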
  
  # Second new column
  New.Col <- paste0("Conj.Var.", new.var.name)
  Cols <- c(variables, new.var.name)
  new.df[[New.Col]] <- all_one_by_row(new.df, Cols)

  new.df
}

spec.df <- my.spec.df(Data,
                      variables=c("V1", "V2"),
                      new.var.name="NV1")

spec.df
#   V1 V2 NV1 Conj.Var.NV1
#1   1  0   0            0
#2   0  1   0            0
#3   1  1   0            0
#4   1  1   1            1
#5   0  0   0            0
#6   0  1   0            0
#7   1  0   0            0
#8   1  0   1            0
#9   0  0   0            0
#10  0  1   0            0
#11  0  1   1            0
#12  1  1   0            0
#13  1  0   0            0
#14  0  1   0            0
#15  0  0   0            0
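
For comparison, the same idea can be sketched in base R only, without depending on a package for Lag(). This is a minimal sketch, not the answer's method: lag1 and spec_df_base are hypothetical names introduced here, and Reduce() with `&` is one possible way to combine the per-column comparisons for any number of listed variables.

```r
# Hypothetical helper: shift a vector down by one position (base R only).
lag1 <- function(x) c(NA, head(x, -1))

# Hypothetical function name; works for one or more columns in `variables`.
spec_df_base <- function(data, variables, new.var.name) {
  stopifnot(all(variables %in% names(data)))
  cols <- data[variables]

  # TRUE where every listed column equals its value in the previous row.
  same_as_prev <- Reduce(`&`, lapply(cols, function(x) x == lag1(x)))
  same_as_prev[is.na(same_as_prev)] <- FALSE  # first row has no predecessor

  # TRUE where every listed column equals 1.
  all_ones <- Reduce(`&`, lapply(cols, function(x) x == 1))

  data[[new.var.name]] <- as.integer(same_as_prev)
  data[[paste0("Conj.Var.", new.var.name)]] <- as.integer(all_ones & same_as_prev)
  data
}

# Usage with the sample data from the question.
V1 <- c(1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0)
V2 <- c(0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0)
Data <- data.frame(V1, V2)
spec.df.base <- spec_df_base(Data, variables = c("V1", "V2"), new.var.name = "NV1")
```

Because the comparisons are built with lapply() over data[variables], passing a single variable or more than two needs no change to the function body.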

CodePudding user response:

Does this work?

library(tidyverse)
library(rlang)
my.spec.df <- function(data, variables, new.var.name){
  # Note: only the first two entries of variables are used
  x <- sym(variables[1])
  y <- sym(variables[2])
  ind <- ncol(data) + 3 # index of the new column once lag_x and lag_y have been added
  data %>%
    mutate(lag_x = lag(!!x, 1),
           lag_y = lag(!!y, 1),
           # coalesce() turns the NA comparison in the first row into FALSE
           "{new.var.name}" := ifelse(coalesce(!!x == lag_x & !!y == lag_y, FALSE), 1, 0)) %>%
    mutate("Conj.var.{new.var.name}" := ifelse(!!x == 1 & !!y == 1 & .[[ind]] == 1, 1, 0)) %>%
    select(-lag_x, -lag_y)
}

For dplyr 1.0 and later, we can use glue-style syntax on the left-hand side of := to name new variables; see this post for other methods. Because we don't know the number of variables in advance, we need to refer to the new column dynamically. This Stack Overflow post lists various methods to do that.
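
As a rough illustration of both points (glue-style naming and referring to the new column by name rather than by position), here is a sketch that also accepts any number of variables. It is an assumption-laden variant, not the code above: spec_df_dplyr is a hypothetical name, the second column is called "Conj.Var." as in the question, and it relies on across(), all_of() and the .data pronoun from dplyr 1.0 or later.

```r
library(dplyr)

# Hypothetical function name; a sketch for any number of columns in `variables`.
spec_df_dplyr <- function(data, variables, new.var.name) {
  n_vars <- length(variables)
  data %>%
    mutate(
      # Glue-style name on the left of `:=`. The first row compares against NA,
      # which na.rm = TRUE counts as "no match".
      "{new.var.name}" := as.integer(
        rowSums(across(all_of(variables), ~ .x == dplyr::lag(.x)), na.rm = TRUE) == n_vars
      ),
      # .data[[...]] looks up the column created just above by its name.
      "Conj.Var.{new.var.name}" := as.integer(
        rowSums(across(all_of(variables), ~ .x == 1)) == n_vars &
          .data[[new.var.name]] == 1
      )
    )
}

# Usage with the sample data from the question.
Data <- data.frame(
  V1 = c(1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0),
  V2 = c(0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0)
)
spec.df.dplyr <- spec_df_dplyr(Data, variables = c("V1", "V2"), new.var.name = "NV1")
```

Looking the column up by name avoids having to compute its position, so no temporary lag columns or index arithmetic are needed.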

When tested on the sample data, my.spec.df(Data, variables = c("V1", "V2"), new.var.name = "NV1") returns

   V1 V2 NV1 Conj.var.NV1
1   1  0   0            0
2   0  1   0            0
3   1  1   0            0
4   1  1   1            1
5   0  0   0            0
6   0  1   0            0
7   1  0   0            0
8   1  0   1            0
9   0  0   0            0
10  0  1   0            0
11  0  1   1            0
12  1  1   0            0
13  1  0   0            0
14  0  1   0            0
15  0  0   0            0