Conditionally remove value for each column but keep each column as a new dataframe using a loop

Time:11-11

I have the following simplified dataframe.

test <- data.frame(
  ice = c(1, 0.8, 0.5, 0.4),
  eonia = c(0.5, 0, 0, -0.4),
  euribor = c(1, -0.8, 1, -0.2),
  cp = c(-0.7, -0.6, -0.4, -0.5)
)

row.names(test) <- colnames(test)

I would like to apply a condition for every column, which keeps only those values satisfying the condition:

test[(test$ice>= 0.8 & test$ice< 1) | (test$ice<= -0.8 & test$ice> -1), , drop=FALSE]

However, my real dataframe consists of many variables and I don't want to apply this code "manually" to every column. Note that I might need to add each column to a list or a new dataframe after filtering for this condition.

Is there an efficient way to loop over every column and save each filtered column as a new dataframe, or add it to a list?

The first dataframe (or part of the list) should look like this:

      ice
ice   1
eonia 0.8

Many thanks in advance

CodePudding user response:

We can define a custom function and loop through the columns. Here, I am using dplyr::between, which is equivalent to x >= left & x <= right (inclusive at both ends), but it can easily be modified to the condition that you need.

library(dplyr)
library(rlang)

# Keep rows where the column lies in [low, high] or in [-high, -low].
# The negative band is passed as (-high, -low) so that the lower bound
# comes first; between(x, -0.8, -1) would never match anything.
custom_filter <- function(df, colName, low, high) {
  df %>%
    filter(between(!!sym(colName), low, high) |
             between(!!sym(colName), -high, -low))
}

lapply(names(test), function(colN) custom_filter(test, colN, 0.8, 1))
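
The boundary handling matters for this data, because the ice column contains exactly 1. As a small base R sketch (the names closed and half_open are illustrative, not from the answer), this compares the closed interval that between() checks with the half-open interval written in the question:

```r
x <- c(1, 0.8, 0.5, 0.4)  # the ice column from the question

# Closed interval: the comparison dplyr::between() is documented to perform
closed <- x >= 0.8 & x <= 1

# Half-open interval: the condition as written in the question
half_open <- x >= 0.8 & x < 1

closed     # TRUE TRUE FALSE FALSE
half_open  # FALSE TRUE FALSE FALSE
```

So the value 1 is kept by the inclusive check but dropped by the question's own condition.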

CodePudding user response:

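Comparison operators applied to the whole data frame work element-wise, so the condition inside [] is evaluated for every column at once. To keep the rectangular layout, you can replace the values that fail the condition with NA instead of dropping them.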

Here's an example (EDIT with help from @thelatemail):

test[ !(( test >= 0.8 & test < 1 )|( test <= -0.8 & test > -1)) ] <- NA

        ice eonia euribor cp
ice      NA    NA      NA NA
eonia   0.8    NA    -0.8 NA
euribor  NA    NA      NA NA
cp       NA    NA      NA NA

Keep in mind that this is a so-called in-place modification: it alters your dataset (here the data frame test) directly.
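
If the original values are still needed afterwards, one option is to apply the same mask to a copy. This is only a sketch: the names keep and masked are illustrative and not part of the answer.

```r
test <- data.frame(
  ice = c(1, 0.8, 0.5, 0.4),
  eonia = c(0.5, 0, 0, -0.4),
  euribor = c(1, -0.8, 1, -0.2),
  cp = c(-0.7, -0.6, -0.4, -0.5)
)
row.names(test) <- colnames(test)

# Logical matrix: TRUE where a value satisfies the condition
keep <- (test >= 0.8 & test < 1) | (test <= -0.8 & test > -1)

# Work on a copy so that test itself stays untouched
masked <- test
masked[!keep] <- NA

masked
```

Here test keeps all of its values, while masked holds NA everywhere except the two matching cells.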

CodePudding user response:

Make a function with your selection logic, then loop over each column in your data and subset it using the function:

f <- function(x) (x >= 0.8 & x < 1) | (x <= -0.8 & x > -1)
lapply(names(test), \(n) test[ f(test[[n]]), n, drop=FALSE] )

If the function needs to accept arguments for the low and high points either side of 0, this can be edited in too:

f <- function(x, low, high) abs(x) >= low & abs(x) < high
lapply(names(test), \(n) test[ f(test[[n]], 0.8, 1.0), n, drop=FALSE] )

#[[1]]
#      ice
#eonia 0.8
#
#[[2]]
#[1] eonia
#<0 rows> (or 0-length row.names)
#
#[[3]]
#      euribor
#eonia    -0.8
#
#[[4]]
#[1] cp
#<0 rows> (or 0-length row.names)
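
The list above is unnamed and includes empty data frames for columns without a match. As a possible follow-up (a sketch only; res and non_empty are illustrative names), the elements can be named after their columns and the empty results dropped with base R's setNames() and Filter():

```r
test <- data.frame(
  ice = c(1, 0.8, 0.5, 0.4),
  eonia = c(0.5, 0, 0, -0.4),
  euribor = c(1, -0.8, 1, -0.2),
  cp = c(-0.7, -0.6, -0.4, -0.5)
)
row.names(test) <- colnames(test)

f <- function(x, low, high) abs(x) >= low & abs(x) < high

# Iterating over a named vector makes lapply() return a named list
cols <- setNames(names(test), names(test))
res <- lapply(cols, function(n) test[f(test[[n]], 0.8, 1.0), n, drop = FALSE])

# Keep only the columns that produced at least one row
non_empty <- Filter(function(d) nrow(d) > 0, res)

names(non_empty)
```

Each remaining element can then be accessed by column name, for example non_empty$ice.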