I have the following simplified dataframe.
test <- data.frame(
ice = c(1, 0.8, 0.5, 0.4),
eonia = c(0.5, 0, 0, -0.4),
euribor = c(1, -0.8, 1, -0.2),
cp = c(-0.7, -0.6, -0.4, -0.5)
)
row.names(test) <- colnames(test)
I would like to apply a condition to every column and keep only the values that satisfy it. For a single column, this works:
test[(test$ice>= 0.8 & test$ice< 1) | (test$ice<= -0.8 & test$ice> -1), , drop=FALSE]
However, my real dataframe contains many variables and I don't want to apply this code "manually" to every column. Note that I might need to add each column to a list or a new dataframe after filtering on this condition.
Is there an efficient way to loop over every column and maybe save each filtered column as a new dataframe or add it to a list?
The first dataframe (or part of the list) should look like this:
ice
ice 1
eonia 0.8
Many thanks in advance
CodePudding user response:
We can define a custom function and loop through the columns. Here, I am using dplyr::between, which is equivalent to x >= left & x <= right (both bounds inclusive), but it can easily be modified to the code that you need.
library(dplyr)
library(rlang)

# Keep rows whose value in colName lies in [low, high] or in [-high, -low].
# between() needs its lower bound first, so the negative band is between(x, -high, -low).
custom_filter <- function(df, colName, low, high) {
  df %>%
    filter(between(!!sym(colName), low, high) | between(!!sym(colName), -high, -low))
}
lapply(names(test), function(colN) custom_filter(test, colN, 0.8, 1))
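Because between() includes both end points, a filter built on it treats the upper bound differently from the condition in the question, which excludes 1. A minimal base R sketch of that difference on the ice column (in_closed and in_half_open are hypothetical helper names used only for illustration, not dplyr functions):

```r
ice <- c(1, 0.8, 0.5, 0.4)

# Closed interval, the same comparison between() performs
in_closed <- function(x, low, high) x >= low & x <= high

# Half-open interval, matching the question's `>= 0.8 & < 1`
in_half_open <- function(x, low, high) x >= low & x < high

ice[in_closed(ice, 0.8, 1)]     # keeps both 1 and 0.8
ice[in_half_open(ice, 0.8, 1)]  # keeps only 0.8
```

If the strict upper bound matters, the between() calls can be swapped for explicit comparisons like the second helper.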
CodePudding user response:
The conditions within [] are already applied to every column when you compare the whole data frame rather than a single column. To keep the matrix layout when values are dropped, you can explicitly replace them with e.g. NA.
Here's an example (EDIT with help from @thelatemail):
test[ !(( test >= 0.8 & test < 1 )|( test <= -0.8 & test > -1)) ] <- NA
ice eonia euribor cp
ice NA NA NA NA
eonia 0.8 NA -0.8 NA
euribor NA NA NA NA
cp NA NA NA NA
Keep in mind that this is a so-called in-place modification: it alters your dataset (here the data frame test) directly.
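If the original data should stay untouched, the same masking can be done on a copy, and the NA-filled result can then be split into one data frame per column. This is only a sketch under that assumption; masked, keep and per_column are names introduced here for illustration:

```r
test <- data.frame(
  ice = c(1, 0.8, 0.5, 0.4),
  eonia = c(0.5, 0, 0, -0.4),
  euribor = c(1, -0.8, 1, -0.2),
  cp = c(-0.7, -0.6, -0.4, -0.5)
)
row.names(test) <- colnames(test)

# Logical matrix: TRUE where a value satisfies the condition
keep <- (test >= 0.8 & test < 1) | (test <= -0.8 & test > -1)

# Work on a copy so that `test` itself is not modified
masked <- test
masked[!keep] <- NA

# One single-column data frame per variable, with the NA rows dropped
per_column <- lapply(names(masked), function(n) {
  masked[!is.na(masked[[n]]), n, drop = FALSE]
})
names(per_column) <- names(masked)
```

Columns without any matching value end up as zero-row data frames in per_column.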
CodePudding user response:
Make a function with your selection logic, then loop over each column in your data and subset it using the function:
f <- function(x) (x >= 0.8 & x < 1) | (x <= -0.8 & x > -1)
lapply(names(test), \(n) test[ f(test[[n]]), n, drop=FALSE] )
If the function needs to accept arguments for the low and high points on either side of 0, it can be extended accordingly:
f <- function(x, low, high) abs(x) >= low & abs(x) < high
lapply(names(test), \(n) test[ f(test[[n]], 0.8, 1.0), n, drop=FALSE] )
#[[1]]
# ice
#eonia 0.8
#
#[[2]]
#[1] eonia
#<0 rows> (or 0-length row.names)
#
#[[3]]
# euribor
#eonia -0.8
#
#[[4]]
#[1] cp
#<0 rows> (or 0-length row.names)
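Since the question mentions collecting the results in a list, it may help to name the list elements after their columns and, optionally, drop the empty ones. A minimal sketch of that idea; in_band, res and non_empty are hypothetical names, and function(n) is used instead of \(n) so it also runs on R versions before 4.1:

```r
test <- data.frame(
  ice = c(1, 0.8, 0.5, 0.4),
  eonia = c(0.5, 0, 0, -0.4),
  euribor = c(1, -0.8, 1, -0.2),
  cp = c(-0.7, -0.6, -0.4, -0.5)
)
row.names(test) <- colnames(test)

# Same selection logic as above: low <= |x| < high
in_band <- function(x, low, high) abs(x) >= low & abs(x) < high

# One filtered single-column data frame per variable
res <- lapply(names(test), function(n) {
  test[in_band(test[[n]], 0.8, 1), n, drop = FALSE]
})
names(res) <- names(test)  # allows access such as res$ice

# Optionally keep only the columns that had at least one match
non_empty <- Filter(function(d) nrow(d) > 0, res)
names(non_empty)
```

With the example data, only ice and euribor remain in non_empty.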