Dividing a data frame row-wise by a vector, with a condition, in R

Time:07-01

I have a data frame and a vector, and I want to divide each row of the data frame by that vector, so that each column is divided by its own factor.

col1 <- c(500, 20000, 50000)
col2 <- c(20000, 500, 50000)
col3 <- c(20000, 50000, 500)
dividing_factor <- c(1.5, 2, 0.5)
df <- data.frame(col1,col2, col3)

The result I am hoping for looks like df_div below. Essentially, only values above 500 should be divided by the matching element of dividing_factor.

col1_div <- c(500, 20000/1.5, 50000/1.5)
col2_div <- c(20000/2, 500, 50000/2)
col3_div <- c(20000/0.5, 50000/0.5, 500)
df_div <- data.frame(col1_div,col2_div,col3_div)

I've been using apply as shown below. (In my real data I select only the columns whose names contain "Col", and I combine the output with the original data frame.) However, I can't figure out how to add the condition (divide only values above 500). I also tried mutate with ifelse, but dividing by a vector rather than a single number is throwing a wrench in that approach.

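library(dplyr)

df_div <- df %>%
  select(contains("Col")) %>%
  apply(., 1, function(x) {
      x / dividing_factor   # divides every value; no "> 500" condition yet
  }) %>%
  t() %>%                   # apply() returns each row as a column, so transpose back
  as_tibble()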

I'd be sincerely grateful for any advice.

CodePudding user response:

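No need to use apply, we can use vectorized/matrix operations:

# Test t(df) so that the recycled dividing_factor lines up with the columns of df
# after transposing back; testing df directly is only right for symmetric data.
df / t(ifelse(t(df) > 500, dividing_factor, 1))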
#       col1  col2  col3
# 1   500.00 10000 4e+04
# 2 13333.33   500 1e+05
# 3 33333.33 25000 5e+02
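
The same vectorized idea can also be written without transposing. The sketch below builds the divisor matrix by indexing dividing_factor with col(), so every cell picks up the factor for its own column. Note that df2 is a made-up, non-square data frame (not from the question), used only to check that columns and factors line up.

```r
# Hypothetical data for illustration: 2 rows, 3 columns
df2 <- data.frame(col1 = c(500, 20000),
                  col2 = c(20000, 100),
                  col3 = c(600, 500))
dividing_factor <- c(1.5, 2, 0.5)

m <- as.matrix(df2)
# col(m)[i, j] is j, so dividing_factor[col(m)] gives each cell its column's factor
divisor <- ifelse(m > 500, dividing_factor[col(m)], 1)
res <- df2 / divisor
```

Values at or below 500 get a divisor of 1 and so pass through unchanged.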

CodePudding user response:

Or we could name the vector and use mutate/across:

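library(dplyr)

dividing_factor <- c("col1" = 1.5, "col2" = 2, "col3" = 0.5)

df |>
  mutate(across(starts_with("col"),
                # cur_column() is the name of the column being processed,
                # so it looks up the matching factor by name
                ~ ifelse(. > 500, . / dividing_factor[cur_column()], .)))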
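
For comparison, the same per-column logic can be sketched in base R with Map(), which pairs each column with its factor by position, so the vector does not need names. This is an illustrative alternative rather than part of the answer above, and df_div_base is a name chosen here.

```r
col1 <- c(500, 20000, 50000)
col2 <- c(20000, 500, 50000)
col3 <- c(20000, 50000, 500)
df <- data.frame(col1, col2, col3)
dividing_factor <- c(1.5, 2, 0.5)

df_div_base <- df
# Map() walks the columns of df and the elements of dividing_factor in parallel;
# assigning into df_div_base[] keeps the data frame structure and column names
df_div_base[] <- Map(function(x, f) ifelse(x > 500, x / f, x),
                     df, dividing_factor)
```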

CodePudding user response:

You can get the positions of all values > 500, and use them to extract those values and replace them with the result of your calculation.

idx <- which(df > 500, arr.ind = TRUE)
df[idx] <- (df / as.list(dividing_factor))[idx]

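The 'trick' here is to divide df by a list: each list element is then applied to the corresponding column, instead of the vector being recycled down the rows.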

Result

df
#      col1  col2  col3
#1   500.00 10000 4e+04
#2 13333.33   500 1e+05
#3 33333.33 25000 5e+02
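
To see why the list matters, the sketch below compares dividing a data frame by a list with dividing it by the bare vector. The names d, by_list and by_vector are introduced here for illustration only.

```r
d <- data.frame(col1 = c(500, 20000, 50000),
                col2 = c(20000, 500, 50000),
                col3 = c(20000, 50000, 500))
dividing_factor <- c(1.5, 2, 0.5)

# List: element j is applied to column j
by_list <- d / as.list(dividing_factor)

# Bare vector: recycled down each column, so row i is divided by element i
by_vector <- d / dividing_factor
```

Here by_list$col2 is the whole of col2 divided by 2, whereas by_vector$col2 divides its three values by 1.5, 2 and 0.5 in turn.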