How to assign value to dataframe column based on the name of the column and values between columns i

Time:10-25

In my example dataframe:

df <- data.frame(main=c(2,6,10), 
                 V1=c(1,3,5), 
                 V2=c(3,5,7),
                 V3=c(5,7,9)) 

I would like to create a fifth column that checks which column values the value in main falls between, and assigns the name of the last column whose value is lower than main. In this example the results would be:

Row 1: 2 (main) is higher than 1 (V1) but lower than 3 (V2), so column 5 would be "V1".
Row 2: 6 (main) is higher than 3 (V1) and 5 (V2) but lower than 7 (V3), so column 5 would be "V2".
Row 3: 10 (main) is higher than 5 (V1), 7 (V2) and 9 (V3), so column 5 would be "V3".

I would also prefer NOT to hard-code the column names, as the final dataframe will have many of them and they are subject to change.

Thanks!

CodePudding user response:

Here is a solution using pivot_longer:

library(dplyr)
library(tidyr)
df %>% 
  pivot_longer(-main) %>% 
  group_by(main) %>% 
  mutate(Col5 = case_when(main > value & main < lead(value) ~ name,
                       main > max(value) ~ last(name),
                       TRUE ~ NA_character_)) %>% 
  fill(Col5, .direction = "updown") %>% 
  pivot_wider(names_from = name, values_from = value)
   main Col5     V1    V2    V3
  <dbl> <chr> <dbl> <dbl> <dbl>
1     2 V1        1     3     5
2     6 V2        3     5     7
3    10 V3        5     7     9

CodePudding user response:

Loop through every row with purrr::pmap_chr, apply findInterval to the value in main against every other column, then get the column name of the matching interval:

df$interval = purrr::pmap_chr(df, function(main, ...){
  colnames(df)[findInterval(main, c(...)) + 1]}) # + 1 because V1 is the second column

Result:

  main V1 V2 V3 interval
1    2  1  3  5       V1
2    6  3  5  7       V2
3   10  5  7  9       V3

Further explanation:

purrr::pmap loops through every row of a dataframe, passing all elements of the row as arguments to a function. The first argument of my function is main, which is filled with the first element of the row (the one in the main column). The second argument of my function is ..., the "dynamic dots" (this is literal R syntax, not a placeholder). The dots pass along every other argument the function receives (in this case, every other element of the row, which together form the vec argument of findInterval).
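To make the two building blocks concrete, here is a small base R sketch using the values from the example dataframe. The helper show_dots is a hypothetical name introduced only for illustration; it is not part of the answer's code.

```r
# findInterval(x, vec) counts how many elements of the sorted vector vec are <= x.
findInterval(2, c(1, 3, 5))    # 1 -> first boundary column (V1)
findInterval(6, c(3, 5, 7))    # 2 -> second boundary column (V2)
findInterval(10, c(5, 7, 9))   # 3 -> third boundary column (V3)

# A function declared with `...` collects every argument after the first.
# show_dots is a hypothetical helper, used here only to show what c(...) holds.
show_dots <- function(main, ...) c(...)
show_dots(2, V1 = 1, V2 = 3, V3 = 5)   # named numeric vector: V1 = 1, V2 = 3, V3 = 5
```

Since main is column 1 of df, the position returned by findInterval has to be shifted by one before indexing colnames(df).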

This method needs the main column to be the first one, and the data frame must not contain any columns other than those used to build the intervals. Let me know if that isn't the case.
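If main is not the first column, one possible variant is to pick the boundary columns by name and index into that subset instead of into colnames(df). This is only a base R sketch, assuming that every column except main is a boundary and that the boundaries increase from left to right within each row. The names bound_cols and idx are made up for this example.

```r
df <- data.frame(main = c(2, 6, 10),
                 V1 = c(1, 3, 5),
                 V2 = c(3, 5, 7),
                 V3 = c(5, 7, 9))

# Assumption: every column except "main" is an interval boundary.
bound_cols <- setdiff(names(df), "main")

# For each row, count how many boundaries are <= main.
idx <- vapply(seq_len(nrow(df)), function(i) {
  findInterval(df$main[i], unlist(df[i, bound_cols]))
}, integer(1))

# idx is 0 when main is below every boundary; report NA in that case.
idx[idx == 0] <- NA
df$interval <- bound_cols[idx]
df
```

Note that findInterval treats a value equal to a boundary as belonging to that boundary's interval. If main must be strictly higher than the boundary, passing left.open = TRUE to findInterval may be worth trying.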
