In my example dataframe:
df <- data.frame(main=c(2,6,10),
V1=c(1,3,5),
V2=c(3,5,7),
V3=c(5,7,9))
I would like to create a fifth column that checks which pair of columns the value in main falls between, and assigns it the name of the last column with a value lower than main. In this example the results would be:
Row 1: 2 (main) is higher than 1 (V1) but lower than 3 (V2), so column 5 would be "V1".
Row 2: 6 (main) is higher than 3 (V1) and 5 (V2) but lower than 7 (V3), so column 5 would be "V2".
Row 3: 10 (main) is higher than 5 (V1), 7 (V2) and 9 (V3), so column 5 would be "V3".
I would also prefer NOT to include the column names in the code, as the final dataframe will have many of them and they are subject to change.
Thanks!
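A minimal base R sketch of the rule described above, for comparison with the answers below. It assumes the comparison columns are every column except the first (main) and that their values increase from left to right within each row, as in the example. It counts how many of those columns are below main and uses that count to pick a column name, so V1, V2, V3 are never written out; the result column name Col5 is only illustrative.

```r
df <- data.frame(main = c(2, 6, 10),
                 V1 = c(1, 3, 5),
                 V2 = c(3, 5, 7),
                 V3 = c(5, 7, 9))

vals <- df[-1]                      # every column except the first (main), selected by position
below <- rowSums(vals < df$main)    # per row: how many columns hold a value lower than main

# A count of 0 maps to NA (main is not above any column); otherwise take the name of
# the last column below main, which relies on the values increasing across the row
df$Col5 <- c(NA, names(vals))[below + 1]
df$Col5
```

The comparison vals < df$main is evaluated column by column, so each row's values are compared against that same row's main.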
User response:
Here is a solution using pivot_longer:
library(dplyr)
library(tidyr)
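df %>%
  pivot_longer(-main) %>%                 # one row per (main, column name, value)
  group_by(main) %>%                      # assumes main values are unique, i.e. one group per original row
  mutate(Col5 = case_when(main > value & main < lead(value) ~ name,  # main lies between this column and the next
                          main > max(value) ~ last(name),           # main is above every column
                          TRUE ~ NA_character_)) %>%
  fill(Col5, .direction = "updown") %>%   # copy the match to the other rows of the same group
  pivot_wider(names_from = name, values_from = value)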
main Col5 V1 V2 V3
<dbl> <chr> <dbl> <dbl> <dbl>
1 2 V1 1 3 5
2 6 V2 3 5 7
3 10 V3 5 7 9
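To see what the two case_when() conditions do, here is the same test written in base R for single rows after pivot_longer. This is an illustration only: c(value[-1], NA) stands in for dplyr::lead(value), which likewise pads the last position with NA.

```r
# Row 2 of the example after pivot_longer: main = 6 and the values of V1..V3
main  <- 6
name  <- c("V1", "V2", "V3")
value <- c(3, 5, 7)
nxt   <- c(value[-1], NA)        # same result as dplyr::lead(value)

# First branch: main lies strictly between this value and the next one.
# On the last position nxt is NA, so the test is FALSE or NA, and case_when()
# treats NA like FALSE
hit <- main > value & main < nxt
res2 <- name[which(hit)]         # "V2"

# Second branch: main is above every value, so the last name is used (row 3)
main3  <- 10
value3 <- c(5, 7, 9)
res3 <- if (main3 > max(value3)) name[length(name)] else NA_character_   # "V3"
```

Because both comparisons are strict, a main value equal to one of the V values matches neither branch, and Col5 stays NA for that row.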
User response:
Looping through every row with purrr::pmap_chr, applying findInterval to the value in main against every other column, then getting the column name of the associated interval:
df$interval = purrr::pmap_chr(df, function(main, ...){
  colnames(df)[findInterval(main, c(...)) + 1]}) # + 1 because V1 is the second column
Result:
main V1 V2 V3 interval
1 2 1 3 5 V1
2 6 3 5 7 V2
3 10 5 7 9 V3
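A small base R check of the findInterval() step, using the values of the first row. For a vec sorted in non-decreasing order, findInterval(x, vec) returns how many elements of vec are less than or equal to x, so adding 1 skips over main at position 1 of colnames(df).

```r
cols <- c("main", "V1", "V2", "V3")   # colnames(df) in the example
vec  <- c(1, 3, 5)                     # V1..V3 of row 1

i <- findInterval(2, vec)   # 1: only V1 (1) is <= 2
first_row <- cols[i + 1]    # "V1"

# Edge cases worth knowing about:
tie <- findInterval(3, vec)        # 2: a value equal to V2 counts as reaching V2
below_all <- findInterval(0, vec)  # 0: main is below every column, so cols[0 + 1] is "main"
```

Note that ties are handled differently from the pivot_longer answer: here main = 3 would give "V2", whereas the strict comparisons there give NA.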
Further explanation:
purrr::pmap loops through every row of a dataframe, passing all elements of the row as arguments to a function. Because a data frame is a named list, those arguments are named after the columns, so the first argument of my function, main, is filled with the value from the main column. The second argument of my function is ..., the "dynamic dots" (the three dots are actual R code, not an omission). They pass along every other argument that the function receives (in this case, every other element of the row), and c(...) combines them into the vec argument of findInterval.
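Roughly what purrr::pmap_chr(df, f) does, written with base R only. This is a sketch for illustration, not purrr's actual implementation: the function is called once per row with the row's values as arguments named after the columns, so main is matched by name and V1, V2, V3 land in the dots.

```r
df <- data.frame(main = c(2, 6, 10),
                 V1 = c(1, 3, 5),
                 V2 = c(3, 5, 7),
                 V3 = c(5, 7, 9))

f <- function(main, ...) {
  # c(...) collects V1, V2, V3 of the current row into one vector
  colnames(df)[findInterval(main, c(...)) + 1]
}

# One call per row; as.list(df[i, ]) gives list(main = ..., V1 = ..., V2 = ..., V3 = ...)
interval <- vapply(seq_len(nrow(df)),
                   function(i) do.call(f, as.list(df[i, ])),
                   character(1))
interval
```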
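This method needs the main column to be the first one (because of the + 1 offset into colnames(df)), and the data frame must not contain any columns other than main and the ones used to build the intervals. The interval values in each row also have to be in increasing order, since findInterval stops with an error if vec is not sorted non-decreasingly. Let me know if that isn't the case.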
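If main is not the first column, one option is to build the list of interval columns explicitly and index into that instead of colnames(df) with a fixed offset. A minimal base R sketch, assuming every column other than main is an interval column whose values increase within each row; any extra, unrelated columns would also have to be listed in the setdiff() call.

```r
# main deliberately placed last to show that its position no longer matters
df <- data.frame(V1 = c(1, 3, 5),
                 V2 = c(3, 5, 7),
                 V3 = c(5, 7, 9),
                 main = c(2, 6, 10))

interval_cols <- setdiff(names(df), "main")   # interval columns, in their original order
vals <- as.matrix(df[interval_cols])

# For each row, count how many interval values are <= main (findInterval needs a sorted vec)
idx <- vapply(seq_len(nrow(df)),
              function(i) findInterval(df$main[i], vals[i, ]),
              integer(1))

# idx == 0 means main is below every interval column; map that to NA instead of a name
df$interval <- c(NA, interval_cols)[idx + 1]
df$interval
```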