I receive dataframe but the number of columns (V) is increasing regularly (actually V49). This example takes only V1 to V7. I have a lot of Nas and 3 others columns ID, REP and all who are used next step.
ID <- c("A", "B", "B3", "E4", "JE5", "L6")
V1 <- c(3, 5, 1, 3, 7, 1)
V2 <- c(6, 4, 2, 7, 6, 2)
V3 <- c(6, 5, 2, 7, 6, 3)
V4 <- c(6, 7, 1, 7, 6, 3)
V5 <- c(NA, NA, 2, 7, NA, 3)
V6 <- c(NA, NA, 2, 7, NA, 3)
V7 <- c(NA, NA, NA, 7, NA, 3)
REP <- c(4, 4, 6, 7, 4, 7)
all <- c(6, 5, 2, 7, 6, 3)
variation <- c(0, 0, 0, 0, 0, 0)
df <- data.frame(ID, V1, V2, V3, V4, V5, V6, V7, REP, all, variation)
I want this result : add variation == 2
when Vi 1 - Vi < 0 or > 1
.
i = V1
to Vmax
: Vmax
is different for each individual V4
at A
and B
V6
at B3
....
# ID V1 V2 V3 V4 V5 V6 V7 REP all variation
# 1 A 3 6 6 6 NA NA NA 4 6 0
# 2 B 5 4 5 7 NA NA NA 4 5 0
# 3 B3 1 2 2 1 2 2 NA 6 2 0
# 4 E4 3 7 7 7 7 7 7 7 7 0
# 5 JE5 7 6 6 6 NA NA NA 4 6 0
# 6 L6 1 2 3 3 3 3 3 7 3 0
I try with loop but it takes long time so I try with case_when()
. It's work but every week I need to add new line because I want to see when there is a variation between the variable Vi
and Vi 1
.
!between(Vi 1 - Vi, 0, 1) ~ 2,....
df <- df %>%
mutate(variation = case_when(
!between(V2 - V1, 0, 1) ~ 2,
!between(V3 - V2, 0, 1) ~ 2,
!between(V4 - V3, 0, 1) ~ 2,
!between(V5 - V4, 0, 1) ~ 2,
!between(V6 - V5, 0, 1) ~ 2,
!between(V7 - V6, 0, 1) ~ 2,
TRUE ~ as.numeric(variation)))
Is there an automatic increment for case_when
or other function with apply
or map
to avoid writing all variables by hand
Thank you in advance.
CodePudding user response:
I would pivot the data into a longer format, then just compare all the values to their lag in a single statement.
df %>%
pivot_longer(matches("V[0-9] ")) %>%
group_by(ID) %>%
summarize(variation = if (any(!between(value - lag(value), 0, 1), na.rm = T)) 2 else unique(variation)) %>%
right_join(select(df, -variation), by = "ID")
#> # A tibble: 6 × 11
#> ID variation V1 V2 V3 V4 V5 V6 V7 REP all
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A 2 3 6 6 6 NA NA NA 4 6
#> 2 B 2 5 4 5 7 NA NA NA 4 5
#> 3 B3 2 1 2 2 1 2 2 NA 6 2
#> 4 E4 2 3 7 7 7 7 7 7 7 7
#> 5 JE5 2 7 6 6 6 NA NA NA 4 6
#> 6 L6 0 1 2 3 3 3 3 3 7 3