Home > Software engineering >  increment automatic to case_when or substitute of loops
increment automatic to case_when or substitute of loops

Time:12-17

I receive dataframe but the number of columns (V) is increasing regularly (actually V49). This example takes only V1 to V7. I have a lot of Nas and 3 others columns ID, REP and all who are used next step.

ID <- c("A", "B", "B3", "E4", "JE5", "L6")
V1 <- c(3, 5, 1, 3, 7, 1)
V2 <- c(6, 4, 2, 7, 6, 2)
V3 <- c(6, 5, 2, 7, 6, 3)
V4 <- c(6, 7, 1, 7, 6, 3)
V5 <- c(NA, NA, 2, 7, NA, 3)
V6 <- c(NA, NA, 2, 7, NA, 3)
V7 <- c(NA, NA, NA, 7, NA, 3)
REP <- c(4, 4, 6, 7, 4, 7)
all <- c(6, 5, 2, 7, 6, 3)
variation <- c(0, 0, 0, 0, 0, 0)
df <- data.frame(ID, V1, V2, V3, V4, V5, V6, V7, REP, all, variation)

I want this result : add variation == 2 when Vi 1 - Vi < 0 or > 1.
i = V1 to Vmax: Vmax is different for each individual V4 at A and B V6 at B3....

#    ID V1 V2 V3 V4 V5 V6 V7 REP all variation
# 1   A  3  6  6  6 NA NA NA   4   6         0
# 2   B  5  4  5  7 NA NA NA   4   5         0
# 3  B3  1  2  2  1  2  2 NA   6   2         0
# 4  E4  3  7  7  7  7  7  7   7   7         0
# 5 JE5  7  6  6  6 NA NA NA   4   6         0
# 6  L6  1  2  3  3  3  3  3   7   3         0

I try with loop but it takes long time so I try with case_when(). It's work but every week I need to add new line because I want to see when there is a variation between the variable Vi and Vi 1.

!between(Vi 1 - Vi, 0, 1) ~ 2,....

df <- df %>%
  mutate(variation = case_when(
                     !between(V2 - V1, 0, 1) ~ 2,
                     !between(V3 - V2, 0, 1) ~ 2,
                     !between(V4 - V3, 0, 1) ~ 2,
                     !between(V5 - V4, 0, 1) ~ 2,
                     !between(V6 - V5, 0, 1) ~ 2,
                     !between(V7 - V6, 0, 1) ~ 2,
                     TRUE ~ as.numeric(variation)))

Is there an automatic increment for case_when or other function with apply or map to avoid writing all variables by hand

Thank you in advance.

CodePudding user response:

I would pivot the data into a longer format, then just compare all the values to their lag in a single statement.


df %>%
  pivot_longer(matches("V[0-9] ")) %>%
  group_by(ID) %>%
  summarize(variation = if (any(!between(value - lag(value), 0, 1), na.rm = T)) 2 else unique(variation)) %>%
  right_join(select(df, -variation), by = "ID")
#> # A tibble: 6 × 11
#>   ID    variation    V1    V2    V3    V4    V5    V6    V7   REP   all
#>   <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A             2     3     6     6     6    NA    NA    NA     4     6
#> 2 B             2     5     4     5     7    NA    NA    NA     4     5
#> 3 B3            2     1     2     2     1     2     2    NA     6     2
#> 4 E4            2     3     7     7     7     7     7     7     7     7
#> 5 JE5           2     7     6     6     6    NA    NA    NA     4     6
#> 6 L6            0     1     2     3     3     3     3     3     7     3
  • Related