Home > other >  error in replacing NAs with sequence of numbers of varying length for a variable in dataframe
error in replacing NAs with sequence of numbers of varying length for a variable in dataframe

Time:10-21

I have a dataframe which contains many variables, but I would want to focus on only one, 'X', as shown below:

df <- data.frame("x" = c("NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", 1, "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", 1, "NA", "NA", "NA", "NA", "NA", "NA", "NA", 1, "NA", "NA", "NA", "NA", "NA", "NA", 1,"NA", "NA", "NA", "NA", "NA", 1, "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", 1))

What I want to do is to fill numbers in sequence starting from 2 in a way:

  1. when there is x==1, next NA should take 2, 3, 4 and so forth, until, again x==1 arrives and then next NA should start filling with 2,3,4, and so forth.

Example output:

x = 2,3,4,5,1,2,3,4,5,6,7,1,2,3,4,5,6,7,8,9,1,2,3,4,1,2,3,4,5,1 ......

What I am trying to do is:

df$Sequence <- ifelse(df1$x!="1" & is.na(df1$x), seq(2,100), df1$x)

But this is not returning me the expected output, why?

CodePudding user response:

Since x starts with 8 consecutive NA, it is still unclear on how to impute the very first rows. Why does your example output start with 2? Here is how to impute everything beginning from the first occurrence of 1. I modified the example by let it start from 1:

library(tidyverse)
df <- data.frame("x" = c(1, "NA", "NA", "NA", "NA", "NA", "NA", "NA", 1, "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", 1, "NA", "NA", "NA", "NA", "NA", "NA", "NA", 1, "NA", "NA", "NA", "NA", "NA", "NA", 1, "NA", "NA", "NA", "NA", "NA", 1, "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", 1))

df <-
  df %>%
  na_if("NA") %>%
  as_tibble() %>%
  mutate(id = row_number())

y <-
  df %>%
  filter(x == 1) %>%
  # calculate block sizes
  transmute(
    from = id,
    to = lead(id),
    diff = to - from
  ) %>%
  # vector ends with a 1 and there is no n 1 th element
  replace_na(list(diff = 1)) %>%
  pull(diff) %>%
  map(seq) %>%
  flatten() %>%
  as.numeric()
y
#>  [1] 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 1 2 3 4 5 6
#> [39] 1 2 3 4 5 6 7 8 9 1

df %>% transmute(x, y = y)
#> # A tibble: 48 x 2
#>    x         y
#>    <chr> <dbl>
#>  1 1         1
#>  2 <NA>      2
#>  3 <NA>      3
#>  4 <NA>      4
#>  5 <NA>      5
#>  6 <NA>      6
#>  7 <NA>      7
#>  8 <NA>      8
#>  9 1         1
#> 10 <NA>      2
#> # … with 38 more rows

Created on 2021-10-20 by the reprex package (v2.0.1)

CodePudding user response:

Same approach as @danlooo but using data.table instead of tidyverse

df <- data.table(x = as.integer(c(1, "NA", "NA", "NA", "NA", "NA", "NA", "NA", 1, "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", 1, "NA", "NA", "NA", "NA", "NA", "NA", "NA", 1, "NA", "NA", "NA", "NA", "NA", "NA", 1,"NA", "NA", "NA", "NA", "NA", 1, "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", 1)))

df[, id := .I][!is.na(x), diff := lead(id) - id][is.na(diff), diff := 1L][, y := unlist(mapply(seq, df[x == 1L]$diff))][, c("id", "diff") := NULL]
  • Related