Home > Software engineering >  How to use the tidyr complete and fill functions to extend a column of NA's in R dataframe?
How to use the tidyr complete and fill functions to extend a column of NA's in R dataframe?

Time:11-23

I'm using the tidyr complete() and fill() functions to extend (copy down) a dataframe so all ID elements have the same number of rows. The code posted at the bottom correctly extends all fields, with the exception of the "Bal2" column of the dataframe where a series of NA's should be extended. Any recommendations for how to correct this?

The NA values do serve a calculation purpose in the fuller code this is deployed in. Also please note that I have another code snippet for correctly extending the "Period_2" column so I don't need help with "Period_2". It's been omitted for code brevity.

The below illustrates the issue when generating the testDF and testDF1 dataframes:

enter image description here

Code:

library(dplyr)
library(tidyr)

testDF <-
  data.frame(
    ID = c(rep(1,5),rep(50,3),rep(60,3)),
    Period_1 = c(1:5,1:3,1:3),
    Period_2 = c("2012-06","2012-07","2012-08","2012-09","2012-10","2013-06","2013-07","2013-08","2012-10","2012-11","2012-12"),
    Bal1 = c(rep(10,5),21:23,36:34),
    Bal2 = c(rep(12,8),rep(NA,3))
  )

testDF1 <- testDF %>%
  tidyr::complete(ID, nesting(Period_1)) %>%
  tidyr::fill(Bal1, Bal2, .direction = "down")
testDF1 <- as.data.frame(testDF1)

CodePudding user response:

Use dplry::group_by() before you fill():

require(dplyr)
#> Loading required package: dplyr
require(tidyr)
#> Loading required package: tidyr

test <- tribble(
  ~id,    ~value,
  "A",    80,
  "A",    NA,
  "A",    NA,
  "B",    NA,
  "B",    NA
)

fill(test, value)
#> # A tibble: 5 × 2
#>   id    value
#>   <chr> <dbl>
#> 1 A        80
#> 2 A        80
#> 3 A        80
#> 4 B        80
#> 5 B        80

test <- group_by(test, id)
fill(test, value)
#> # A tibble: 5 × 2
#> # Groups:   id [2]
#>   id    value
#>   <chr> <dbl>
#> 1 A        80
#> 2 A        80
#> 3 A        80
#> 4 B        NA
#> 5 B        NA

Created on 2022-11-23 with reprex v2.0.2

CodePudding user response:

As mentioned in the comments, group by the ID and this should be resolved:

library(dplyr)
# library(tidyr)
testDF %>%
  tidyr::complete(ID, tidyr::nesting(Period_1)) %>%
  group_by(ID) %>%
  tidyr::fill(Bal1, Bal2, .direction = "down") %>%
  ungroup()
# # A tibble: 15 x 5
#       ID Period_1 Period_2  Bal1  Bal2
#    <dbl>    <int> <chr>    <dbl> <dbl>
#  1     1        1 2012-06     10    12
#  2     1        2 2012-07     10    12
#  3     1        3 2012-08     10    12
#  4     1        4 2012-09     10    12
#  5     1        5 2012-10     10    12
#  6    50        1 2013-06     21    12
#  7    50        2 2013-07     22    12
#  8    50        3 2013-08     23    12
#  9    50        4 NA          23    12
# 10    50        5 NA          23    12
# 11    60        1 2012-10     36    NA
# 12    60        2 2012-11     35    NA
# 13    60        3 2012-12     34    NA
# 14    60        4 NA          34    NA
# 15    60        5 NA          34    NA
  • Related