Home > Net >  Paste values in a column based on other observations in the dataframe in R
Paste values in a column based on other observations in the dataframe in R

Time:11-30

I have a very large (~30M observations) dataframe in R and I am having trouble with a new column I want to create.

The data is formatted like this:

  Country  Year   Value
1     A     2000     1
2     A     2001     NA
3     A     2002     2
4     B     2000     4
5     B     2001     NA
6     B     2002     NA
7     B     2003     3

My problem is that I would like to impute the NAs in the value column based on other values in that column. Specifically, if there is a non-NA value for the same country I would like that to replace the NA in later years, until there is another non-NA value.

The data above would therefore be transformed into this:

  Country  Year   Value
1     A     2000     1
2     A     2001     1
3     A     2002     2
4     B     2000     4
5     B     2001     4
6     B     2002     4
7     B     2003     3

To solve this, I first tried using a loop with a lookup function and also some if_else statements, but wasn't able to get it to behave as I expected. In general, I am struggling to find an efficient solution that will be able to perform the task in the order of minutes-hours and not days.

Is there an easy way to do this?

Thanks!

CodePudding user response:

Using tidyr's fill:

library(tidyverse)
df %>%
  group_by(Country) %>%
  fill(Value)

Result:

# A tibble: 7 × 3
# Groups:   Country [2]
  Country  Year Value
  <chr>   <dbl> <dbl>
1 A        2000     1
2 A        2001     1
3 A        2002     2
4 B        2000     4
5 B        2001     4
6 B        2002     4
7 B        2003     3
  • Related