Home > Back-end >  R: Filling Spaces in Missing Rows
R: Filling Spaces in Missing Rows

Time:01-21

I am working with the R programming language.

I have the following dataset:

name = c("john", "john", "john", "sarah", "sarah", "peter", "peter", "peter", "peter")
year = c(2010, 2011, 2014, 2010, 2015, 2011, 2012, 2013, 2015)
age = c(21, 22, 25, 55, 60, 61, 62, 63, 65)
gender = c("male", "male", "male", "female", "female", "male", "male", "male", "male" )
country_of_birth = c("australia", "australia", "australia", "uk", "uk", "mexico", "mexico", "mexico", "mexico")

my_data = data.frame(name, year, age, gender, country_of_birth)

  name year age gender country_of_birth
1  john 2010  21   male        australia
2  john 2011  22   male        australia
3  john 2014  25   male        australia
4 sarah 2010  55 female               uk
5 sarah 2015  60 female               uk
6 peter 2011  61   male           mexico
7 peter 2012  62   male           mexico
8 peter 2013  63   male           mexico
9 peter 2015  65   male           mexico

As we can see here, some of the people are missing "years" in this dataset. Suppose the first row corresponding to a person is the earliest year and the last row is the maximum year.

For each person in this dataset - I would like to "fill" between the missing rows. For example - in each missing row:

  • I would like the "age" variable to increase by 1 (e.g. in 2012, john should be 23 - in 2012, john should be 24)
  • I would the "gender" variable to stay the same
  • I would like the "country_of_birth" variable to stay the same

Here is the R code I am using:

library(tidyr)
library(dplyr)
my_data %>% 
    group_by(name) %>% 
    complete(year = full_seq(year, period = 1)) %>% 
    fill(year, age, gender, country_of_birth, .direction = "downup") %>%
    mutate(real_age= age - (row_number() - 1)) %>%
    ungroup

This code runs and seems to have added the missing rows - but the age variable is not being correctly added:

# A tibble: 16 x 6
   name   year   age gender country_of_birth real_age
   <chr> <dbl> <dbl> <chr>  <chr>               <dbl>
 1 john   2010    21 male   australia              21
 2 john   2011    22 male   australia              21
 3 john   2012    22 male   australia              20
 4 john   2013    22 male   australia              19
 5 john   2014    25 male   australia              21
 6 peter  2011    61 male   mexico                 61
 7 peter  2012    62 male   mexico                 61
 8 peter  2013    63 male   mexico                 61
 9 peter  2014    63 male   mexico                 60
10 peter  2015    65 male   mexico                 61
11 sarah  2010    55 female uk                     55
12 sarah  2011    55 female uk                     54
13 sarah  2012    55 female uk                     53
14 sarah  2013    55 female uk                     52
15 sarah  2014    55 female uk                     51
16 sarah  2015    60 female uk                     55

Currently, I am trying to resolve this problem by trying different combinations of mutate(real_age= age - (row_number() - 1)) - but so far, nothing seems to be working.

Can someone please show me how to fix this?

Thanks!

CodePudding user response:

One way is:

library(dplyr)
library(tidyr)

my_data %>% 
  group_by(name) %>% 
  complete(year = first(year): last(year)) %>% 
  mutate(age = ifelse(is.na(age), first(age) row_number()-1,age)) %>% 
  fill(c(gender, country_of_birth), .direction = "down")
  name   year   age gender country_of_birth
   <chr> <dbl> <dbl> <chr>  <chr>           
 1 john   2010    21 male   australia       
 2 john   2011    22 male   australia       
 3 john   2012    23 male   australia       
 4 john   2013    24 male   australia       
 5 john   2014    25 male   australia       
 6 peter  2011    61 male   mexico          
 7 peter  2012    62 male   mexico          
 8 peter  2013    63 male   mexico          
 9 peter  2014    64 male   mexico          
10 peter  2015    65 male   mexico          
11 sarah  2010    55 female uk              
12 sarah  2011    56 female uk              
13 sarah  2012    57 female uk              
14 sarah  2013    58 female uk              
15 sarah  2014    59 female uk              
16 sarah  2015    60 female uk   
  • Related