Home > Back-end >  Take the latest year value for nested object
Take the latest year value for nested object

Time:10-30

I have a nested object whereby the name of individual vehicles in the inner nest. This is not my dataset but I can reproduce the error with mtcars. Essentially, I am trying to grab the manufacturing_size for the latest year when its anything but Not Provided, and use only this value for manufacturing_size. However, for whatever reason the map/function does not enter all nests.

dataset:

mtcars <- mtcars %>% rownames_to_column()
emp <- c("Not Provided","Less than 250","250 to 499","500 to 999","1000 to 4999","5000 to 19,999")
mtcars$manufacturing_size <- c(rep(emp, 5) , "Not Provided", "Less than 250")
mtcars$year <- rep(2018:2021, 8)


mtcars1 <- mtcars
mtcars2 <- mtcars
mtcars3 <- mtcars

mtcars1$year <- rep(c(2019:2021, 2018), 8)
mtcars2$year <- rep(c(2020:2021, 2018, 2019), 8)
mtcars3$year <- rep(c(2021:2018), 8)

mtcarsAll <- rbind(mtcars, mtcars1, mtcars2, mtcars3)

Here is what I have tried:

mtcars %>% nest_by(gear) %>% ungroup %>% mutate(data = map(data, ~ .x %>% nest(data=rowname) %>%
                                                                                      mutate(data = map(data, function(x){
  someSize <- x[x$year == x[which.max(x$year),]$year,]$manufacturing_size
  if(someSize != 'Not Provided'){
    x$manufacturing_size = someSize
    return(x)
    
  }else {
    for(i in 1:nrow(x)){
      if(x$year[i] != 2018){
        someSize <- x[x$year == x[which.max(x$year)-i,]$year,]$manufacturing_size
        if(someSize != 'Not Provided'){
          x$manufacturing_size = someSize
          return(x)
        }
      } else{
        someSize <- x[x$year == x[which.max(x$year) i,]$year,]$manufacturing_size
        if(someSize != 'Not Provided'){
          x$manufacturing_size = someSize
          return(x)
        }
      }
    }
  }
}
))))

Which produces the following error:

Error in `mutate()`:
! Problem while computing `data = map(...)`.
Caused by error in `mutate()`:
! Problem while computing `data = map(...)`.
Caused by error in `vectbl_as_row_location()`:
! Must subset rows with a valid subscript vector.
ℹ Logical subscripts must match the size of the indexed input.
✖ Input has size 1 but subscript `x$year == x[which.max(x$year)]$year` has size 0.

This is because If I remove most of the function and print out someSize then It enters the first outer nest but not the others. What is an easier alternative?

Using the answer below, the following works:

mtr <- mtcarsAll %>% group_by(rowname) %>%
  mutate(
    man_size = case_when(
      manufacturing_size != "Not Provided" & max(year) == year~ manufacturing_size
    )
  ) 

mtr %>% ungroup %>% 
    fill(man_size, .direction = "updown")

CodePudding user response:

Does this do what you want. There is a lot of nesting in your example, which unless I am mistaken, isn't necessary.

I've altered your setup a little bit cause I don't think what you wanted was going to work:

  • used mtcars2 so as to not overwrite mtcars,
  • replace rep(emp, 5) with random draws from a standard normal distrubution rnorm(30)) becuase you didn't define emp
  • added a new grouping variable group so that each year only appears once for each group. (The way you had it with gear as the grouping var didn't work because there were multiple values for the most recent year)
mtcars2 <- mtcars %>% rownames_to_column("make")
mtcars2$manufacturing_size <- c(rnorm(30),"Not Provided", "Less than 250")
mtcars2$group <- rep(LETTERS[1:8], each = 4)
mtcars2$year <- rep(2018:2021, 8)

Then, rather than all the complex nesting you've done, you just do use an if_else statement or, I've prefered case_when to get the values you are intereseted in for the new variable man_size.

mtcars2 %>% 
  group_by(group) %>% 
  mutate(
    man_size = case_when(
      manufacturing_size != "Not Provided" & max(year) == year ~ manufacturing_size,
      TRUE ~ NA_character_
    )
  )

# A tibble: 32 × 16
# Groups:   group [8]
   make                mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb manufacturing_size group  year man_size        
   <chr>             <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>              <chr> <int> <chr>           
 1 Mazda RX4          21       6  160    110  3.9   2.62  16.5     0     1     4     4 -0.10777645987017  A      2018 NA              
 2 Mazda RX4 Wag      21       6  160    110  3.9   2.88  17.0     0     1     4     4 0.685034939673918  A      2019 NA              
 3 Datsun 710         22.8     4  108     93  3.85  2.32  18.6     1     1     4     1 0.0216291773402855 A      2020 NA              
 4 Hornet 4 Drive     21.4     6  258    110  3.08  3.22  19.4     1     0     3     1 0.227610843395319  A      2021 0.2276108433953…
 5 Hornet Sportabout  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2 0.342964251360947  B      2018 NA              
 6 Valiant            18.1     6  225    105  2.76  3.46  20.2     1     0     3     1 1.20792448510301   B      2019 NA              
 7 Duster 360         14.3     8  360    245  3.21  3.57  15.8     0     0     3     4 0.395983818669596  B      2020 NA              
 8 Merc 240D          24.4     4  147.    62  3.69  3.19  20       1     0     4     2 -0.42502805147035  B      2021 -0.425028051470…
 9 Merc 230           22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2 0.961054295375392  C      2018 NA              
10 Merc 280           19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4 -1.32030765978216  C      2019 NA              
# … with 22 more rows

If you then want to fill in those NAs with what you need you can just use tidyr::fill

Hope this helps.

EDIT after change from OP in comments.

OK, I see what you want now. Thanks for providing emp. I still made one more tiny change to your setup, to ensure there was a case where Not Provided would be the value of manufacuring_size for the maximum year in one of the groups (for group H).

mtcars2 <- mtcars %>% rownames_to_column()
emp <- c("Not Provided","Less than 250","250 to 499","500 to 999","1000 to 4999","5000 to 19,999")
mtcars2$manufacturing_size <- c(rep(emp, 5) ,  "Less than 250", "Not Provided")
mtcars2$group <- rep(LETTERS[1:8], each = 4)
mtcars2$year <- rep(2018:2021, 8)

We can then use the following:

mtcars3 <- mtcars2 %>% 
  group_by(group) %>% 
  mutate(
    man_size = case_when(
      max(year[manufacturing_size != "Not Provided"]) == year ~ manufacturing_size,
      TRUE ~ NA_character_
    )
  )

Then if you want to fill in all the values, you can do:

mtcars3 %>% 
  fill(man_size, .direction = "updown")
  •  Tags:  
  • r
  • Related