Group-by operation to produce new column with factor indicating groups

Time:11-19

I am trying to produce a new column (Yrgroup) that puts individual years into two-year groups, like this:

Yrs TS Yrgroup 
2011 2  11/12
2011 2  11/12
2012 4  11/12
2012 8  11/12
2013 2  13/14
2013 1  13/14
2014 3  13/14
2014 7  13/14

Yr = c(2011,2011,2012,2012,2013,2013,2014,2014)
Yr
Tranship = c(2,5,8,2,2,2,7,8)
df = data.frame(Yr, Tranship)
df
df$Yrgroup = NA
library(dplyr)
df %>% 
  group_by(Yr + 1)

This is what I have tried so far, but I cannot work out how to fill in the year-group column.

Answer:

You can do this as follows (this uses the column names Yrs and TS from the table above; if_else() and mutate() come from dplyr):

library(dplyr)

f <- function(y) if_else(y %% 2 == 0, paste0(y - 1, "/", y), paste0(y, "/", y + 1))

mutate(df, Yrsgroup = f(Yrs %% 1000))

Output:

    Yrs TS Yrsgroup
1: 2011  2    11/12
2: 2011  2    11/12
3: 2012  4    11/12
4: 2012  8    11/12
5: 2013  2    13/14
6: 2013  1    13/14
7: 2014  3    13/14
8: 2014  7    13/14

Note that my use of Yrs %% 1000 is not as generalizable as the alternative below, which produces the same output but works for a wider set of years (1999 %% 1000, for example, is 999 rather than 99):

mutate(df, Yrsgroup = f(as.numeric(substr(Yrs,3,4))))

Finally, this version of f() handles more cases (for example, it correctly handles the year 2000; I've changed the input data below to show this) and makes the call simpler:

f <- function(y) {
  # Build "YYYY/yy" for the odd/even pair, then keep characters 3-7 ("yy/yy")
  substr(if_else(y %% 2 == 0,
                 paste0(y - 1, "/", substr(y, 3, 4)),
                 paste0(y, "/", substr(y + 1, 3, 4))),
         3, 7)
}

mutate(df, Yrsgroup = f(Yrs))

Output:

    Yrs TS Yrsgroup
1: 2000  2    99/00
2: 2011  2    11/12
3: 2012  4    11/12
4: 2012  8    11/12
5: 2013  2    13/14
6: 2013  1    13/14
7: 2014  3    13/14
8: 2014  7    13/14
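For a quick end-to-end check, here is a self-contained sketch of the final f() applied to the data shown in the output above, followed by a group_by()/summarise() step that uses the new column. It assumes dplyr is installed; the data frame is rebuilt by hand from that output, and total_TS is only an illustrative column name.

```r
# Sketch: assumes the dplyr package is installed
library(dplyr)

# Final version of f() from the answer above: pair each odd year with the
# following even year and keep only the two-digit "yy/yy" label
f <- function(y) {
  substr(if_else(y %% 2 == 0,
                 paste0(y - 1, "/", substr(y, 3, 4)),
                 paste0(y, "/", substr(y + 1, 3, 4))),
         3, 7)
}

# Example data from the output above, including the year 2000
df <- data.frame(Yrs = c(2000, 2011, 2012, 2012, 2013, 2013, 2014, 2014),
                 TS  = c(2, 2, 4, 8, 2, 1, 3, 7))

df <- mutate(df, Yrsgroup = f(Yrs))
print(df)

# The new column can then drive a grouped summary;
# total_TS is just an illustrative column name
totals <- df %>%
  group_by(Yrsgroup) %>%
  summarise(total_TS = sum(TS))
print(totals)
```

Because the label is an ordinary character column, any later group_by() works on it directly; no grouping is needed to create it.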

Answer:

It looks like you always have the format [odd year]/[even year]. You can check which case applies using modulo 2 and build the Yrgroup from that.

Yr = c(2011,2011,2012,2012,2013,2013,2014,2014)
Tranship = c(2,5,8,2,2,2,7,8)
df = data.frame(Yr, Tranship)
df$Yrgroup <- ifelse(df$Yr %%2 == 1, 
                     yes = paste(substr(df$Yr, 3, 4), 
                                 as.numeric(substr(df$Yr, 3, 4)) + 1, 
                                 sep = "/"), 
                     no = paste(as.numeric(substr(df$Yr, 3, 4)) - 1, 
                                substr(df$Yr, 3, 4), 
                                sep = "/"))

df
#>     Yr Tranship Yrgroup
#> 1 2011        2   11/12
#> 2 2011        5   11/12
#> 3 2012        8   11/12
#> 4 2012        2   11/12
#> 5 2013        2   13/14
#> 6 2013        2   13/14
#> 7 2014        7   13/14
#> 8 2014        8   13/14

EDIT: However, this will not work for the year 2000, because 00 - 1 = -1.

To handle this, you might want to use actual dates. lubridate is a package that is useful for handling dates.

Yr = c(1999, 2000, 2011,2011,2012,2012,2013,2013,2014,2014)
Tranship = c(8,5,2,5,8,2,2,2,7,8)
df = data.frame(Yr, Tranship)

library(lubridate)

# Yr*10000 + 101 turns e.g. 2011 into 20110101, which ymd() parses as 1 January
df$Yrgroup <- ifelse(df$Yr %% 2 == 1, 
                     paste(substr(df$Yr, 3, 4),
                           format(ymd(df$Yr*10000 + 101) + years(1), "%y"), 
                           sep = "/"), 
                     paste(format(ymd(df$Yr*10000 + 101) - years(1), "%y"),
                           substr(df$Yr, 3, 4), 
                           sep = "/"))
df
#>      Yr Tranship Yrgroup
#> 1  1999        8   99/00
#> 2  2000        5   99/00
#> 3  2011        2   11/12
#> 4  2011        5   11/12
#> 5  2012        8   11/12
#> 6  2012        2   11/12
#> 7  2013        2   13/14
#> 8  2013        2   13/14
#> 9  2014        7   13/14
#> 10 2014        8   13/14
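If you would rather avoid the round-trip through dates, the same pairing can also be sketched with plain arithmetic. This is a swapped-in alternative, not part of either answer, and year_pair_label() is a hypothetical name: each even year steps back one year to the odd year that starts its pair, and both years are reduced modulo 100 and zero-padded with sprintf(). The zero-padding also covers a case the first, paste()-based version of this answer misses: for 2010 it produces 9/10, because as.numeric() drops the leading zero.

```r
# Hypothetical helper (not from either answer): label each year with its
# two-year group, pairing an odd year with the following even year
year_pair_label <- function(yr) {
  # Even years step back one year to the odd year that starts their pair
  start <- yr - (yr %% 2 == 0)
  # Two-digit years, zero-padded, so 2000 -> "00" and 2009 -> "09"
  sprintf("%02d/%02d", as.integer(start %% 100), as.integer((start + 1) %% 100))
}

Yr <- c(1999, 2000, 2011, 2011, 2012, 2012, 2013, 2013, 2014, 2014)
Tranship <- c(8, 5, 2, 5, 8, 2, 2, 2, 7, 8)
df <- data.frame(Yr, Tranship)
df$Yrgroup <- year_pair_label(df$Yr)
print(df)

year_pair_label(2010)  # "09/10"
```

This needs no extra packages, at the cost of assuming four-digit, non-negative years.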