Home > Software engineering >  labeling week dates just by the month R
labeling week dates just by the month R

Time:10-06

I have a df as follows:

x = data.frame(retailer = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
store = c(5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6),
week = c(2021100301, 2021092601, 2021091901, 2021091201, 2020082901, 2020082201, 2020081501, 2020080801, 2021080101, 2021072501, 2021071801,
  2021071101, 2020070401, 2020062701, 2020062001, 2020061301))

I have weeks where the values correspond to dates such as 2021100301 is 10/03/2021 and 2021071101 is 07/11/2020. I would like to add an additional column where I can label these weeks in a format where 2021100301 is 10 since it is the month of October and 2021090301 would be 09 since it is the month of septemeber and so on. I would like to do this for all the weeks in the column essentially just labeling them by month in a separate column. I am trying to get the output to be:

x = data.frame(retailer = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
store = c(5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6),
week = c(2021100301, 2021092601, 2021091901, 2021091201, 2020082901, 2020082201, 2020081501, 2020080801, 2021080101, 2021072501, 2021071801,
  2021071101, 2020070401, 2020062701, 2020062001, 2020061301),
month = c(10, 09, 09, 09, 08, 08, 08, 08, 08, 07, 07, 07, 07, 06, 06 ,06))


Is there a way I can do this? Thanks.

CodePudding user response:

Try with substr

library(dplyr)
x %>% 
   mutate(month = as.numeric(substr(week, 5, 6)))

-output

 retailer store       week month
1         2     5 2021100301    10
2         2     5 2021092601     9
3         2     5 2021091901     9
4         2     5 2021091201     9
5         2     5 2020082901     8
6         2     5 2020082201     8
7         2     5 2020081501     8
8         2     5 2020080801     8
9         2     6 2021080101     8
10        2     6 2021072501     7
11        2     6 2021071801     7
12        2     6 2021071101     7
13        2     6 2020070401     7
14        2     6 2020062701     6
15        2     6 2020062001     6
16        2     6 2020061301     6

CodePudding user response:

Here is an alternative approach: Logic:

  1. Remove the last two digits
  2. Convert to class date
  3. get month
library(dplyr)
library(lubridate)
x %>% 
  mutate(month1 = gsub('.{2}$', '', week),
         month1 = month(ymd(month1)))
   retailer store       week month1
1         2     5 2021100301     10
2         2     5 2021092601      9
3         2     5 2021091901      9
4         2     5 2021091201      9
5         2     5 2020082901      8
6         2     5 2020082201      8
7         2     5 2020081501      8
8         2     5 2020080801      8
9         2     6 2021080101      8
10        2     6 2021072501      7
11        2     6 2021071801      7
12        2     6 2021071101      7
13        2     6 2020070401      7
14        2     6 2020062701      6
15        2     6 2020062001      6
16        2     6 2020061301      6
  • Related