Parsing a bi-annual time variable-CodePudding

I wanted to parse a bi-annual time variable from a dataframe when reading it in using read_csv.

The bi-annual time variable has the format Year-B1 and Year-B2. Here is a little example of how my data is looking like and what I want to do.

#example
#creating some tibble to demonstrate how my data looks like: 

library(tidyverse)

tb <- tibble (time = c("2021-B1","2021-B2","2022-B1","2022-B2"),
              country = rep("USA",4),
              value = c(3,5,4,7)
)
> tb
# A tibble: 4 × 3
  time    country value
  <chr>   <chr>   <dbl>
1 2021-B1 USA         3
2 2021-B2 USA         5
3 2022-B1 USA         4
4 2022-B2 USA         7

#saving the tibble as csv file 

write_csv(tb, "tb.csv")

Now my plan is to read the csv file and specify columns; but I don't know how to specify the col_date()

tb_r <- read_csv("tb.csv", col_types = cols( time= col_date(format= "%Y-%B"), 
                                               country = col_character(), 
                                               value = col_double() ))

Ideally I would like to have the time variable be displayed for Year-B1 as Year:Jun and for Year-B2 as Year:Dec.

CodePudding user response：

We can replace :B1 and :B2 with your desired months.

tb %>% 
  mutate(time_stamp = str_replace_all(time, c(":B1" = ":Jun", ":B2" = ":Dec"))) 

# # A tibble: 4 x 4
#   time    country value time_stamp
#   <chr>   <chr>   <dbl> <chr>     
# 1 2021:B1 USA         3 2021:Jun  
# 2 2021:B2 USA         5 2021:Dec  
# 3 2022:B1 USA         4 2022:Jun  
# 4 2022:B2 USA         7 2022:Dec

There is no built-in format for the way your data is structured. So you need to post-process it. It can be done in one pipeline though;

tb_r <- read_csv("tb.csv", col_types = cols( time= col_character(), 
                                               country = col_character(), 
                                               value = col_double() )) %>%
          mutate(time = str_replace_all(time, c(":B1" = ":Jun", ":B2" = ":Dec")))