I wanted to parse a bi-annual time variable from a dataframe when reading it in using read_csv
.
The bi-annual time variable has the format Year-B1
and Year-B2
. Here is a little example of how my data is looking like and what I want to do.
#example
#creating some tibble to demonstrate how my data looks like:
library(tidyverse)
tb <- tibble (time = c("2021-B1","2021-B2","2022-B1","2022-B2"),
country = rep("USA",4),
value = c(3,5,4,7)
)
> tb
# A tibble: 4 × 3
time country value
<chr> <chr> <dbl>
1 2021-B1 USA 3
2 2021-B2 USA 5
3 2022-B1 USA 4
4 2022-B2 USA 7
#saving the tibble as csv file
write_csv(tb, "tb.csv")
Now my plan is to read the csv file and specify columns; but I don't know how to specify the col_date()
tb_r <- read_csv("tb.csv", col_types = cols( time= col_date(format= "%Y-%B"),
country = col_character(),
value = col_double() ))
Ideally I would like to have the time variable be displayed for Year-B1 as Year:Jun and for Year-B2 as Year:Dec.
CodePudding user response:
We can replace :B1
and :B2
with your desired months.
tb %>%
mutate(time_stamp = str_replace_all(time, c(":B1" = ":Jun", ":B2" = ":Dec")))
# # A tibble: 4 x 4
# time country value time_stamp
# <chr> <chr> <dbl> <chr>
# 1 2021:B1 USA 3 2021:Jun
# 2 2021:B2 USA 5 2021:Dec
# 3 2022:B1 USA 4 2022:Jun
# 4 2022:B2 USA 7 2022:Dec
There is no built-in format for the way your data is structured. So you need to post-process it. It can be done in one pipeline though;
tb_r <- read_csv("tb.csv", col_types = cols( time= col_character(),
country = col_character(),
value = col_double() )) %>%
mutate(time = str_replace_all(time, c(":B1" = ":Jun", ":B2" = ":Dec")))