[R]Coverting lists of numeric datas into dates in R-CodePudding

I was tidying some data with dirty dates(different styling), I tried with * lubridate * ,however I can only clean one list at a time.

So is there anyways to convert all the date in several lists? All the lists that need to be transferred end with “T”, and, in real data I have more than 3lists, probably 30 .

The cleared data need to be “yyyy-mm”, no need of days.

dirtydatatest

    > dirtydatatest
# A tibble: 6 × 5
  city  name  hbpT       tbT        hepaT     
  <chr> <chr> <chr>      <chr>      <chr>     
1 CityA Angel 2011-101   NA         20090909  
2 CityA Bob   2014-02-12 20090909   2011-127  
3 CityB Cathy 2011-10-12 2011-127   NA        
4 CityB Dean  NA         NA         2018.07.23
5 CityC Ellen 1992.10.1  2018.07.23 NA        
6 CityC Faye  2011-127   NA         NA

I used

library(tidyverse)
library(lubridate)
 dirtydatatest = mutate(dirtydatatest,
tbT= ymd(tbT) ,.keep = c("all"),)

and then using stringer to change the “-”. But can only done in one list at a time.

And I tried to use

dirtydata2 %>%
select(dirtydata2, ends_with("t")) %>%
  mutate_all(ymd() ,.keep = c("all"),)

But I don’t think I’ve got it right ,as in ymd(), there should be something in it...

Thank you for your time and patience.

CodePudding user response：

We can use across with ymd

library(dplyr)
library(lubridate)
dirtydataset %>%
    mutate(across(ends_with("T"),  ~ymd(.x, truncated = 2)))

-output

 city  name       hbpT        tbT      hepaT
1 CityA Angel 2011-10-01       <NA> 2009-09-09
2 CityA   Bob 2014-02-12 2009-09-09 2011-12-07
3 CityB Cathy 2011-10-12 2011-12-07       <NA>
4 CityB  Dean       <NA>       <NA> 2018-07-23
5 CityC Ellen 1992-10-01 2018-07-23       <NA>
6 CityC  Faye 2011-12-07       <NA>       <NA>

With mutate_at, the syntax would be

dirtydataset %>%
    mutate_at(vars(ends_with("T")),  ~ ymd(.x, truncated = 2))
   city  name       hbpT        tbT      hepaT
1 CityA Angel 2011-10-01       <NA> 2009-09-09
2 CityA   Bob 2014-02-12 2009-09-09 2011-12-07
3 CityB Cathy 2011-10-12 2011-12-07       <NA>
4 CityB  Dean       <NA>       <NA> 2018-07-23
5 CityC Ellen 1992-10-01 2018-07-23       <NA>
6 CityC  Faye 2011-12-07       <NA>       <NA>

data

dirtydataset <- structure(list(city = c("CityA", "CityA", "CityB", 
"CityB", "CityC", 
"CityC"), name = c("Angel", "Bob", "Cathy", "Dean", "Ellen", 
"Faye"), hbpT = c("2011-101", "2014-02-12", "2011-10-12", NA, 
"1992.10.1", "2011-127"), tbT = c(NA, "20090909", "2011-127", 
NA, "2018.07.23", NA), hepaT = c("20090909", "2011-127", NA, 
"2018.07.23", NA, NA)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

CodePudding user response：

As an alternative we could use parse_date function from parsedate package. Data from @r2evans (Many thanks).

parse_date can parse a date when you don’t know which format it is in. First it tries all ISO 8601 formats. Then it tries git’s versatile date parser. Lastly, it tries as.POSIXct.

library(dplyr)
library(parsedate)

dirtydataset %>% 
  mutate(across(ends_with("T"), ~parsedate::parse_date(.)))

output:

   city  name       hbpT        tbT      hepaT
1 CityA Angel 2011-04-11       <NA> 2009-09-09
2 CityA   Bob 2014-02-12 2009-09-09 2011-05-07
3 CityB Cathy 2011-10-12 2011-05-07       <NA>
4 CityB  Dean       <NA>       <NA> 2018-07-23
5 CityC Ellen 1992-10-01 2018-07-23       <NA>
6 CityC  Faye 2011-05-07       <NA>       <NA>

CodePudding user response：

First, I think across works well here to convert multiple columns to Date-class (and mutate_all and mutate_at have been superseded by it):

dirtydataset %>%
  mutate(across(ends_with("T"), ymd))
# # A tibble: 6 x 5
#   city  name  hbpT       tbT        hepaT     
#   <chr> <chr> <date>     <date>     <date>    
# 1 CityA Angel 2011-10-01 NA         2009-09-09
# 2 CityA Bob   2014-02-12 2009-09-09 2011-12-07
# 3 CityB Cathy 2011-10-12 2011-12-07 NA        
# 4 CityB Dean  NA         NA         2018-07-23
# 5 CityC Ellen 1992-10-01 2018-07-23 NA        
# 6 CityC Faye  2011-12-07 NA         NA

As to your mention of "need to be “yyyy-mm”, no need of days", then you can convert back to strings if you'd like, using format, though you lose some capabilities on it:

dirtydataset %>%
  mutate(across(ends_with("T"), ~ format(ymd(.), format = "%Y-%m"))) 
# # A tibble: 6 x 5
#   city  name  hbpT    tbT     hepaT  
#   <chr> <chr> <chr>   <chr>   <chr>  
# 1 CityA Angel 2011-10 NA      2009-09
# 2 CityA Bob   2014-02 2009-09 2011-12
# 3 CityB Cathy 2011-10 2011-12 NA     
# 4 CityB Dean  NA      NA      2018-07
# 5 CityC Ellen 1992-10 2018-07 NA     
# 6 CityC Faye  2011-12 NA      NA

Data

dirtydataset <- structure(list(city = c("CityA", "CityA", "CityB", "CityB", "CityC", "CityC"), name = c("Angel", "Bob", "Cathy", "Dean", "Ellen", "Faye"), hbpT = c("2011-101", "2014-02-12", "2011-10-12", NA, "1992.10.1", "2011-127"), tbT = c(NA, "20090909", "2011-127", NA, "2018.07.23", NA), hepaT = c("20090909", "2011-127", NA, "2018.07.23", NA, NA)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))