Home > Net >  How can I create a new column that only extracts either the year or month from a mm/dd/yy hh:mm stri
How can I create a new column that only extracts either the year or month from a mm/dd/yy hh:mm stri

Time:11-30

I have a date/time string variable that looks like this:

> dput(df$starttime)
c("12/16/20 7:24", "6/21/21 13:20", "1/22/20 9:03", "1/07/20 17:19", 
"11/8/21 10:14", NA, NA, "10/26/21 7:19", "3/14/22 9:48", "5/12/22 13:29"

I basically want to create a column that only has the year (2020, 2021, 2022) and the year month (e.g., "Jan 2022)

CodePudding user response:

1) Base R Assuming that you want separate month and year numeric columns, define a function which converts a string in the format shown in the question to a year or month number and then invoke it twice. No packages are used.

toNum <- function(x, fmt) format(as.Date(x, "%m/%d/%y"), fmt) |>
  type.convert(as.is = TRUE)
transform(df, year = toNum(starttime, "%Y"), month = toNum(starttime, "%m"))

giving

       starttime year month
1  12/16/20 7:24 2020    12
2  6/21/21 13:20 2021     6
3   1/22/20 9:03 2020     1
4  1/07/20 17:19 2020     1
5  11/8/21 10:14 2021    11
6           <NA>   NA    NA
7           <NA>   NA    NA
8  10/26/21 7:19 2021    10
9   3/14/22 9:48 2022     3
10 5/12/22 13:29 2022     5

2) yearmon Assuming that you want a yearmon class column which represents year and month internally as year fraction where fraction is 0 for Ja, 1/12 for Feb, ..., 11/12 for Dec so that it sorts appropriately and adding 1/12, say, will give the next month we can use the following. Note that if ym is yearmon then as.integer(ym) is the year and cycle(ym) is the month number (1, 2, ..., 12).

library(zoo)
transform(df, yearmon = as.yearmon(starttime, "%m/%d/%y"))

giving:

       starttime  yearmon
1  12/16/20 7:24 Dec 2020
2  6/21/21 13:20 Jun 2021
3   1/22/20 9:03 Jan 2020
4  1/07/20 17:19 Jan 2020
5  11/8/21 10:14 Nov 2021
6           <NA>     <NA>
7           <NA>     <NA>
8  10/26/21 7:19 Oct 2021
9   3/14/22 9:48 Mar 2022
10 5/12/22 13:29 May 2022

Note

If you want to sort by starttime then use

ct <- as.POSIXct(df$starttime, format = "%m/%d/%Y %H:%M")
df[order(ct),, drop = FALSE ]

CodePudding user response:

If you want a chronologically sortable output, you could use the tsibble::yearmonth type:

tsibble::yearmonth(lubridate::mdy_hm(c("12/16/20 7:24", "6/21/21 13:20", "1/22/20 9:03", "1/07/20 17:19", 
  "11/8/21 10:14", NA, NA, "10/26/21 7:19", "3/14/22 9:48", "5/12/22 13:29")))

result

<yearmonth[10]>
 [1] "2020 Dec" "2021 Jun" "2020 Jan" "2020 Jan" "2021 Nov" NA         NA        
 [8] "2021 Oct" "2022 Mar" "2022 May"

CodePudding user response:

Try this with lubridate

library(lubridate)

data.frame(df, 
  Year = format(mdy_hm(df$starttime), "%Y"), 
  MonthYear = format(mdy_hm(df$starttime), "%b %Y"))
       starttime Year MonthYear
1  12/16/20 7:24 2020  Dec 2020
2  6/21/21 13:20 2021  Jun 2021
3   1/22/20 9:03 2020  Jan 2020
4  1/07/20 17:19 2020  Jan 2020
5  11/8/21 10:14 2021  Nov 2021
6           <NA> <NA>      <NA>
7           <NA> <NA>      <NA>
8  10/26/21 7:19 2021  Oct 2021
9   3/14/22 9:48 2022  Mar 2022
10 5/12/22 13:29 2022  May 2022

It uses mdy_hm in conjunction with format to get the desired Year %Y and %b %Y abbreviated month and year part of the date.

CodePudding user response:

An option is to convert to datetime class POSIXct with mdy_hm (from lubridate), then format to extract the month (%b) and 4 digit year (%Y), filter out the NA elements and arrange based on the converted datetime column

library(dplyr)
library(lubridate)
df %>% 
   mutate(starttime = mdy_hm(starttime),
         yearmonth = format(starttime, "%b %Y")) %>%
   filter(complete.cases(yearmonth)) %>%
   arrange(starttime)   

-output

# A tibble: 8 × 2
  starttime           yearmonth
  <dttm>              <chr>    
1 2020-01-07 17:19:00 Jan 2020 
2 2020-01-22 09:03:00 Jan 2020 
3 2020-12-16 07:24:00 Dec 2020 
4 2021-06-21 13:20:00 Jun 2021 
5 2021-10-26 07:19:00 Oct 2021 
6 2021-11-08 10:14:00 Nov 2021 
7 2022-03-14 09:48:00 Mar 2022 
8 2022-05-12 13:29:00 May 2022 
  •  Tags:  
  • r
  • Related