I have a date/time string variable that looks like this:
> dput(df$starttime)
c("12/16/20 7:24", "6/21/21 13:20", "1/22/20 9:03", "1/07/20 17:19",
"11/8/21 10:14", NA, NA, "10/26/21 7:19", "3/14/22 9:48", "5/12/22 13:29"
I basically want to create a column that only has the year (2020, 2021, 2022) and the year month (e.g., "Jan 2022)
CodePudding user response:
1) Base R Assuming that you want separate month and year numeric columns, define a function which converts a string in the format shown in the question to a year or month number and then invoke it twice. No packages are used.
toNum <- function(x, fmt) format(as.Date(x, "%m/%d/%y"), fmt) |>
type.convert(as.is = TRUE)
transform(df, year = toNum(starttime, "%Y"), month = toNum(starttime, "%m"))
giving
starttime year month
1 12/16/20 7:24 2020 12
2 6/21/21 13:20 2021 6
3 1/22/20 9:03 2020 1
4 1/07/20 17:19 2020 1
5 11/8/21 10:14 2021 11
6 <NA> NA NA
7 <NA> NA NA
8 10/26/21 7:19 2021 10
9 3/14/22 9:48 2022 3
10 5/12/22 13:29 2022 5
2) yearmon Assuming that you want a yearmon class column which represents year and month internally as year fraction where fraction is 0 for Ja, 1/12 for Feb, ..., 11/12 for Dec so that it sorts appropriately and adding 1/12, say, will give the next month we can use the following. Note that if ym is yearmon then as.integer(ym) is the year and cycle(ym) is the month number (1, 2, ..., 12).
library(zoo)
transform(df, yearmon = as.yearmon(starttime, "%m/%d/%y"))
giving:
starttime yearmon
1 12/16/20 7:24 Dec 2020
2 6/21/21 13:20 Jun 2021
3 1/22/20 9:03 Jan 2020
4 1/07/20 17:19 Jan 2020
5 11/8/21 10:14 Nov 2021
6 <NA> <NA>
7 <NA> <NA>
8 10/26/21 7:19 Oct 2021
9 3/14/22 9:48 Mar 2022
10 5/12/22 13:29 May 2022
Note
If you want to sort by starttime
then use
ct <- as.POSIXct(df$starttime, format = "%m/%d/%Y %H:%M")
df[order(ct),, drop = FALSE ]
CodePudding user response:
If you want a chronologically sortable output, you could use the tsibble::yearmonth
type:
tsibble::yearmonth(lubridate::mdy_hm(c("12/16/20 7:24", "6/21/21 13:20", "1/22/20 9:03", "1/07/20 17:19",
"11/8/21 10:14", NA, NA, "10/26/21 7:19", "3/14/22 9:48", "5/12/22 13:29")))
result
<yearmonth[10]>
[1] "2020 Dec" "2021 Jun" "2020 Jan" "2020 Jan" "2021 Nov" NA NA
[8] "2021 Oct" "2022 Mar" "2022 May"
CodePudding user response:
Try this with lubridate
library(lubridate)
data.frame(df,
Year = format(mdy_hm(df$starttime), "%Y"),
MonthYear = format(mdy_hm(df$starttime), "%b %Y"))
starttime Year MonthYear
1 12/16/20 7:24 2020 Dec 2020
2 6/21/21 13:20 2021 Jun 2021
3 1/22/20 9:03 2020 Jan 2020
4 1/07/20 17:19 2020 Jan 2020
5 11/8/21 10:14 2021 Nov 2021
6 <NA> <NA> <NA>
7 <NA> <NA> <NA>
8 10/26/21 7:19 2021 Oct 2021
9 3/14/22 9:48 2022 Mar 2022
10 5/12/22 13:29 2022 May 2022
It uses mdy_hm
in conjunction with format
to get the desired Year %Y
and %b %Y
abbreviated month and year part of the date.
CodePudding user response:
An option is to convert to datetime class POSIXct
with mdy_hm
(from lubridate
), then format
to extract the month (%b
) and 4 digit year (%Y
), filter
out the NA elements and arrange
based on the converted datetime column
library(dplyr)
library(lubridate)
df %>%
mutate(starttime = mdy_hm(starttime),
yearmonth = format(starttime, "%b %Y")) %>%
filter(complete.cases(yearmonth)) %>%
arrange(starttime)
-output
# A tibble: 8 × 2
starttime yearmonth
<dttm> <chr>
1 2020-01-07 17:19:00 Jan 2020
2 2020-01-22 09:03:00 Jan 2020
3 2020-12-16 07:24:00 Dec 2020
4 2021-06-21 13:20:00 Jun 2021
5 2021-10-26 07:19:00 Oct 2021
6 2021-11-08 10:14:00 Nov 2021
7 2022-03-14 09:48:00 Mar 2022
8 2022-05-12 13:29:00 May 2022