How to create a new date (month, year) column in R

Time:12-03

I have a very simple question and I hope you can help me. I have a dataset with monthly temperatures from 1958 to 2020. This gives me a total of 756 observations, which matches the number of months. This is the only column I have, and I would like to add a column with the date in month-year format, starting from 01-1958 in the first observation and continuing with 02-1958, 03-1958, ... up to 12-2020.

Any ideas?

Thank you very much!

CodePudding user response:

Two things:

  1. I think a Date object would be much better (there is no Month object), since it has natural number-like properties that allow you to find differences, plot without bias, etc. Note that when the data is stored this way, every other representation can be derived trivially for reports/renders.

  2. Even if you must go with a string, I suggest putting year first so that sorting works as expected.
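To illustrate both points, here is a minimal sketch. The two dates in d are made up for the demonstration and are not from your data; the format strings are just a few of the representations you might want.

```r
# Hypothetical example dates, not taken from the question's data
d <- as.Date(c("1958-01-01", "1958-03-01"))

# Date objects behave like numbers: the gap between them is a time difference
gap_days <- as.numeric(diff(d))   # number of days between the two dates

# Other representations are derived from the Date when needed
my_chr <- format(d, "%m-%Y")      # month-year string, e.g. "01-1958"
ym_chr <- format(d, "%Y-%m")      # year-month string, sorts chronologically
yr_num <- as.integer(format(d, "%Y"))  # numeric year

print(gap_days)
print(my_chr)
print(ym_chr)
print(yr_num)
```

Going the other way (from a month-year string to something you can do arithmetic on) takes more work, which is why I'd store the Date and format it only for display.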

You offered no data, so I'll make something up:

mydata <- data.frame(val = 1:756)
mydata$date <- seq(as.Date("1958-01-01"), length.out=756, by="month")
mydata$ym_chr <- format(mydata$date, format = "%Y-%m")
mydata$my_chr <- format(mydata$date, format = "%m-%Y")
mydata[c(1:5, 752:756),]
#     val       date  ym_chr  my_chr
# 1     1 1958-01-01 1958-01 01-1958
# 2     2 1958-02-01 1958-02 02-1958
# 3     3 1958-03-01 1958-03 03-1958
# 4     4 1958-04-01 1958-04 04-1958
# 5     5 1958-05-01 1958-05 05-1958
# 752 752 2020-08-01 2020-08 08-2020
# 753 753 2020-09-01 2020-09 09-2020
# 754 754 2020-10-01 2020-10 10-2020
# 755 755 2020-11-01 2020-11 11-2020
# 756 756 2020-12-01 2020-12 12-2020

As a quick demonstration that we have exactly one row (no more, no fewer) for every month of every year, here's a table:

table(year=gsub(".*-", "", mydata$my_chr), month=gsub("-.*", "", mydata$my_chr))
#       month
# year   01 02 03 04 05 06 07 08 09 10 11 12
#   1958  1  1  1  1  1  1  1  1  1  1  1  1
#   1959  1  1  1  1  1  1  1  1  1  1  1  1
#   1960  1  1  1  1  1  1  1  1  1  1  1  1
# ...
#   2018  1  1  1  1  1  1  1  1  1  1  1  1
#   2019  1  1  1  1  1  1  1  1  1  1  1  1
#   2020  1  1  1  1  1  1  1  1  1  1  1  1

The snipped rows are identical to the ones shown except for the year, i.e., all 1s. The sum(.) of this table is 756, which matches the number of observations. (Just checking since I wanted to make sure I was doing it right.)
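If you'd rather not eyeball the table, the same check can be done programmatically. This is a sketch that rebuilds the made-up data from above and derives year and month straight from the Date column; the name tab is only for illustration.

```r
# Rebuild the made-up data: 756 monthly dates starting January 1958
mydata <- data.frame(val = 1:756)
mydata$date <- seq(as.Date("1958-01-01"), length.out = 756, by = "month")

# Cross-tabulate year against month, taken directly from the Date column
tab <- table(year  = format(mydata$date, "%Y"),
             month = format(mydata$date, "%m"))

total      <- sum(tab)        # total number of rows counted
all_ones   <- all(tab == 1)   # TRUE if every year/month pair occurs exactly once
tab_shape  <- dim(tab)        # number of years by number of months

print(total)
print(all_ones)
print(tab_shape)
```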

Lastly, to highlight my comment about sorting, here are some examples premised on the knowledge that val is incrementing from 1.

head(mydata[order(mydata$ym_chr),])
#   val       date  ym_chr  my_chr
# 1   1 1958-01-01 1958-01 01-1958
# 2   2 1958-02-01 1958-02 02-1958
# 3   3 1958-03-01 1958-03 03-1958
# 4   4 1958-04-01 1958-04 04-1958
# 5   5 1958-05-01 1958-05 05-1958
# 6   6 1958-06-01 1958-06 06-1958

head(mydata[order(mydata$my_chr),])
#    val       date  ym_chr  my_chr
# 1    1 1958-01-01 1958-01 01-1958
# 13  13 1959-01-01 1959-01 01-1959
# 25  25 1960-01-01 1960-01 01-1960
# 37  37 1961-01-01 1961-01 01-1961
# 49  49 1962-01-01 1962-01 01-1962
# 61  61 1963-01-01 1963-01 01-1963

If being able to sort by date is important, then I suggest it will be much simpler to use either $date or the string $ym_chr.
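If you end up with only the month-year strings and still need chronological order, one option is to parse them back into Dates first. This is a sketch with a few made-up strings; pasting on a day of "01" is my assumption, needed only because as.Date() requires a full date.

```r
# Hypothetical month-year strings in no particular order
my_chr <- c("01-1959", "02-1958", "12-2020", "01-1958")

# Prepend a day so the string is a complete date, then parse it
parsed <- as.Date(paste0("01-", my_chr), format = "%d-%m-%Y")

# Order the original strings by the parsed dates
sorted_chr <- my_chr[order(parsed)]
print(sorted_chr)

# For comparison: plain string sorting groups by month, not by year
print(sort(my_chr))
```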
