How do I convert Stata dates (%td format e.g. 30jan2015) into YYYYMMDD format (e.g. 20150130)

Time:08-16

* date is in %td format

gen date1 = real(string(mofd(daily(date, "DMY")), "%tmCYN"))
* type mismatch error

tostring date, gen(dt)
gen date1 = real(string(mofd(daily(dt, "DMY")), "%tmCYN"))
* the code runs but generates no results

tostring date, gen(dt)
gen date2=date(dt, "YMD")
* the code runs but generates no results

Answer:

If a date variable has display format %td, it must be numeric and stored as some kind of integer. The display format is, and is only, an instruction to Stata on how to display such integers. Confusion about conversion often seems to hinge on a misunderstanding of what format means. Format is an overloaded word in computing, referring variously to file format (as in graphics file format, .png or .jpg or whatever); data layout (as in wide or long layout, structure or format); variable or storage type; and (here) display format. There could well be yet other meanings.

A date displayed as 30jan2015 is stored as an integer, namely

. display mdy(1, 30, 2015)
20118

and a glance at help data types shows that your variable date could be stored as an int, float, long or double. All would work, although int is least demanding of memory. You would need (e.g.) to run describe date to find out which type is being used in your case, but nothing to come in this answer depends on knowing that type. Running display on simple, single examples is a good way to find out what Stata is doing and thinking.
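
As a quick illustration of running display on a single example, the same stored integer can be shown raw or with a daily date display format. This is only a sketch; the value 20118 is the one from the example above.

. di 20118
20118

. di %td 20118
30jan2015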

Your question is ambiguous.

Want to change display format? If you wish merely to see your dates in a display format exemplified by 20150130, then help datetime display formats shows which display format you need. It is tested here with display, which can be abbreviated all the way down to di:

. di %tdCCYYNNDD 20118
20150130

so

format date %tdCCYYNNDD

is what you need. That instructs Stata to change the display format, but the numbers stored remain precisely as they were.
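
A small sandbox makes that concrete. The following is a sketch on made-up data, not your dataset: it creates one observation, changes the display format, and then checks that the stored value is untouched.

clear
set obs 1
gen date = mdy(1, 30, 2015)
format date %tdCCYYNNDD
* the stored value is still the integer 20118;
* assert stops with an error if that is not true
assert date == 20118
list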

Want such dates as variables held as integers? If you want the dates to be held as integers like 20150130, then you could convert them to string using the display format above, and then to a real value. A minimal sandbox dataset shows this:

. clear

. set obs 1
Number of observations (_N) was 0, now 1.

. gen date = 20118

. gen wanted = real(strofreal(date, "%tdCCYYNNDD"))

. format wanted  %8.0f

. l

     +------------------+
     |  date     wanted |
     |------------------|
  1. | 20118   20150130 |
     +------------------+

A display format such as %8.0f is needed to see such values directly.

Another method is to generate a large integer directly. You need to be explicit about a suitable storage type and (as just mentioned) need to set an appropriate format, but it can be got to work:

. gen long also = 10000 * year(date) + 100 * month(date) + day(date)

. format also %8.0f 
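
The arithmetic can be checked with display, again using 20118 (30jan2015). The last line is an aside on why the storage type matters: generate creates a float by default, and a float holds integers exactly only up to 16,777,216, so some eight-digit dates would be rounded. Treat this as a sketch rather than a full account of precision.

* 10000 * 2015 + 100 * 1 + 30 = 20150130
di 10000 * year(20118) + 100 * month(20118) + day(20118)
* 20150131 is odd and larger than 16,777,216, so a float cannot hold it exactly;
* the value displayed here is the nearest float, not 20150131
di %8.0f float(20150131)

A long or a double holds every integer of this size exactly, so either is a safe choice.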

Want such dates as variables held as strings? This is the strofreal() solution above, but leave off the real(). The default display format will work fine.

. gen WANTED = strofreal(date, "%tdCCYYNNDD")

. l

     +-----------------------------+
     |  date     wanted     WANTED |
     |-----------------------------|
  1. | 20118   20150130   20150130 |
     +-----------------------------+

I have not used tostring here but as its original author I have no bias against it. The principles needed here are better illustrated using the underlying function strofreal(). The older name string() will still work.

Turning to your code,

tostring date, gen(dt)

will just put integers like 20118 in string form, so "20118", but there is no way that Stata can understand that alone to be a daily date. You could have run tostring with a format argument, which would have been equivalent to the code above. The advantage of tostring would only be if you had several such variables you wished to convert at once, as tostring would loop over such variables for you.
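
For completeness, here is a sketch of that tostring route on a one-observation sandbox. The force option is included on the assumption that tostring otherwise declines a conversion that real() cannot reverse; check help tostring if it behaves differently in your version. The last two lines show that date() with mask "YMD" recovers the original daily date from such a string.

clear
set obs 1
gen date = mdy(1, 30, 2015)
* dt is a string variable holding "20150130"
tostring date, generate(dt) format(%tdCCYYNNDD) force
* date() reads "20150130" as year, month, day and returns the daily date 20118
gen check = date(dt, "YMD")
assert check == date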

I can't follow why you thought that conversion to a monthly date or use of a monthly date display format was needed or helpful, as at best you'd lose the information on day of the month. Thus at best Stata can only map a monthly date back to the first day of that month, and at worst a monthly date (here 660) could not be understood as anything you want.

. di mofd(20118)
660

. di %td mofd(20118)
22oct1961

. di %td dofm(mofd(20118))
01jan2015

There is no shortcut to understanding how Stata thinks about dates that doesn't involve reading the needed parts of help datetime and help datetime display formats.

Yet more explanation and examples can be found at https://www.stata-journal.com/article.html?article=dm0067
