I turned to a Stata video, "Data management: How to create a date variable from a date stored as a string" by Chuck Huber, to make sure my date variable was formatted properly. However, I cannot get the reformatted variable (school_year2) to display as a year (e.g. 2018).
Can someone let me know what I may be missing here?
Thank you,
.do file
gen school_year2 = date(school_year,"Y")
format %ty school_year2
list school_year school_year2 in 1/10
     +---------------------+
     | school~r   school~2 |
     |---------------------|
  1. |     2016    2.0e+04 |
  2. |     2016    2.0e+04 |
  3. |     2016    2.0e+04 |
  4. |     2016    2.0e+04 |
  5. |     2016    2.0e+04 |
     |---------------------|
  6. |     2016    2.0e+04 |
  7. |     2016    2.0e+04 |
  8. |     2016    2.0e+04 |
  9. |     2016    2.0e+04 |
 10. |     2016    2.0e+04 |
     +---------------------+
. end of do-file
CodePudding user response:
The value of the underlying data is still the number of days since 1 Jan 1960, because you are using the date() function. So keep a %td format, as you are working with days here, not years. You can then have it display only the year by using %tdCCYY, with CC standing for century and YY for year. But remember, the underlying data point is still the day 1 Jan 2016, not the year 2016.
clear
input str4 school_year
"2016"
"2016"
"2016"
"2016"
"2016"
"2016"
"2016"
"2016"
"2016"
"2016"
end
gen school_year2 = date(school_year,"Y")
format %tdCCYY school_year2
list school_year school_year2 in 1/10
If the year is all you want to work with, then use the year() function to extract the year from the date. The example below details steps you can play around with.
clear
input str4 school_year
"2016"
"2016"
"2016"
"2016"
"2016"
"2016"
"2016"
"2016"
"2016"
"2016"
end
gen school_year2 = date(school_year,"Y")
gen school_year3 = year(school_year2)
format %tdCCYY school_year2
format %ty school_year3
list in 1/10
Note that in the last example, all three values look the same when listed. But the first variable is a string containing the text "2016"; the second is a daily date stored as the number of days since 1 Jan 1960, with only its year displayed; and the last is a number holding the number of years from year 0, displayed as a year (which in this case looks the same as its underlying number).
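One way to see that difference for yourself is to strip the date formats and look at what is actually stored. This is only a minimal sketch that reuses the variable names from the example above with a single observation; the choice of %9.0g as a plain numeric format is mine, not something from the question.

```stata
clear
input str4 school_year
"2016"
end

* Daily date: days since 1 Jan 1960
gen school_year2 = date(school_year, "Y")

* Plain year extracted from the daily date
gen school_year3 = year(school_year2)

* A general numeric format shows the stored numbers, not dates
format %9.0g school_year2 school_year3
list

* describe shows the storage type and display format of each variable
describe
```

With the date formats removed, school_year2 should list as a day count rather than as 2016, while school_year3 should still list as 2016.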
CodePudding user response:
@TheiceBear has already explained the main point, but here is the story told a little differently in case that is helpful.
The key point here is that changing the (display) format is just that, a change in format. It has no effect on what is stored, that is, on the values held within the variables in question.
You are using generate to create new variables, which is fine, but the basic principles can be seen directly using di (display) on scalar constants. That's also a good way to check your understanding of Stata's rules.
The date() function -- despite its historic name -- is for creating numeric daily dates (only). If you tell date() that your input is a string containing the year only, then it imputes 1 January as day and month. The result is an integer, counted from the origin of the scale at 1 January 1960.
. di date("2016", "Y")
20454
. di date("1 Jan 2016", "DMY")
20454
. di date("1 Jan 1960", "DMY")
0
It is a fair bet that few are willing or able to work out what 20454 is on such a scale, but you can specify a daily date display format so that you and readers of your code can see directly.
. di %td 20454
01jan2016
There are many minor variations on that to display daily dates (or parts of them, such as monthly or yearly dates). The different format names for daily dates all start %td.
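As a rough illustration of such variations, the lines below apply a few daily date formats to the same stored value, 20454. Only %td and %tdCCYY appear elsewhere in this thread; the other two are my own picks from the usual %td building blocks (DD for day, NN for month number, CCYY for four-digit year), so check help datetime display formats before relying on them.

```stata
* Same stored value, 20454, shown with different daily date formats

* Default daily date format
di %td 20454

* Year only
di %tdCCYY 20454

* ISO-style year-month-day
di %tdCCYY-NN-DD 20454

* Day/month/year with slashes
di %tdDD/NN/CCYY 20454
```

In every case the value being displayed is still 20454; only its appearance changes.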
Conversely, if you say that the value 20454 is to be displayed using a yearly format, you are referring to the year 20454, several thousand years into the future. Stata doesn't act puzzled, except that it doesn't expect such values as years and just shows you a year rounded to 2.0e+04, that is 20000. If you had good reason to work with dates thousands or millions of years into the future, date display formats are likely to be neither needed nor helpful.
. di %ty 20454
2.0e+04
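If a yearly format is what you want, the value handed to it has to be a year in the first place. A short sketch of that check on scalar constants, using only the date() and year() functions already mentioned in this thread:

```stata
* Extract the year from the daily date
di year(date("2016", "Y"))

* A yearly display format now has a sensible value to work with
di %ty year(date("2016", "Y"))
```

Both lines should show 2016, because year() returns the calendar year of the daily date and %ty displays that number as a year.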
This paper riffs on the idea that a change in display format is only that and does not affect stored values.