Databases often store a datetime without timezone as a separate type as a datetime with a timezone. As an example, I'll use BigQuery (though most databases store this the same):
DATETIME
is a date/time that does not store timezone.TIMESTAMP
is a date/time that does store timezone.
I understand abstractly that "Dec 2 at 2:45 PM" is a different time in Japan as it is in New York, but I'm wondering why that even matters if all the dates are stored in UTC by the application. For example, if the value to be inserted is:
2021-12-02 14:45:00
Wouldn't that value be inserted as 2021-12-02 14:45:00 UTC
in both data types? Or, would the 2:45PM be stored as "2:45 PM UTC" in the DATETIME type but would be stored as (if using EST) 2:45 PM EST --> 6:45 PM UTC in the TIMESTAMP type?
And if the value was:
2021-12-02 14:45:00 EST
Wouldn't that value also be inserted as 2021-12-02 18:45:00 UTC
in both data types, and stored the same way? It seems to only 'timezone' is on the query-side, and can't that act either as a cursor-variable or some sort of metadata on the field (similar to a NULL
check)? I guess I'm not following why the timezone-aware and no-timezone need to be stored as two different types if all date/times get stored as UTC anyways.
CodePudding user response:
The SQL standard defines two types for a date with time-of-day:
TIMESTAMP
(Also known more clearly in some databases such as Postgres asTIMESTAMP WITHOUT TIME ZONE
)TIMESTAMP WITH TIME ZONE
The first type is meant to purposely lack any context of time zone or offset-from-UTC. So noon on the 23rd of January next year, 2022-01-23 12:00, means noon anywhere and everywhere. It means noon on Tokyo Japan as well as noon on Toulouse France, and also noon in Toledo Ohio US. These are all obviously different moments, several hours apart. Therefore, this type cannot represent a moment, is not a specific point on the time line.
The second type does represent a moment, is a specific point on ne timeline. When you want to track actual moments, such as when a row was written to the database, or when a shipment arrived in a warehouse, use this type.
Unfortunately, the SQL spec says little about the various date-time types and behaviors. So the various database products vary widely in their support for these types and their interpretations of behavior.
In some databases such as Postgres, a value submitted to a column of the fist type (TIMESTAMP WITHOUT TIME ZONE
) containing an indicator of zone or offset will have the date and time recorded as submitted. No adjustment is made. Any zone or offset input is ignored and discarded.
In some databases such as Postgres, a value submitted to a column of the second type (TIMESTAMP WITH TIME ZONE
) containing an indicator of zone or offset will have its date and time adjusted to UTC before being written to the database. In such databases, this type is always in UTC, that is, represents a moment as seen with an offset of zero.
What is an offset? Merely a number of hours-minutes-seconds ahead of UTC (
) or behind UTC (-
). A time zone, in contrast, is much more. A time zone has a name in Continent/Region
format, and contains the history past, present, and future changes to the offset used by the people of a particular region as decided by their politicians.
So the TIMESTAMP WITH TIME ZONE
type in databases such as Postgres is a misnomer. No time zone information is stored in the database. Any time zone or offset info submitted with the date and time is used to adjust to UTC. The zone/offset info is then discarded. So if remembering the zone originally submitted is important to you, you will need to store that yourself in a second column. Regarding the misnomer, you can think of the type as being TIMESTAMP WITH REGARD FOR SUBMITTED OFFSET OR TIME ZONE
. But be clear, in databases such as Postgres your moment is stored in UTC, always UTC, and is retrieved as UTC, always UTC.
Unfortunately, there is a wrinkle here. All too commonly, tooling and middleware will inject a default time zone, adjusting the retrieved UTC moment into some time zone. While well-intentioned, this anti-feature creates the illusion that the value was stored with that time zone. But the values actually stored in UTC, at least with databases such as Postgres.
You asked:
2021-12-02 14:45:00 Wouldn't that value be inserted as 2021-12-02 14:45:00 UTC in both data types?
No.
- In a column of a data type akin to the
TIMESTAMP WITHOUT TIME ZONE
, that date and time will be stored as submitted, a quarter to 3 PM on the second of December this year. - In a column of a data type akin to
TIMESTAMP WITH TIME ZONE
, the value stored may depend on the behavior of your particular database and your particular middleware, tooling, and driver. The behavior might simply assume you meant 2021-12-02 14:45:00 as seen in UTC, and store that. Or the behavior may assume you meant 2021-12-02 14:45:00 as seen in a particular time zone. And in databases such as Postgres, an adjustment to UTC would be applied before finally storing. You must study the documentation of your particular database, middleware, tooling, and driver to discover which behavior will be seen in your software. Be sure to conduct experiments to verify your understanding.
You asked:
2021-12-02 14:45:00 … Or, would the 2:45PM be stored as "2:45 PM UTC" in the DATETIME type but would be stored as (if using EST) 2:45 PM EST --> 6:45 PM UTC in the TIMESTAMP type?
”Likely yes”, for the first clause. But no EST
involved at all. The date is stored as-is, 2021-12-02, along with the time-of-day as-is, 14:45:00. The EST
part is ignored and discarded. (But experiment to verify this behavior in your particular tooling.)
And “maybe” for the second clause. As discussed in last bullet above, the behavior for TIMESTAMP WITH TIME ZONE
may vary. Read docs, and conduct experiments.
You said:
though most databases store this the same
No, incorrect. That would be a very big “No”.
Databases vary widely in their support for date-time features, their kinds of date-time types, the names of their types, the technical details of their types, and the behaviors of the database server, middleware, drivers, and tooling.
Some older database systems have legacy data types supplanted by newer types, but all still supported, further complicating the picture.
You said:
I guess I'm not following why the timezone-aware and no-timezone need to be stored as two different types if all date/times get stored as UTC anyways.
You incorrectly assume that the “no-timezone” type stores in UTC. It does not.
That is what the “no-timezone” means: no regard for an offset or zone, no consideration for any offset or zone, no adjustment for any offset or zone, and no concept of offset or zone. The TIMESTAMP WITHOUT TIME ZONE
type means simply, literally, a date, a time-of-day, and nothing more. Anything more than that is either (a) a figment of your imagination, or (b) interference by your middleware/tooling/drivers.