MediaWiki (the free software behind Wikipedia) stores timestamps in its database fields in a distinctive binary(14) format. This is described further in their timestamp documentation.
The format of timestamps used in MediaWiki URLs and in some of the MediaWiki database fields is yyyymmddhhmmss. For example, the timestamp for 2023-01-20 17:12:22 (UTC) is 20230120171222. The timezone for these timestamps is UTC.
I have also seen a similar timestamp format in other places, such as URLs for the Internet Archive. I regularly need to compare these timestamps against timestamps stored in the standard Unix format (seconds since the Unix epoch). I believe this is a common format, so it surprises me that I can't find a ready-made way to convert from the MediaWiki format to a Unix timestamp.
What I'm most interested in is the best way to do this conversion. That is:
- Relatively short/simple to understand code.
- Most efficient algorithm.
- Detects errors in the original format.
MediaWiki apparently includes a conversion function named "wfTimestamp". However, I haven't been able to locate the function or its source code online, and I understand it has many features beyond the simple conversion I need. One option may be to strip the other parts out of that function, but I still don't know whether it is the optimal solution or whether there's a better way. There are lots of questions about converting to timestamps in general, but I'm hoping for something specific to this format. I've thought of several approaches, such as a regular expression, mktime after splitting the string, strtotime, etc., but I'm not sure which would be fastest for this particular format if the conversion had to be done many times. Since this format exists in at least two places, I assume an optimal solution for this specific conversion could be useful to others as well. Thanks.
CodePudding user response:
I think this is probably what you're looking for:
$timestamp = strtotime("20230120171222");
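// 1674234742 (when PHP's default time zone is UTC)
strtotime interprets a string that has no zone designator in PHP's default time zone, and the Unix timestamp it returns does not contain information about time zones. In order to do calculations with date/time information, you should use the more capable DateTimeImmutable.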
Please see here: https://www.php.net/manual/en/function.strtotime.php
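Here is a minimal sketch of the DateTimeImmutable route. It assumes the input is already a well-formed 14-digit string and that PHP's date parser accepts it as its compact "MySQL" compound format, which is the same rule strtotime relies on. The helper name mwToUnixViaParser is made up for illustration. Passing an explicit UTC zone means the result no longer depends on the default time zone:

```php
<?php
/**
 * Hypothetical helper: lets PHP's built-in date parser read a
 * yyyymmddhhmmss string, pinned to UTC. It performs no validation itself;
 * the constructor throws if the string cannot be parsed at all.
 */
function mwToUnixViaParser(string $ts): int
{
    // The zone argument is used because the string carries no zone of its own.
    $date = new DateTimeImmutable($ts, new DateTimeZone('UTC'));
    return $date->getTimestamp();
}

// Changing the default zone does not affect the result.
date_default_timezone_set('America/New_York');
echo mwToUnixViaParser('20230120171222'), "\n"; // 1674234742
```

Because the general-purpose parser is lenient, this variant is probably not a good fit when you also need to reject malformed input.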
CodePudding user response:
The MediaWiki software, which is the technology behind Wikipedia, uses a distinctive binary(14) format to store database timestamps. The format of these timestamps is yyyymmddhhmmss, for example 20230120171222 for 2023-01-20 17:12:22 (UTC). The timestamps are always in UTC.
When comparing these timestamps with timestamps stored in a standard Unix format (seconds from the Unix epoch), it can be challenging to find a ready-made solution to convert the MediaWiki format to Unix timestamps.
What is desired is an efficient algorithm to convert these timestamps, which is relatively short and simple to understand, and detects errors in the original format. The function "wfTimestamp" is included in MediaWiki for this conversion, but it has many unnecessary features and it is not clear if it is the optimal solution.
Potential solutions for this conversion include regular expressions, mktime after string split, strtotime, etc. However, it is not clear which solution is the most efficient for this specific format. Since this format is used in at least two places, a solution optimized for this format could be useful for others as well.
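As a rough sketch of the "regular expression, then mktime" route, the following splits the string with preg_match, validates the fields, and calls gmmktime, the UTC counterpart of mktime. The function name mwToUnixManual is hypothetical, and returning null for bad input is just one possible convention. Whether this is faster than the built-in parsers would have to be benchmarked:

```php
<?php
/**
 * Hypothetical helper: manual conversion of a UTC yyyymmddhhmmss string.
 * Returns null when the string is malformed or names an impossible date/time.
 */
function mwToUnixManual(string $ts): ?int
{
    // Exactly 14 ASCII digits; the D modifier stops "$" from matching
    // before a trailing newline.
    if (!preg_match('/^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/D', $ts, $m)) {
        return null;
    }
    [$year, $month, $day, $hour, $minute, $second] =
        array_map('intval', array_slice($m, 1));

    // checkdate() knows month lengths and leap years.
    if (!checkdate($month, $day, $year) || $hour > 23 || $minute > 59 || $second > 59) {
        return null;
    }

    // gmmktime() treats its arguments as UTC, independent of the default zone.
    $unix = gmmktime($hour, $minute, $second, $month, $day, $year);
    return $unix === false ? null : $unix;
}

var_dump(mwToUnixManual('20230120171222')); // int(1674234742)
var_dump(mwToUnixManual('20230230000000')); // NULL (February 30 does not exist)
```

One caveat: gmmktime remaps very small year values (0 to 100) onto 1970 to 2069, so this sketch is only meant for realistic four-digit years.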
CodePudding user response:
You can use the DateTime::createFromFormat function with an explicit format.
$date = DateTime::createFromFormat("YmdHis", "20230120171222", new \DateTimeZone('UTC'));
$timestamp = $date->getTimestamp();
I'm not sure you can find a more optimised way: even if you parse this manually, you still have to account for leap years, and in time zones with daylight saving time not every day has exactly 24 hours (UTC itself is not affected). PHP handles this for you.
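If you also need to detect bad input, note that createFromFormat returns false when it cannot parse the string at all, but it silently rolls out-of-range fields over (February 30 becomes March 2). A minimal sketch of one way to catch both cases is to format the parsed date back and compare it with the input. The function name mwTimestampToUnix and the null-on-error convention are my own assumptions, not anything MediaWiki provides:

```php
<?php
/**
 * Hypothetical helper (not part of MediaWiki): converts a yyyymmddhhmmss
 * string, assumed to be in UTC, to a Unix timestamp, or returns null.
 */
function mwTimestampToUnix(string $ts): ?int
{
    $utc = new DateTimeZone('UTC');
    $date = DateTimeImmutable::createFromFormat('YmdHis', $ts, $utc);
    if ($date === false) {
        return null; // not parseable at all
    }
    // Out-of-range fields are rolled over rather than rejected, so the
    // formatted result must reproduce the input exactly.
    if ($date->format('YmdHis') !== $ts) {
        return null;
    }
    return $date->getTimestamp();
}

var_dump(mwTimestampToUnix('20230120171222')); // int(1674234742)
var_dump(mwTimestampToUnix('20230230000000')); // NULL
```

The round-trip comparison costs one extra format call per conversion, but it also rejects strings that are too short or carry trailing characters.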