Home > Back-end >  spark - how to extract hour from timestamp?
spark - how to extract hour from timestamp?

Time:02-20

Please help understand why the date_format does not extract 08:15 for 8:15am?

spark.sql("select date_format(date '1994-11-05T08:15:30-05:00', 'hh:mm') AS hour_in_day_number").show()

 ------------------ 
|hour_in_day_number|
 ------------------ 
|             12:00|
 ------------------ 
spark.sql("select date_format(date '1994-11-05T08:15:30-05:00', 'HH:mm') AS hour_in_day_number").show()

 ------------------ 
|hour_in_day_number|
 ------------------ 
|             00:00|
 ------------------ 
spark.sql("select date_format(date '1994-11-05T08:15:30-05:00', 'kk:mm') AS hour_in_day_number").show()

 ------------------ 
|hour_in_day_number|
 ------------------ 
|             24:00|
 ------------------ 

W3C Date and Time Formats

Examples
1994-11-05T08:15:30-05:00 corresponds to November 5, 1994, 8:15:30 am, US Eastern Standard Time.

Datetime Patterns for Formatting and Parsing

Symbol Meaning Presentation Examples
G era text AD; Anno Domini
y year year 2020; 20
D day-of-year number(3) 189
M/L month-of-year month 7; 07; Jul; July
d day-of-month number(3) 28
Q/q quarter-of-year number/text 3; 03; Q3; 3rd quarter
E day-of-week text Tue; Tuesday
F aligned day of week in month number(1) 3
a am-pm-of-day am-pm PM
h clock-hour-of-am-pm (1-12) number(2) 12
K hour-of-am-pm (0-11) number(2) 0
k clock-hour-of-day (1-24) number(2) 0
H hour-of-day (0-23) number(2) 0
m minute-of-hour number(2) 30
s second-of-minute number(2) 55
S fraction-of-second fraction 978
V time-zone ID zone-id America/Los_Angeles; Z; -08:30
z time-zone name zone-name Pacific Standard Time; PST
O localized zone-offset offset-O GMT 8; GMT 08:00; UTC-08:00;
X zone-offset ‘Z’ for zero offset-X Z; -08; -0830; -08:30; -083015; -08:30:15;
x zone-offset offset-x 0000; -08; -0830; -08:30; -083015; -08:30:15;
Z zone-offset offset-Z 0000; -0800; -08:00;

Environment

$ spark-submit --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
      /_/
                        
Using Scala version 2.12.10, OpenJDK 64-Bit Server VM, 1.8.0_312
Branch HEAD
Compiled by user centos on 2021-05-24T04:27:48Z
Revision de351e30a90dd988b133b3d00fa6218bfcaba8b8
Url https://github.com/apache/spark
Type --help for more information.

CodePudding user response:

You used date, which only keep year, month and day.

You can try use tiemstamp as below:

scala> spark.sql("select date_format(timestamp '1994-11-05T08:15:30-05:00', 'hh:mm') AS hour_in_day_number").show()
 ------------------ 
|hour_in_day_number|
 ------------------ 
|             05:15|
 ------------------ 

You can refer https://databricks.com/blog/2020/07/22/a-comprehensive-look-at-dates-and-timestamps-in-apache-spark-3-0.html

  • Related