I have a Spark DataFrame with a column timestamp. I need to create an event_hour column from it, still as a unix timestamp but truncated to the hour. The issue is that timestamp holds unix time with a granularity of seconds, while I need a granularity of hours.
Current values for timestamp:
1653192037
1653192026
1653192025
1653192024
1653192023
1653192022
Expected values:
1653192000
1653195600
1653199200
1653202800
How can I achieve that using Spark functions? I've already tried converting it to a timestamp and then formatting it, but I got null as the result:
inputDf
.withColumn("event_hour", unix_timestamp(date_format($"timestamp".cast(TimestampType), "MM-dd-yyyy HH")))
CodePudding user response:
A less explicit but efficient way is to use the modulo operation with 3600 (since 3600 seconds = 1 hour):
timestamp_hour = timestamp_second - (timestamp_second % 3600)
This assumes you are manipulating the data as a numeric type.
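For example, in the Scala DataFrame API this can be expressed directly on the column (a minimal sketch, assuming the epoch-seconds column is named timestamp as in the question):

import org.apache.spark.sql.functions.col

// truncate epoch seconds down to the start of the hour
val withHour = inputDf.withColumn("event_hour", col("timestamp") - (col("timestamp") % 3600))

The arithmetic stays entirely in Spark SQL expressions, so no UDF or conversion to a timestamp type is needed.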
CodePudding user response:
You can use the DateUtils API:
import java.util.Calendar;
import org.apache.commons.lang3.time.DateUtils;

// Timestamp_column here is a java.sql.Timestamp (or java.util.Date) value
Long epochTimestamp_hour = DateUtils.truncate(Timestamp_column, Calendar.HOUR).getTime();
- create a new column of type timestamp from the epoch seconds
- use DateUtils.truncate on that column to compute epochTimestamp_hour (see the sketch below)
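Wired into Spark, this could look roughly like the following (a minimal sketch, assuming epoch seconds in a column named timestamp; the UDF name truncateToHour and the conversion back to seconds are illustrative choices, not part of the original answer). Note that DateUtils.truncate operates on java.util.Date values, which use milliseconds since the epoch:

import java.util.{Calendar, Date}
import org.apache.commons.lang3.time.DateUtils
import org.apache.spark.sql.functions.{col, udf}

// convert seconds to a Date, truncate to the hour, then convert back to epoch seconds
val truncateToHour = udf { seconds: Long =>
  DateUtils.truncate(new Date(seconds * 1000L), Calendar.HOUR).getTime / 1000L
}

val withEventHour = inputDf.withColumn("event_hour", truncateToHour(col("timestamp")))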