How does Databricks' date_trunc function run in the back end?


I was hoping to see the source code for the date_trunc function in Databricks. The PySpark source code does not answer my question. Basically, I want to know what is happening at the very core: e.g., does it run a regexp pattern/method, or does it have its own algorithm?

Can anyone help? Thank you!

CodePudding user response:

Spark's code is actually JVM (Scala/Java) code, even though you can use it from Python, and it is available on GitHub: https://github.com/apache/spark
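
As a quick illustration of that JVM path, here is a minimal sketch (my own example, with illustrative names, not code from Spark's repo) that calls date_trunc through the Scala DataFrame API. PySpark's pyspark.sql.functions.date_trunc is a thin wrapper that builds the same JVM-side Catalyst expression over Py4J, so the Python and Scala APIs end up executing the same code:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_trunc}

object DateTruncDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("date-trunc-demo")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("2023-06-12 13:45:27.123456").toDF("raw")
      .select(col("raw").cast("timestamp").as("ts"))

    // The truncation itself is evaluated on the JVM by Catalyst,
    // ultimately reaching DateTimeUtils.truncTimestamp (shown below).
    df.select(date_trunc("hour", col("ts")).as("truncated")).show(false)

    spark.stop()
  }
}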

I believe the code you are looking for is visible at https://github.com/apache/spark/blob/b6aea1a8d99b3d99e91f7f195b23169d3d61b6a7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L971

def truncTimestamp(micros: Long, level: Int, zoneId: ZoneId): Long = {
  // Time zone offsets have a maximum precision of seconds (see `java.time.ZoneOffset`). Hence
  // truncation to microsecond, millisecond, and second can be done
  // without using time zone information. This results in a performance improvement.
  level match {
    case TRUNC_TO_MICROSECOND => micros
    case TRUNC_TO_MILLISECOND =>
      micros - Math.floorMod(micros, MICROS_PER_MILLIS)
    case TRUNC_TO_SECOND =>
      micros - Math.floorMod(micros, MICROS_PER_SECOND)
    case TRUNC_TO_MINUTE => truncToUnit(micros, zoneId, ChronoUnit.MINUTES)
    case TRUNC_TO_HOUR => truncToUnit(micros, zoneId, ChronoUnit.HOURS)
    case TRUNC_TO_DAY => truncToUnit(micros, zoneId, ChronoUnit.DAYS)
    case _ => // Try to truncate date levels
      val dDays = microsToDays(micros, zoneId)
      daysToMicros(truncDate(dDays, level), zoneId)
  }
}
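
So, to answer the original question directly: there is no regexp at the core. Truncation to microsecond, millisecond, and second is plain integer arithmetic on the microseconds-since-epoch value, while minute/hour/day truncation goes through java.time calendar math in the session time zone; the comment in truncTimestamp explains why splitting the two cases helps performance. Below is a small standalone sketch, using illustrative names rather than Spark's actual helpers, that mirrors the two branches:

import java.time.{Instant, ZoneId}
import java.time.temporal.ChronoUnit

object TruncSketch {
  private val MICROS_PER_SECOND = 1000000L

  // Zone-independent branch: plain integer arithmetic, no calendar logic.
  def truncToSecond(micros: Long): Long =
    micros - Math.floorMod(micros, MICROS_PER_SECOND)

  // Zone-dependent branch: go through java.time, truncate in local time,
  // and convert back to microseconds since the epoch.
  def truncToUnit(micros: Long, zoneId: ZoneId, unit: ChronoUnit): Long = {
    val instant = Instant.EPOCH.plus(micros, ChronoUnit.MICROS)
    val truncated = instant.atZone(zoneId).truncatedTo(unit).toInstant
    ChronoUnit.MICROS.between(Instant.EPOCH, truncated)
  }

  def main(args: Array[String]): Unit = {
    val micros = 1686577527123456L // 2023-06-12 13:45:27.123456 UTC
    println(truncToSecond(micros))                                   // 1686577527000000
    println(truncToUnit(micros, ZoneId.of("UTC"), ChronoUnit.HOURS)) // 1686574800000000
  }
}

Running main prints the timestamp truncated to the second and to the hour, and you can see that Math.floorMod simply drops the sub-unit remainder, which is exactly what the zone-independent cases in truncTimestamp do.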