I was hoping to see the source code for the date_trunc function in Databricks. The PySpark source code does not answer my question. Basically, I want to know what is happening at the very core: does it run a regex pattern/method, or does it have its own algorithm?
Can anyone help? Thank you!
CodePudding user response:
Spark itself is JVM (Scala) code, even though you can call it from Python, and the source is available on GitHub: https://github.com/apache/spark
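For context, the JVM entry point behind the PySpark date_trunc wrapper is org.apache.spark.sql.functions.date_trunc. A minimal Scala usage sketch (the DataFrame and column names here are my own illustration):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_trunc}

val spark = SparkSession.builder().master("local[*]").appName("date-trunc-demo").getOrCreate()
import spark.implicits._

// A one-row DataFrame with a timestamp column named "ts" (illustrative data).
val df = Seq("2017-03-05 12:25:11").toDF("ts")
  .select(col("ts").cast("timestamp").as("ts"))

// Truncate to the start of the hour; this is the API whose core logic is quoted below.
df.select(date_trunc("hour", col("ts")).as("truncated")).show(false)
// truncated -> 2017-03-05 12:00:00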
I believe the code you are looking for is visible at https://github.com/apache/spark/blob/b6aea1a8d99b3d99e91f7f195b23169d3d61b6a7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L971
def truncTimestamp(micros: Long, level: Int, zoneId: ZoneId): Long = {
  // Time zone offsets have a maximum precision of seconds (see `java.time.ZoneOffset`). Hence
  // truncation to microsecond, millisecond, and second can be done
  // without using time zone information. This results in a performance improvement.
  level match {
    case TRUNC_TO_MICROSECOND => micros
    case TRUNC_TO_MILLISECOND =>
      micros - Math.floorMod(micros, MICROS_PER_MILLIS)
    case TRUNC_TO_SECOND =>
      micros - Math.floorMod(micros, MICROS_PER_SECOND)
    case TRUNC_TO_MINUTE => truncToUnit(micros, zoneId, ChronoUnit.MINUTES)
    case TRUNC_TO_HOUR => truncToUnit(micros, zoneId, ChronoUnit.HOURS)
    case TRUNC_TO_DAY => truncToUnit(micros, zoneId, ChronoUnit.DAYS)
    case _ => // Try to truncate date levels
      val dDays = microsToDays(micros, zoneId)
      daysToMicros(truncDate(dDays, level), zoneId)
  }
}
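So, to answer the question directly: there is no regex involved. For the microsecond, millisecond, and second levels the truncation is pure integer arithmetic on the microseconds-since-epoch value (the Math.floorMod lines above), and for the minute, hour, and day levels Spark delegates to java.time, truncating a zone-aware timestamp with ChronoUnit. Here is a minimal, self-contained Scala sketch of those same two techniques; the object and helper names are my own illustration, not Spark's actual internals:

import java.time.{Instant, ZoneId}
import java.time.temporal.ChronoUnit

object TruncSketch {
  val MicrosPerSecond: Long = 1000000L

  // Sub-second truncation: pure arithmetic on microseconds since the epoch.
  // floorMod (rather than %) keeps the result correct for pre-1970 timestamps,
  // where micros is negative.
  def truncToSecond(micros: Long): Long =
    micros - Math.floorMod(micros, MicrosPerSecond)

  // Zone-aware truncation (minute/hour/day): convert to java.time, truncate
  // with ChronoUnit, and convert back to microseconds.
  def truncToUnit(micros: Long, zoneId: ZoneId, unit: ChronoUnit): Long = {
    val instant = Instant.ofEpochSecond(
      Math.floorDiv(micros, MicrosPerSecond),
      Math.floorMod(micros, MicrosPerSecond) * 1000L) // remainder as nanos
    val truncated = instant.atZone(zoneId).truncatedTo(unit).toInstant
    truncated.getEpochSecond * MicrosPerSecond + truncated.getNano / 1000L
  }

  def main(args: Array[String]): Unit = {
    val micros = 1488716711000000L // 2017-03-05 12:25:11 UTC, in microseconds
    println(truncToSecond(micros))                                   // unchanged: already whole seconds
    println(truncToUnit(micros, ZoneId.of("UTC"), ChronoUnit.HOURS)) // start of 2017-03-05 12:00:00 UTC
  }
}

The sub-second cases can skip the time zone entirely because zone offsets never have sub-second precision, which is exactly the performance shortcut the comment in Spark's code describes; only the zone-dependent levels (minute and coarser) need the java.time round trip.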