
[WIP][SPARK-51554][SQL] Add the time_trunc() function for TIME datatype #50607


Open · wants to merge 1 commit into master

Conversation

the-sakthi (Member)

What changes were proposed in this pull request?

  • Added a new built-in function time_trunc(unit, expr) that returns a TIME value truncated to the specified unit.
  • Allowed input for expr to be either a TIME type or a string that can be cast to TIME.
  • Supported truncation units are HOUR, MINUTE, SECOND, MILLISECOND, and MICROSECOND.
  • Handled both foldable and non-foldable inputs; a minimal sketch of the truncation arithmetic follows this list.
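
A minimal sketch of the truncation arithmetic, assuming TIME values are carried as microseconds since midnight in a Long, as Spark does internally; the names TimeTruncSketch, truncTime, and unitToMicros are illustrative, not the PR's actual helpers:

object TimeTruncSketch {
  // Microseconds per supported truncation unit.
  private val unitToMicros: Map[String, Long] = Map(
    "HOUR"        -> 3600L * 1000000L,
    "MINUTE"      -> 60L * 1000000L,
    "SECOND"      -> 1000000L,
    "MILLISECOND" -> 1000L,
    "MICROSECOND" -> 1L
  )

  // Truncate microseconds-since-midnight down to the given unit.
  // Returns None for an unsupported unit, mirroring the NULL result
  // shown below for an invalid unit such as 'MS'.
  def truncTime(unit: String, micros: Long): Option[Long] =
    unitToMicros.get(unit.toUpperCase).map(u => micros - micros % u)
}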

Why are the changes needed?

  • Spark currently lacks a built-in function for truncating TIME values, analogous to what truncTimestamp provides for TIMESTAMP values; see the example below.
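
For reference, the existing TIMESTAMP counterpart, the date_trunc SQL function (backed by truncTimestamp), behaves as follows. This example uses stock Spark and is not code from this PR:

scala> spark.sql("SELECT date_trunc('HOUR', '2015-03-05T09:32:05.359');").show()
// Expected result (per Spark's built-in documentation for date_trunc): 2015-03-05 09:00:00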

Does this PR introduce any user-facing change?

Yes. A new built-in function time_trunc is added. Users can call the function to truncate TIME values to one of the supported units listed above.

How was this patch tested?

By running new tests: WIP; new unit tests are being added.

By manual tests:

# Happy-path test cases
scala> spark.sql("SELECT time_trunc('HOUR', '09:32:05.123456');").show()
+---------------------------------+
|time_trunc(HOUR, 09:32:05.123456)|
+---------------------------------+
|                         09:00:00|
+---------------------------------+

scala> spark.sql("SELECT time_trunc('MINUTE', TIME'09:32:05.123456');").show()
+------------------------------------------+
|time_trunc(MINUTE, TIME '09:32:05.123456')|
+------------------------------------------+
|                                  09:32:00|
+------------------------------------------+

scala> spark.sql("SELECT time_trunc('second', '09:32:05.123456');").show()
+-----------------------------------+
|time_trunc(second, 09:32:05.123456)|
+-----------------------------------+
|                           09:32:05|
+-----------------------------------+

scala> spark.sql("SELECT time_trunc('MILLISECOND', '09:32:05.123456');").show()
+----------------------------------------+
|time_trunc(MILLISECOND, 09:32:05.123456)|
+----------------------------------------+
|                            09:32:05.123|
+----------------------------------------+

scala> spark.sql("SELECT time_trunc('MICROSECOND', '09:32:05.123456');").show()
+----------------------------------------+
|time_trunc(MICROSECOND, 09:32:05.123456)|
+----------------------------------------+
|                         09:32:05.123456|
+----------------------------------------+

scala> spark.sql("SELECT time_trunc('MICROSECOND', '09:32:05.1234');").show()
+--------------------------------------+
|time_trunc(MICROSECOND, 09:32:05.1234)|
+--------------------------------------+
|                         09:32:05.1234|
+--------------------------------------+

# Invalid inputs
scala> spark.sql("SELECT time_trunc('MS', '09:32:05.123456');").show()
+-------------------------------+
|time_trunc(MS, 09:32:05.123456)|
+-------------------------------+
|                           NULL|
+-------------------------------+

scala> spark.sql("SELECT time_trunc('MICROSECOND', '29:32:05.123456');").show()
org.apache.spark.SparkDateTimeException: [CAST_INVALID_INPUT] The value '29:32:05.123456' of the type "STRING" cannot be cast to "TIME(6)" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. SQLSTATE: 22018
== SQL (line 1, position 8) ==
SELECT time_trunc('MICROSECOND', '29:32:05.123456');
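
As the error message suggests, wrapping the malformed literal in try_cast should return NULL instead of raising; this invocation is illustrative and was not part of the manual tests above:

scala> spark.sql("SELECT time_trunc('MICROSECOND', try_cast('29:32:05.123456' AS TIME(6)));").show()
// Expected: a single NULL row rather than a SparkDateTimeException.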

# Non-foldable inputs
scala> val df = Seq(
     |   ("HOUR",       "09:32:05.123456"),
     |   ("MINUTE",     "10:20:15.123456"),
     |   ("second",     "11:59:59.999999"),
     |   ("MILLISECOND","00:00:00.123000"),
     |   ("MICROSECOND","00:00:00.123000"),
     |   ("MICROSECOND","00:00:00.123456")
     | ).toDF("unitcol", "timecol")
val df: org.apache.spark.sql.DataFrame = [unitcol: string, timecol: string]

scala> val timeDf = df.selectExpr("unitcol", "CAST(timecol AS TIME(6)) as timeval")
val timeDf: org.apache.spark.sql.DataFrame = [unitcol: string, timeval: time(6)]

scala> timeDf.createOrReplaceTempView("tmp")

scala> spark.sql("""
     |   SELECT
     |     unitcol,
     |     timeval,
     |     time_trunc(unitcol, timeval) as truncated
     |   FROM tmp
     | """).show(false)
+-----------+---------------+---------------+
|unitcol    |timeval        |truncated      |
+-----------+---------------+---------------+
|HOUR       |09:32:05.123456|09:00:00       |
|MINUTE     |10:20:15.123456|10:20:00       |
|second     |11:59:59.999999|11:59:59       |
|MILLISECOND|00:00:00.123   |00:00:00.123   |
|MICROSECOND|00:00:00.123   |00:00:00.123   |
|MICROSECOND|00:00:00.123456|00:00:00.123456|
+-----------+---------------+---------------+

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the SQL label on Apr 16, 2025
the-sakthi (Member, Author) commented on Apr 16, 2025

@MaxGekk While I work on converting this into a RuntimeReplaceable version and add the UTs, I would appreciate any feedback from you in the meantime!
