
[WIP][SPARK-51554][SQL] Add the time_trunc() function for TIME datatype #50607


Open · wants to merge 1 commit into master

Conversation

the-sakthi (Member)

What changes were proposed in this pull request?

  • Added a new built-in function time_trunc(unit, expr) that returns a TIME value truncated to the specified unit.
  • Allowed input for expr to be either a TIME type or a string that can be cast to TIME.
  • Supported truncation units are HOUR, MINUTE, SECOND, MILLISECOND, and MICROSECOND.
  • Handled both foldable and non-foldable inputs; a minimal sketch of the truncation arithmetic follows this list.
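
A minimal sketch of the truncation arithmetic, assuming TIME values are carried as microseconds since midnight in a Long, as Spark does internally; the names TimeTruncSketch, truncTime, and unitToMicros are illustrative, not the PR's actual helpers:

object TimeTruncSketch {
  // Microseconds per supported truncation unit.
  private val unitToMicros: Map[String, Long] = Map(
    "HOUR"        -> 3600L * 1000000L,
    "MINUTE"      -> 60L * 1000000L,
    "SECOND"      -> 1000000L,
    "MILLISECOND" -> 1000L,
    "MICROSECOND" -> 1L
  )

  // Truncate microseconds-since-midnight down to the given unit.
  // Returns None for an unsupported unit, mirroring the NULL result
  // shown below for an invalid unit such as 'MS'.
  def truncTime(unit: String, micros: Long): Option[Long] =
    unitToMicros.get(unit.toUpperCase).map(u => micros - micros % u)
}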

Why are the changes needed?

  • Spark currently lacks a built-in function for truncating TIME values, analogous to what truncTimestamp provides for TIMESTAMP values; see the example below.
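
For reference, the existing TIMESTAMP counterpart, the date_trunc SQL function (backed by truncTimestamp), behaves as follows. This example uses stock Spark and is not code from this PR:

scala> spark.sql("SELECT date_trunc('HOUR', '2015-03-05T09:32:05.359');").show()
// Expected result (per Spark's built-in documentation for date_trunc): 2015-03-05 09:00:00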

Does this PR introduce any user-facing change?

Yes. A new built-in function time_trunc is added. Users can call the function to truncate TIME values to one of the supported units listed above.

How was this patch tested?

By running new tests: WIP; new unit tests are being added.

By manual tests:

# Happy-path test cases
scala> spark.sql("SELECT time_trunc('HOUR', '09:32:05.123456');").show()
+---------------------------------+
|time_trunc(HOUR, 09:32:05.123456)|
+---------------------------------+
|                         09:00:00|
+---------------------------------+

scala> spark.sql("SELECT time_trunc('MINUTE', TIME'09:32:05.123456');").show()
+------------------------------------------+
|time_trunc(MINUTE, TIME '09:32:05.123456')|
+------------------------------------------+
|                                  09:32:00|
+------------------------------------------+

scala> spark.sql("SELECT time_trunc('second', '09:32:05.123456');").show()
+-----------------------------------+
|time_trunc(second, 09:32:05.123456)|
+-----------------------------------+
|                           09:32:05|
+-----------------------------------+

scala> spark.sql("SELECT time_trunc('MILLISECOND', '09:32:05.123456');").show()
+----------------------------------------+
|time_trunc(MILLISECOND, 09:32:05.123456)|
+----------------------------------------+
|                            09:32:05.123|
+----------------------------------------+

scala> spark.sql("SELECT time_trunc('MICROSECOND', '09:32:05.123456');").show()
+----------------------------------------+
|time_trunc(MICROSECOND, 09:32:05.123456)|
+----------------------------------------+
|                         09:32:05.123456|
+----------------------------------------+

scala> spark.sql("SELECT time_trunc('MICROSECOND', '09:32:05.1234');").show()
+--------------------------------------+
|time_trunc(MICROSECOND, 09:32:05.1234)|
+--------------------------------------+
|                         09:32:05.1234|
+--------------------------------------+

# Invalid inputs
scala> spark.sql("SELECT time_trunc('MS', '09:32:05.123456');").show()
+-------------------------------+
|time_trunc(MS, 09:32:05.123456)|
+-------------------------------+
|                           NULL|
+-------------------------------+

scala> spark.sql("SELECT time_trunc('MICROSECOND', '29:32:05.123456');").show()
org.apache.spark.SparkDateTimeException: [CAST_INVALID_INPUT] The value '29:32:05.123456' of the type "STRING" cannot be cast to "TIME(6)" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. SQLSTATE: 22018
== SQL (line 1, position 8) ==
SELECT time_trunc('MICROSECOND', '29:32:05.123456');
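
As the error message suggests, wrapping the malformed literal in try_cast should return NULL instead of raising; this invocation is illustrative and was not part of the manual tests above:

scala> spark.sql("SELECT time_trunc('MICROSECOND', try_cast('29:32:05.123456' AS TIME(6)));").show()
// Expected: a single NULL row rather than a SparkDateTimeException.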

# Non-foldable inputs
scala> val df = Seq(
     |   ("HOUR",       "09:32:05.123456"),
     |   ("MINUTE",     "10:20:15.123456"),
     |   ("second",     "11:59:59.999999"),
     |   ("MILLISECOND","00:00:00.123000"),
     |   ("MICROSECOND","00:00:00.123000"),
     |   ("MICROSECOND","00:00:00.123456")
     | ).toDF("unitcol", "timecol")
val df: org.apache.spark.sql.DataFrame = [unitcol: string, timecol: string]

scala> val timeDf = df.selectExpr("unitcol", "CAST(timecol AS TIME(6)) as timeval")
val timeDf: org.apache.spark.sql.DataFrame = [unitcol: string, timeval: time(6)]

scala> timeDf.createOrReplaceTempView("tmp")

scala> spark.sql("""
     |   SELECT
     |     unitcol,
     |     timeval,
     |     time_trunc(unitcol, timeval) as truncated
     |   FROM tmp
     | """).show(false)
+-----------+---------------+---------------+
|unitcol    |timeval        |truncated      |
+-----------+---------------+---------------+
|HOUR       |09:32:05.123456|09:00:00       |
|MINUTE     |10:20:15.123456|10:20:00       |
|second     |11:59:59.999999|11:59:59       |
|MILLISECOND|00:00:00.123   |00:00:00.123   |
|MICROSECOND|00:00:00.123   |00:00:00.123   |
|MICROSECOND|00:00:00.123456|00:00:00.123456|
+-----------+---------------+---------------+

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the SQL label on Apr 16, 2025
the-sakthi (Member, Author) commented on Apr 16, 2025

@MaxGekk While I work on converting this into a RuntimeReplaceable version and add the UTs, I would appreciate any feedback from you in the meantime!
