DataFusion Comet 0.15.0 Changelog

This release consists of 142 commits from 19 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: enable native_datafusion Spark SQL tests previously ignored in #3315 #3696 (andygrove)

  • fix: route file-not-found errors through SparkError JSON path #3699 (andygrove)

  • fix: fall back from native_datafusion for duplicate fields in case-insensitive mode #3687 (andygrove)

  • fix: enable more Spark SQL tests for native_datafusion (DynamicPartitionPruningSuite / ExplainSuite) #3694 (andygrove)

  • fix: Correct GetArrayItem null handling for dynamic indices and re-enable native execution #3709 (0lai0)

  • fix: enable native_datafusion Spark SQL tests for #3320, #3401, #3719 #3718 (andygrove)

  • fix: Native engine crashes on literal DateTrunc and TimestampTrunc #3668 (0lai0)

  • fix: Use the loaded Comet extension too (Spark 3.5.8) #3707 (martin-g)

  • fix: Use thread context classloader for Iceberg class loading #3738 (karuppayya)

  • fix: disable ANSI mode in benchmarks to avoid exceptions on invalid input #3750 (parthchandra)

  • fix: fix string to timestamp cast for UTC timestamps #3656 (parthchandra)

  • fix: native error message not propagated to SparkException on empty errorClass #3727 (manuzhang)

  • fix: add timezone and special formats support for cast string to timestamp #3730 (parthchandra)

  • fix: handle inf/-inf/nan in ShimSparkErrorConverter cast overflow #3768 (manuzhang)

  • fix: handle scalar decimal value overflow correctly in ANSI mode #3803 (parthchandra)

  • fix: correct array_append return type and mark as Compatible #3795 (andygrove)

  • fix: remove broken directBuffer feature for parquet reads #3814 (andygrove)

  • fix: remove unnecessary IgnoreCometNativeDataFusion tags from 3.5.8 diff #3831 (andygrove)

  • fix: query tolerance= in SQL file tests now also asserts Comet native execution #3797 (andygrove)

  • fix: include scan impl in PR Linux artifact names #3853 (manuzhang)

  • fix: correct invalid Option.contains assertion in cast test #3851 (manuzhang)

  • fix: native_datafusion: case-insensitive mode doesn’t detect duplicate/ambiguous Parquet fields #3808 (vaibhawvipul)

  • fix: cache object stores and bucket regions to reduce DNS query volume #3802 (andygrove)

  • fix: skip Comet columnar shuffle for stages with DPP scans #3879 (andygrove)

  • fix: Native_datafusion reports correct files and bytes scanned #3798 (0lai0)

  • fix: address clippy collapsible_match warnings #3863 (manuzhang)

  • fix: parameterize file count in Native_datafusion metrics test #3896 (0lai0)

  • fix: Make cast string to timestamp compatible with Spark #3884 (parthchandra)

  • fix: add EmptySchemaShufflePartitioner and test from #3858 #3893 (mbutrovich)

  • fix: use min instead of max when capping write buffer size to Int range #3914 (andygrove)

  • fix: Update TPC-DS q36a golden file for Spark 4.0 decimal UNION widening change #3915 (parthchandra)

  • fix: audit array_insert expression for correctness and test coverage #3890 (andygrove)

  • fix: handle ambiguous and non-existent local times #3865 (matthewalex4)

  • fix: improve tracing feature #3688 (andygrove)

  • fix: make tan and atan2 compatible #3849 (kazuyukitanimura)

  • fix: checkSparkAnswer displays incorrect labels #3927 (parthchandra)

  • fix: support full-width and null characters, and negative scale in string to decimal #3922 (parthchandra)

  • fix: enable Corr #3892 (kazuyukitanimura)

  • fix: array to array cast #2897 (manuzhang)

  • fix: exclude tpcds-plan-stability extended.txt files from rat license check #3964 (andygrove)

  • fix: use UTC for Arrow schema timezone in SparkToColumnar conversions #3878 (andygrove)

  • fix: remove spurious .flatten call that garbled SortMergeJoin fallback messages #3968 (andygrove)

  • fix: Add legacy mode handling to cast Decimal to String #3939 (parthchandra)

  • fix: improve test coverage for decimal to primitive type casts #3948 (parthchandra)

  • fix: fix decimal div and add tests #3952 (parthchandra)

  • fix: make shuffle fallback decisions sticky across planning passes #3982 (andygrove)
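
The write buffer entry above (#3914) describes a clamping mistake that is easy to make in any language. The following is a minimal sketch of the general idea only, not Comet's actual code: the function names and the `INT_MAX` constant are hypothetical and chosen for illustration. Capping a size to the signed 32-bit range needs `min`, because `max` would raise every small request to the limit instead.

```python
# Illustrative only: names are hypothetical, not taken from the Comet codebase.
INT_MAX = 2**31 - 1  # largest signed 32-bit integer


def cap_to_int_range(requested_bytes: int) -> int:
    """Cap a requested size so it never exceeds the signed 32-bit range."""
    return min(requested_bytes, INT_MAX)


def cap_with_max_bug(requested_bytes: int) -> int:
    """The mistaken variant: max() turns the cap into a floor."""
    return max(requested_bytes, INT_MAX)


if __name__ == "__main__":
    print(cap_to_int_range(1024))      # small requests pass through unchanged
    print(cap_to_int_range(2**40))     # oversized requests are capped
    print(cap_with_max_bug(1024))      # the buggy variant inflates small requests
```

With `min`, a 1 KiB request stays at 1 KiB and only oversized requests are reduced. With `max`, every request at or below the limit becomes `INT_MAX`.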

Performance related:

  • perf: Coalesce broadcast exchange batches before broadcasting #3703 (mbutrovich)

  • perf: stop using FFI in native shuffle read path #3731 (andygrove)

  • perf: Enable native c2r for more queries #3764 (andygrove)

  • perf: Mark more operators as FFI safe to avoid deep copies #3765 (andygrove)

  • perf: remove BufReader wrapper when copying spill files to shuffle output #3861 (andygrove)

  • fix: share unified memory pools across native execution contexts within a task #3924 (andygrove)

Implemented enhancements:

  • feat: Add PR review skill for Comet expression reviews #3711 (andygrove)

  • feat: add sort_array benchmark #3758 (grorge123)

  • feat: Support Spark expression days #3746 (0lai0)

  • feat: expose comet metrics through Spark's external monitoring system #3708 (coderfender)

  • feat: support SQL aggregate FILTER (WHERE …) clause in native execution #3835 (viirya)

  • feat: Implement CRC32C algorithm #3822 (snmvaughan)

  • feat: add audit-comet-expression Claude Code skill #3793 (andygrove)

  • feat: enable native_datafusion scan in auto mode #3781 (andygrove)

  • feat: support LEAD and LAG window functions with IGNORE NULLS #3876 (viirya)

  • feat: add standalone shuffle benchmark tool #3752 (andygrove)

  • feat: Mark array_compact as Compatible and improve test coverage #3889 (andygrove)

  • feat: add native support for get_json_object expression #3747 (andygrove)

  • feat: Support Spark expression hours #3804 (0lai0)

  • feat: add support for date_from_unix_date expression #3144 (andygrove)

  • feat: support spark bin function #3928 (kazantsev-maksim)

  • feat: support sort_array expression #3706 (grorge123)
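
For readers unfamiliar with the checksum named in the CRC32C entry above (#3822), here is a minimal bitwise sketch of CRC-32C (the Castagnoli polynomial). It is a reference-style illustration, not Comet's implementation, which may well use a table-driven or hardware-accelerated approach. The function name `crc32c` is an assumption made for this example.

```python
def crc32c(data: bytes) -> int:
    """Bit-by-bit CRC-32C (Castagnoli), reflected form.

    Uses the reflected polynomial 0x82F63B78, an initial value of
    0xFFFFFFFF, and a final XOR with 0xFFFFFFFF.
    """
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; apply the polynomial when the low bit was set.
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    # Standard check input for CRC catalogues.
    print(hex(crc32c(b"123456789")))
```

The standard check value for the input `123456789` is `0xE3069283`, which is a convenient way to validate any CRC-32C implementation against this sketch.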

Documentation updates:

  • docs: Add some .lldbinit configurations for debugging document #3686 (wForget)

  • docs: document Iceberg Spark tests in contributor guide #3777 (mbutrovich)

  • docs: document negative zero cast-to-string incompatibility #3811 (andygrove)

  • docs: Add docs about global singletons to development guide #3809 (mbutrovich)

  • docs: add bug triage guide for prioritizing open issues #3812 (andygrove)

  • docs: broaden area:writer and area:scan label descriptions #3843 (andygrove)

  • docs: expand profiling guide with JVM and async-profiler coverage #3628 (andygrove)

  • doc: GetArrayItem is now supported #3880 (kazuyukitanimura)

  • docs: update Iceberg docs to reflect capabilities #3961 (mbutrovich)

  • docs: clarify Maven staging behavior across release candidates #3963 (andygrove)

  • docs: document CI test suite registration requirement #3943 (andygrove)

  • docs: Add documentation for running spark-sql-perf #3950 (andygrove)

Other:

  • ci: remove Java Iceberg integration tests from CI [iceberg] #3673 (andygrove)

  • build: revert “chore(deps): bump runs-on/action from 2.0.3 to 2.1.0” #3714 (blaginin)

  • chore(deps): bump lz4_flex from 0.12.0 to 0.12.1 in /native #3713 (dependabot[bot])

  • chore: Add changelog for 0.14.0 release #3681 (andygrove)

  • chore: bump version to 0.15.0-SNAPSHOT #3715 (andygrove)

  • chore: update documentation links for 0.14.0 release #3716 (andygrove)

  • Fix: map_from_arrays() with NULL inputs causes native crash #3356 (kazantsev-maksim)

  • chore: Refactor planner random and partition expressions #3704 (coderfender)

  • test: enable ignored 4.0 tests, enable ansi mode #3454 (parthchandra)

  • chore: keep original error message for failed SQL test #3725 (comphead)

  • build: lint as a separate step #3717 (blaginin)

  • chore(deps): bump lz4_flex from 0.12.1 to 0.13.0 in /native #3744 (dependabot[bot])

  • chore(deps): bump runs-on/action from 2.0.3 to 2.1.0 #3741 (dependabot[bot])

  • chore: remove iceberg-java integration #3739 (andygrove)

  • chore: refactor to extract common and jni-bridge as separate crates #3667 (andygrove)

  • chore(deps): bump rustls-webpki from 0.103.9 to 0.103.10 in /native #3751 (dependabot[bot])

  • chore(deps): bump github/codeql-action from 4.32.6 to 4.33.0 #3742 (dependabot[bot])

  • chore(deps): bump cc from 1.2.56 to 1.2.57 in /native in the all-other-cargo-deps group #3743 (dependabot[bot])

  • chore: extract shuffle module into separate crate #3749 (andygrove)

  • chore: run Spark 3.4 tests with native_datafusion scan #3722 (andygrove)

  • chore: [native_datafusion] replace #3311 references with specific issues in 3.5.8 diff #3761 (andygrove)

  • chore: fix allocations in schema adapter for native_datafusion scan #3755 (comphead)

  • chore: update Iceberg Java diffs after #3739 [iceberg] #3774 (mbutrovich)

  • chore(deps): update datafusion to 52.4.0 [iceberg] #3769 (andygrove)

  • test: Port DateTimeUtilsSuite timestamp format tests in Comet #3780 (parthchandra)

  • build: add CometDateTimeUtilsSuite to CI workflow #3782 (andygrove)

  • chore: Run Spark 4.0 SQL tests with native_datafusion scan #3728 (andygrove)

  • Test: Add test coverage and documentation for SumDecimal/AvgDecimal nullability behavior #3766 (vaibhawvipul)

  • tests: fix Iceberg test diffs for Spark 3.4 [iceberg] #3785 (mbutrovich)

  • ci: run Iceberg Spark tests on all PRs and commits to main branch #3792 (mbutrovich)

  • chore(deps): bump github/codeql-action from 4.33.0 to 4.34.1 #3805 (dependabot[bot])

  • chore(deps): bump the all-other-cargo-deps group in /native with 3 updates #3806 (dependabot[bot])

  • refactor: reorganize shuffle crate module structure #3772 (andygrove)

  • chore: update git plugin to allow worktrees #3815 (parthchandra)

  • chore: Remove SupportsComet interface #3818 (andygrove)

  • Replace catalyst.util.fileToString with Files.readString #3844 (snmvaughan)

  • test: cast negative zero to string #3829 (kazuyukitanimura)

  • test: add SQL file test for casting double to string #3854 (andygrove)

  • chore(deps): bump jni from 0.21.1 to 0.22.4 in /native #3753 (manuzhang)

  • test: ceil and floor works correctly for Decimal128 #3848 (kazuyukitanimura)

  • chore(deps): bump the all-other-cargo-deps group in /native with 2 updates #3899 (dependabot[bot])

  • chore(deps): bump github/codeql-action from 4.34.1 to 4.35.1 #3898 (dependabot[bot])

  • chore(deps): bump actions/github-script from 7 to 8 #3897 (dependabot[bot])

  • chore: add SQL tests for FIRST/LAST aggregates #3891 (comphead)

  • test: do not ignore test SPARK-48037 #2774 (kazuyukitanimura)

  • deps: upgrade to DataFusion 53.0, Arrow to 58.1 #3629 (mbutrovich)

  • chore: native_datafusion to report scan task input metrics #3842 (comphead)

  • test: improve array_distinct test coverage and incompatibility description #3887 (andygrove)

  • ci: remove native_datafusion CI workflows after 4f5eaf0 #3908 (mbutrovich)

  • test: Enable more Spark tests #3905 (kazuyukitanimura)

  • chore: reenable width_bucket test #3910 (comphead)

  • chore: reenable array_contains tests #3912 (comphead)

  • chore: reenable array_remove tests #3917 (comphead)

  • chore(deps): bump actions/github-script from 8 to 9 #3923 (dependabot[bot])

  • chore: Remove redundant parquet.enable.dictionary ConfigMatrix from SQL tests #3866 (andygrove)

  • chore(deps): bump rand from 0.10.0 to 0.10.1 in /native #3942 (dependabot[bot])

  • chore: add changelog for 0.14.1 #3944 (andygrove)

  • deps: bump iceberg-rust to latest after picking up fixes for #3856 and #3860 #3958 (mbutrovich)

  • deps: update to opendal revision where object_store get_ranges performance fixed #3965 (comphead)

  • chore: Register task completion listener to ensure CometExecIterator is always closed #3959 (wForget)

  • chore(deps): bump github/codeql-action from 4.35.1 to 4.35.2 #3972 (dependabot[bot])

  • deps: upgrade to DataFusion 53.1 #3946 (mbutrovich)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

    55	Andy Grove
    14	Parth Chandra
    14	dependabot[bot]
    12	Matt Butrovich
     8	Oleks V
     7	KAZUYUKI TANIMURA
     7	Manu Zhang
     6	ChenChen Lai
     2	Bhargava Vadlamani
     2	Dmitrii Blaginin
     2	Han-Wen Tsao
     2	Kazantsev Maksim
     2	Liang-Chi Hsieh
     2	Steve Vaughan
     2	Vipul Vaibhaw
     2	Zhen Wang
     1	Karuppayya
     1	Martin Grigorov
     1	Matthew Alex

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.