DataFusion Comet 0.15.0 Changelog
This release consists of 142 commits from 19 contributors. See credits at the end of this changelog for more information.
Fixed bugs:
fix: enable native_datafusion Spark SQL tests previously ignored in #3315 #3696 (andygrove)
fix: route file-not-found errors through SparkError JSON path #3699 (andygrove)
fix: fall back from native_datafusion for duplicate fields in case-insensitive mode #3687 (andygrove)
fix: enable more Spark SQL tests for native_datafusion (DynamicPartitionPruningSuite/ExplainSuite) #3694 (andygrove)
fix: Correct GetArrayItem null handling for dynamic indices and re-enable native execution #3709 (0lai0)
fix: enable native_datafusion Spark SQL tests for #3320, #3401, #3719 #3718 (andygrove)
fix: Native engine crashes on literal DateTrunc and TimestampTrunc #3668 (0lai0)
fix: Use the loaded Comet extension too (Spark 3.5.8) #3707 (martin-g)
fix: Use thread context classloader for Iceberg class loading #3738 (karuppayya)
fix: disable ANSI mode in benchmarks to avoid exceptions on invalid input #3750 (parthchandra)
fix: fix string to timestamp cast for UTC timestamps #3656 (parthchandra)
fix: native error message not propagated to SparkException on empty errorClass #3727 (manuzhang)
fix: add timezone and special formats support for cast string to timestamp #3730 (parthchandra)
fix: handle inf/-inf/nan in ShimSparkErrorConverter cast overflow #3768 (manuzhang)
fix: handle scalar decimal value overflow correctly in ANSI mode #3803 (parthchandra)
fix: correct array_append return type and mark as Compatible #3795 (andygrove)
fix: remove broken directBuffer feature for parquet reads #3814 (andygrove)
fix: remove unnecessary IgnoreCometNativeDataFusion tags from 3.5.8 diff #3831 (andygrove)
fix: query tolerance= in SQL file tests now also asserts Comet native execution #3797 (andygrove)
fix: include scan impl in PR Linux artifact names #3853 (manuzhang)
fix: correct invalid Option.contains assertion in cast test #3851 (manuzhang)
fix: native_datafusion: case-insensitive mode doesn’t detect duplicate/ambiguous Parquet fields #3808 (vaibhawvipul)
fix: cache object stores and bucket regions to reduce DNS query volume #3802 (andygrove)
fix: skip Comet columnar shuffle for stages with DPP scans #3879 (andygrove)
fix: Native_datafusion reports correct files and bytes scanned #3798 (0lai0)
fix: address clippy collapsible_match warnings #3863 (manuzhang)
fix: parameterize file count in Native_datafusion metrics test #3896 (0lai0)
fix: Make cast string to timestamp compatible with Spark #3884 (parthchandra)
fix: add EmptySchemaShufflePartitioner and test from #3858 #3893 (mbutrovich)
fix: use min instead of max when capping write buffer size to Int range #3914 (andygrove)
fix: Update TPC-DS q36a golden file for Spark 4.0 decimal UNION widening change #3915 (parthchandra)
fix: audit array_insert expression for correctness and test coverage #3890 (andygrove)
fix: handle ambiguous and non-existent local times #3865 (matthewalex4)
fix: improve tracing feature #3688 (andygrove)
fix: make tan and atan2 compatible #3849 (kazuyukitanimura)
fix: checkSparkAnswer displays incorrect labels #3927 (parthchandra)
fix: support full-width and null characters, and negative scale in string to decimal #3922 (parthchandra)
fix: enable Corr #3892 (kazuyukitanimura)
fix: array to array cast #2897 (manuzhang)
fix: exclude tpcds-plan-stability extended.txt files from rat license check #3964 (andygrove)
fix: use UTC for Arrow schema timezone in SparkToColumnar conversions #3878 (andygrove)
fix: remove spurious .flatten call that garbled SortMergeJoin fallback messages #3968 (andygrove)
fix: Add legacy mode handling to cast Decimal to String #3939 (parthchandra)
fix: improve test coverage for decimal to primitive type casts #3948 (parthchandra)
fix: fix decimal div and add tests #3952 (parthchandra)
fix: make shuffle fallback decisions sticky across planning passes #3982 (andygrove)
Performance related:
perf: Coalesce broadcast exchange batches before broadcasting #3703 (mbutrovich)
perf: stop using FFI in native shuffle read path #3731 (andygrove)
perf: Enable native c2r for more queries #3764 (andygrove)
perf: Mark more operators as FFI safe to avoid deep copies #3765 (andygrove)
perf: remove BufReader wrapper when copying spill files to shuffle output #3861 (andygrove)
fix: share unified memory pools across native execution contexts within a task #3924 (andygrove)
Implemented enhancements:
feat: Add PR review skill for Comet expression reviews #3711 (andygrove)
feat: add sort_array benchmark #3758 (grorge123)
feat: Support Spark expression days #3746 (0lai0)
feat: expose comet metrics through Spark's external monitoring system #3708 (coderfender)
feat: support SQL aggregate FILTER (WHERE …) clause in native execution #3835 (viirya)
feat: Implement CRC32C algorithm #3822 (snmvaughan)
feat: add audit-comet-expression Claude Code skill #3793 (andygrove)
feat: enable native_datafusion scan in auto mode #3781 (andygrove)
feat: support LEAD and LAG window functions with IGNORE NULLS #3876 (viirya)
feat: add standalone shuffle benchmark tool #3752 (andygrove)
feat: Mark array_compact as Compatible and improve test coverage #3889 (andygrove)
feat: add native support for get_json_object expression #3747 (andygrove)
feat: Support Spark expression hours #3804 (0lai0)
feat: add support for date_from_unix_date expression #3144 (andygrove)
feat: support spark bin function #3928 (kazantsev-maksim)
feat: support sort_array expression #3706 (grorge123)
Documentation updates:
docs: Add some .lldbinit configurations for debugging document #3686 (wForget)
docs: document Iceberg Spark tests in contributor guide #3777 (mbutrovich)
docs: document negative zero cast-to-string incompatibility #3811 (andygrove)
docs: Add docs about global singletons to development guide #3809 (mbutrovich)
docs: add bug triage guide for prioritizing open issues #3812 (andygrove)
docs: broaden area:writer and area:scan label descriptions #3843 (andygrove)
docs: expand profiling guide with JVM and async-profiler coverage #3628 (andygrove)
doc: GetArrayItem is now supported #3880 (kazuyukitanimura)
docs: update Iceberg docs to reflect capabilities #3961 (mbutrovich)
docs: clarify Maven staging behavior across release candidates #3963 (andygrove)
docs: document CI test suite registration requirement #3943 (andygrove)
docs: Add documentation for running spark-sql-perf #3950 (andygrove)
Other:
ci: remove Java Iceberg integration tests from CI [iceberg] #3673 (andygrove)
build: revert “chore(deps): bump runs-on/action from 2.0.3 to 2.1.0” #3714 (blaginin)
chore(deps): bump lz4_flex from 0.12.0 to 0.12.1 in /native #3713 (dependabot[bot])
chore: Add changelog for 0.14.0 release #3681 (andygrove)
chore: bump version to 0.15.0-SNAPSHOT #3715 (andygrove)
chore: update documentation links for 0.14.0 release #3716 (andygrove)
Fix: map_from_arrays() with NULL inputs causes native crash #3356 (kazantsev-maksim)
chore: Refactor planner random and partition expressions #3704 (coderfender)
test: enable ignored 4.0 tests, enable ansi mode #3454 (parthchandra)
chore: keep original error message for failed SQL test #3725 (comphead)
build: lint as a separate step #3717 (blaginin)
chore(deps): bump lz4_flex from 0.12.1 to 0.13.0 in /native #3744 (dependabot[bot])
chore(deps): bump runs-on/action from 2.0.3 to 2.1.0 #3741 (dependabot[bot])
chore: remove iceberg-java integration #3739 (andygrove)
chore: refactor to extract common and jni-bridge as separate crates #3667 (andygrove)
chore(deps): bump rustls-webpki from 0.103.9 to 0.103.10 in /native #3751 (dependabot[bot])
chore(deps): bump github/codeql-action from 4.32.6 to 4.33.0 #3742 (dependabot[bot])
chore(deps): bump cc from 1.2.56 to 1.2.57 in /native in the all-other-cargo-deps group #3743 (dependabot[bot])
chore: extract shuffle module into separate crate #3749 (andygrove)
chore: run Spark 3.4 tests with native_datafusion scan #3722 (andygrove)
chore: [native_datafusion] replace #3311 references with specific issues in 3.5.8 diff #3761 (andygrove)
chore: fix allocations in schema adapter for native_datafusion scan #3755 (comphead)
chore: update Iceberg Java diffs after #3739 [iceberg] #3774 (mbutrovich)
chore(deps): update datafusion to 52.4.0 [iceberg] #3769 (andygrove)
test: Port DateTimeUtilsSuite timestamp format tests in Comet #3780 (parthchandra)
build: add CometDateTimeUtilsSuite to CI workflow #3782 (andygrove)
chore: Run Spark 4.0 SQL tests with native_datafusion scan #3728 (andygrove)
Test: Add test coverage and documentation for SumDecimal/AvgDecimal nullability behavior #3766 (vaibhawvipul)
tests: fix Iceberg test diffs for Spark 3.4 [iceberg] #3785 (mbutrovich)
ci: run Iceberg Spark tests on all PRs and commits to main branch #3792 (mbutrovich)
chore(deps): bump github/codeql-action from 4.33.0 to 4.34.1 #3805 (dependabot[bot])
chore(deps): bump the all-other-cargo-deps group in /native with 3 updates #3806 (dependabot[bot])
refactor: reorganize shuffle crate module structure #3772 (andygrove)
chore: update git plugin to allow worktrees #3815 (parthchandra)
chore: Remove SupportsComet interface #3818 (andygrove)
Replace catalyst.util.fileToString with Files.readString #3844 (snmvaughan)
test: cast negative zero to string #3829 (kazuyukitanimura)
test: add SQL file test for casting double to string #3854 (andygrove)
chore(deps): bump jni from 0.21.1 to 0.22.4 in /native #3753 (manuzhang)
test: ceil and floor works correctly for Decimal128 #3848 (kazuyukitanimura)
chore(deps): bump the all-other-cargo-deps group in /native with 2 updates #3899 (dependabot[bot])
chore(deps): bump github/codeql-action from 4.34.1 to 4.35.1 #3898 (dependabot[bot])
chore(deps): bump actions/github-script from 7 to 8 #3897 (dependabot[bot])
chore: add SQL tests for FIRST/LAST aggregates #3891 (comphead)
test: do not ignore test SPARK-48037 #2774 (kazuyukitanimura)
deps: upgrade to DataFusion 53.0, Arrow to 58.1 #3629 (mbutrovich)
chore: native_datafusion to report scan task input metrics #3842 (comphead)
test: improve array_distinct test coverage and incompatibility description #3887 (andygrove)
ci: remove native_datafusion CI workflows after 4f5eaf0 #3908 (mbutrovich)
test: Enable more Spark tests #3905 (kazuyukitanimura)
chore: reenable width_bucket test #3910 (comphead)
chore: reenable array_contains tests #3912 (comphead)
chore: reenable array_remove tests #3917 (comphead)
chore(deps): bump actions/github-script from 8 to 9 #3923 (dependabot[bot])
chore: Remove redundant parquet.enable.dictionary ConfigMatrix from SQL tests #3866 (andygrove)
chore(deps): bump rand from 0.10.0 to 0.10.1 in /native #3942 (dependabot[bot])
chore: add changelog for 0.14.1 #3944 (andygrove)
deps: bump iceberg-rust to latest after picking up fixes for #3856 and #3860 #3958 (mbutrovich)
deps: update to opendal revision where object_store get_ranges performance fixed #3965 (comphead)
chore: Register task completion listener to ensure CometExecIterator is always closed #3959 (wForget)
chore(deps): bump github/codeql-action from 4.35.1 to 4.35.2 #3972 (dependabot[bot])
deps: upgrade to DataFusion 53.1 #3946 (mbutrovich)
Credits
Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
55 Andy Grove
14 Parth Chandra
14 dependabot[bot]
12 Matt Butrovich
8 Oleks V
7 KAZUYUKI TANIMURA
7 Manu Zhang
6 ChenChen Lai
2 Bhargava Vadlamani
2 Dmitrii Blaginin
2 Han-Wen Tsao
2 Kazantsev Maksim
2 Liang-Chi Hsieh
2 Steve Vaughan
2 Vipul Vaibhaw
2 Zhen Wang
1 Karuppayya
1 Martin Grigorov
1 Matthew Alex
Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.