DataFusion Comet 0.14.0 Changelog
This release consists of 189 commits from 21 contributors. See credits at the end of this changelog for more information.
Fixed bugs:
fix: [iceberg] Fall back on dynamicpruning expressions for CometIcebergNativeScan #3335 (mbutrovich)
fix: [iceberg] Disable native c2r by default #3348 (andygrove)
fix: Fix space() with negative input #3347 (hsiang-c)
fix: respect scan impl config for v2 scan #3357 (andygrove)
fix: fix memory safety issue in native c2r #3367 (andygrove)
fix: preserve partitioning in CometNativeScanExec for bucketed scans #3392 (andygrove)
fix: unignore row index Spark SQL tests for native_datafusion #3414 (andygrove)
fix: fall back to Spark when Parquet field ID matching is enabled in native_datafusion #3415 (andygrove)
fix: Expose bucketing information from CometNativeScanExec #3437 (andygrove)
fix: support scalar processing for space function #3408 (kazantsev-maksim)
fix: Revert “perf: Remove mutable buffers from scan partition/missing columns (#3411)” [iceberg] #3486 (mbutrovich)
fix: unignore input_file_name Spark SQL tests for native_datafusion #3458 (andygrove)
fix: add scalar support for bit_count expression #3361 (hsiang-c)
fix: Support concat_ws with literal NULL separator #3542 (0lai0)
fix: handle type mismatches in native c2r conversion #3583 (andygrove)
fix: disable native C2R for legacy Iceberg scans [iceberg] #3663 (mbutrovich)
fix: resolve Miri UB in null struct field test, re-enable Miri on PRs #3669 (andygrove)
fix: Support on all-literal RLIKE expression #3647 (0lai0)
fix: Fix scan metrics test to run with both native_datafusion and native_iceberg_compat #3690 (andygrove)
Performance related:
perf: refactor sum int with specialized implementations for each eval_mode #3054 (andygrove)
perf: Optimize contains expression with SIMD-based scalar pattern sea… #2991 (Shekharrajak)
perf: Add batch coalescing in BufBatchWriter to reduce IPC schema overhead #3441 (andygrove)
perf: Use native_datafusion scan in benchmark scripts (6% faster for TPC-H) #3460 (andygrove)
perf: Remove mutable buffers from scan partition/missing columns #3411 (andygrove)
perf: [iceberg] Single-pass FileScanTask validation #3443 (mbutrovich)
perf: Improve benchmarks for native row-to-columnar used by JVM shuffle #3290 (andygrove)
perf: executePlan uses a channel to park executor task thread instead of yield_now() [iceberg] #3553 (mbutrovich)
perf: Initialize tokio runtime worker threads from spark.executor.cores #3555 (andygrove)
perf: Add Comet config for native Iceberg reader’s data file concurrency [iceberg] #3584 (mbutrovich)
perf: reuse CometConf.COMET_TRACING_ENABLED, Native, NativeUtil in NativeBatchDecoderIterator #3627 (mbutrovich)
perf: Improve performance of native row-to-columnar transition used by JVM shuffle #3289 (andygrove)
perf: use aligned pointer reads for SparkUnsafeRow field accessors #3670 (andygrove)
perf: Optimize some decimal expressions #3619 (andygrove)
Implemented enhancements:
feat: Native columnar to row conversion (Phase 2) #3266 (andygrove)
feat: Enable native columnar-to-row by default #3299 (andygrove)
feat: add support for width_bucket expression #3273 (davidlghellin)
feat: Drop native_comet as a valid option for COMET_NATIVE_SCAN_IMPL config #3358 (andygrove)
feat: Support date to timestamp cast #3383 (coderfender)
feat: CometExecRDD supports per-partition plan data, reduce Iceberg native scan serialization, add DPP [iceberg] #3349 (mbutrovich)
feat: Support right expression #3207 (Shekharrajak)
feat: support map_contains_key expression #3369 (peterxcli)
feat: add support for make_date expression #3147 (andygrove)
feat: add support for next_day expression #3148 (andygrove)
feat: implement cast from whole numbers to binary format and bool to decimal #3083 (coderfender)
feat: Support for StringSplit #2772 (Shekharrajak)
feat: CometNativeScan per-partition plan serde #3511 (mbutrovich)
feat: Remove mutable buffers from scan partition/missing columns [iceberg] #3514 (andygrove)
feat: pass spark.comet.datafusion.* configs through to DataFusion session #3455 (andygrove)
feat: pass vended credentials to Iceberg native scan #3523 (tokoko)
feat: Cast date to Numeric (No Op) #3544 (coderfender)
feat: add support crc32 expression #3498 (rafafrdz)
feat: Support int to timestamp casts #3541 (coderfender)
feat(benchmarks): add async-profiler support to TPC benchmark scripts #3613 (andygrove)
feat: Cast numeric (non int) to timestamp #3559 (coderfender)
feat: [ANSI] Ansi sql error messages #3580 (parthchandra)
feat: enable debug assertions in CI profile, fix unaligned memory access bug #3652 (andygrove)
feat: Enable native c2r by default, add debug asserts #3649 (andygrove)
feat: support Spark luhn_check expression #3573 (n0r0shi)
Documentation updates:
docs: Add changelog for 0.13.0 #3260 (andygrove)
docs: fix bug in placement of prettier-ignore-end in generated docs #3287 (andygrove)
docs: Add contributor guide page for SQL file tests #3333 (andygrove)
docs: fix inaccurate claim about mutable buffers in parquet scan docs #3378 (andygrove)
docs: Improve documentation on maven usage for running tests #3370 (andygrove)
docs: move release process docs to contributor guide #3492 (andygrove)
docs: improve release process documentation #3508 (andygrove)
docs: update roadmap #3543 (mbutrovich)
docs: Update Parquet scan documentation #3433 (andygrove)
docs: recommend SQL file tests for new expressions #3598 (andygrove)
docs: add SAFETY comments to all unsafe blocks in shuffle spark_unsafe module #3603 (andygrove)
docs: Fix link to overview page #3625 (manuzhang)
doc: Document sql query error propagation #3651 (parthchandra)
docs: update Iceberg docs in advance of 0.14.0 #3691 (mbutrovich)
Other:
chore(deps): bump actions/download-artifact from 4 to 7 #3281 (dependabot[bot])
chore(deps): bump cc from 1.2.53 to 1.2.54 in /native #3284 (dependabot[bot])
build: Fix docs workflow dependency resolution failure #3275 (andygrove)
chore(deps): bump actions/upload-artifact from 4 to 6 #3280 (dependabot[bot])
chore(deps): bump actions/cache from 4 to 5 #3279 (dependabot[bot])
chore(deps): bump uuid from 1.19.0 to 1.20.0 in /native #3282 (dependabot[bot])
build: reduce overhead of fuzz testing #3257 (andygrove)
chore: Start 0.14.0 development #3288 (andygrove)
chore: Add Comet released artifacts and links to maven #3291 (comphead)
chore: Add take/untake workflow for issue self-assignment #3270 (andygrove)
ci: Consolidate Spark SQL test jobs to reduce CI time #3271 (andygrove)
chore(deps): bump org.assertj:assertj-core from 3.23.1 to 3.27.7 #3293 (dependabot[bot])
chore: Add microbenchmark for IcebergScan operator serde roundtrip #3296 (andygrove)
chore: Remove IgnoreCometNativeScan from ParquetEncryptionSuite in 3.5.7 diff #3304 (andygrove)
chore: Enable native c2r in plan stability suite #3302 (andygrove)
chore: Add support for Spark 3.5.8 #3323 (manuzhang)
chore: Invert usingDataSourceExec test helper to usingLegacyNativeCometScan #3310 (andygrove)
tests: Add SQL test files covering edge cases for (almost) every Comet-supported expression #3328 (andygrove)
chore: Adapt caching from #3251 to [iceberg] workflows #3353 (mbutrovich)
bug: Fix string decimal type throw right exception #3248 (coderfender)
chore: Migrate concat tests to sql based testing framework #3352 (andygrove)
chore(deps): bump actions/setup-java from 4 to 5 #3363 (dependabot[bot])
chore: Annotate classes/methods/fields that are used by Apache Iceberg #3237 (andygrove)
Feat: map_from_entries #2905 (kazantsev-maksim)
chore: Move spark unsafe classes into spark_unsafe #3373 (EmilyMatt)
chore: Extract some tied down logic #3374 (EmilyMatt)
Fix: array contains null handling #3372 (Shekharrajak)
chore: stop uploading code coverage results #3381 (andygrove)
chore: update target-cpus in published binaries to x86-64-v3 and neoverse-n1 #3368 (mbutrovich)
chore: show line of error sql #3390 (peterxcli)
chore: Move writer-related logic to “writers” module #3385 (EmilyMatt)
chore(deps): bump bytes from 1.11.0 to 1.11.1 in /native #3380 (dependabot[bot])
chore: Clean up and split shuffle module #3395 (EmilyMatt)
chore: Make PR workflows match target-cpu flags in published jars #3402 (mbutrovich)
chore(deps): bump time from 0.3.45 to 0.3.47 in /native #3412 (dependabot[bot])
chore: Run Spark SQL tests with native_datafusion in CI #3393 (andygrove)
test: Add ANSI mode SQL test files for expressions that throw on invalid input #3377 (andygrove)
refactor: Split read benchmarks and add addParquetScanCases helper #3407 (andygrove)
chore: 4.5x reduction in number of golden files #3399 (andygrove)
Feat: to_csv #3004 (kazantsev-maksim)
minor: map_from_entries sql tests #3394 (kazantsev-maksim)
chore: add confirmation before tarball is released #3439 (milenkovicm)
chore(deps): bump cc from 1.2.54 to 1.2.55 in /native #3451 (dependabot[bot])
chore: Add Iceberg TPC-H benchmarking scripts #3294 (andygrove)
chore: Remove dead code paths for deprecated native_comet scan #3396 (andygrove)
chore(deps): bump arrow from 57.2.0 to 57.3.0 in /native #3449 (dependabot[bot])
chore(deps): bump aws-config from 1.8.12 to 1.8.13 in /native #3450 (dependabot[bot])
chore(deps): bump regex from 1.12.2 to 1.12.3 in /native #3453 (dependabot[bot])
chore(deps): bump rand from 0.9.2 to 0.10.0 in /native #3465 (manuzhang)
test: Add additional contains expression tests #3462 (andygrove)
chore: Adjust native artifact caching key in CI #3476 (mbutrovich)
chore: Add Comet writer nested types test assertion #3480 (comphead)
test: Add SQL file tests for left and right expressions #3463 (andygrove)
chore: Add GitHub workflow to close stale PRs #3488 (andygrove)
chore: Make push CI to be triggered for main branch only #3474 (comphead)
ci: disable Miri safety checks until compatibility is restored #3504 (andygrove)
chore: Add memory reservation debug logging #3489 (andygrove)
chore: enable GitHub button for updating PR branches with latest from main #3505 (andygrove)
chore: remove some dead cast code #3513 (andygrove)
chore(deps): bump aws-credential-types from 1.2.11 to 1.2.12 in /native #3525 (dependabot[bot])
chore(deps): bump libc from 0.2.180 to 0.2.182 in /native #3527 (dependabot[bot])
chore(deps): bump cc from 1.2.55 to 1.2.56 in /native #3528 (dependabot[bot])
chore(deps): bump tempfile from 3.24.0 to 3.25.0 in /native #3529 (dependabot[bot])
ci: Bump up actions/upload-artifact from v4 to v6 #3533 (manuzhang)
chore(deps): bump aws-config from 1.8.13 to 1.8.14 in /native #3526 (dependabot[bot])
chore: refactor array_repeat #3516 (kazantsev-maksim)
chore: Add envvars to override writer configs and cometConf minor clean up #3540 (comphead)
chore: Cast module refactor boolean module #3491 (coderfender)
chore: Consolidate TPC benchmark scripts #3538 (andygrove)
chore(deps): bump parquet from 57.2.0 to 57.3.0 in /native #3568 (dependabot[bot])
chore(deps): bump uuid from 1.20.0 to 1.21.0 in /native #3567 (dependabot[bot])
chore: Add TPC-* queries to repo #3562 (andygrove)
chore(deps): bump assertables from 9.8.4 to 9.8.6 in /native #3570 (dependabot[bot])
chore(deps): bump actions/stale from 10.1.1 to 10.2.0 #3565 (dependabot[bot])
chore(deps): bump aws-credential-types from 1.2.12 to 1.2.13 in /native #3566 (dependabot[bot])
chore: makes dependabot to group deps into single PR #3578 (comphead)
chore: Cast module refactor : String #3577 (coderfender)
chore(deps): bump the all-other-cargo-deps group in /native with 3 updates #3581 (dependabot[bot])
chore: Add Docker Compose support for TPC benchmarks #3576 (andygrove)
build: Runs-on for PR Build (Linux) #3579 (blaginin)
chore: Add consistency checks and result hashing to TPC benchmarks #3582 (andygrove)
chore: Remove all remaining uses of legacy BatchReader from Comet [iceberg] #3468 (andygrove)
build: Skip CI workflows for changes in benchmarks directory #3599 (andygrove)
build: fix runs-on tags for consistency #3601 (andygrove)
chore: Add Java Flight Recorder profiling to TPC benchmarks #3597 (andygrove)
deps: DataFusion 52.0.0 migration (SchemaAdapter changes, etc.) [iceberg] #3536 (comphead)
chore(deps): bump actions/download-artifact from 7 to 8 #3609 (dependabot[bot])
chore(deps): bump actions/upload-artifact from 6 to 7 #3610 (dependabot[bot])
chore: bump iceberg-rust dependency to latest [iceberg] #3606 (mbutrovich)
CI: Add CodeQL workflow for GitHub Actions security scanning #3617 (kevinjqliu)
CI: update codeql with pinned action versions #3621 (kevinjqliu)
chore: replace legacy datetime rebase tests with current scan coverage [iceberg] #3605 (andygrove)
build: More runners #3626 (blaginin)
deps: bump DataFusion to 52.2 [iceberg] #3622 (mbutrovich)
chore: use datafusion impl of space function #3612 (kazantsev-maksim)
chore: use datafusion impl of bit_count function #3616 (kazantsev-maksim)
chore: refactor cast module numeric data types #3623 (coderfender)
chore: Refactor cast module temporal types #3624 (coderfender)
chore: Fix clippy complaints #3634 (comphead)
chore(deps): bump docker/build-push-action from 6 to 7 #3639 (dependabot[bot])
chore(deps): bump github/codeql-action from 4.32.5 to 4.32.6 #3637 (dependabot[bot])
chore(deps): bump docker/setup-buildx-action from 3 to 4 #3636 (dependabot[bot])
chore(deps): bump docker/login-action from 3 to 4 #3638 (dependabot[bot])
deps: update to latest iceberg-rust to pick up get_byte_ranges [iceberg] #3635 (mbutrovich)
chore: Array literals tests enable #3633 (comphead)
chore: Add debug assertions before unsafe code blocks #3655 (andygrove)
chore: fix license header - ansi docs #3662 (coderfender)
chore(deps): bump quinn-proto from 0.11.13 to 0.11.14 in /native #3660 (dependabot[bot])
ci: add dedicated RAT license check workflow for all PRs #3664 (andygrove)
chore: Remove deprecated SCAN_NATIVE_COMET constant and related test code #3671 (andygrove)
chore: Upgrade to DF 52.3.0 #3672 (andygrove)
deps: update to iceberg-rust 0.9.0 rc1 [iceberg] #3657 (mbutrovich)
chore: Mark expressions with known correctness issues as incompatible #3675 (andygrove)
chore(deps): bump actions/setup-java from 4 to 5 #3683 (dependabot[bot])
chore(deps): bump runs-on/action from 2.0.3 to 2.1.0 #3684 (dependabot[bot])
chore(deps): bump actions/checkout from 4 to 6 #3685 (dependabot[bot])
ci: remove Java Iceberg integration tests from CI [iceberg] #3673 (andygrove)
Credits
Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
81 Andy Grove
34 dependabot[bot]
19 Matt Butrovich
9 B Vadlamani
8 Oleks V
7 Kazantsev Maksim
4 Emily Matheys
4 Manu Zhang
4 Shekhar Prasad Rajak
2 Bhargava Vadlamani
2 ChenChen Lai
2 Dmitrii Blaginin
2 Kevin Liu
2 Parth Chandra
2 Peter Lee
2 hsiang-c
1 David López
1 Marko Milenković
1 Rafael Fernández
1 Tornike Gurgenidze
1 n0r0shi
Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.