DataFusion Comet 0.14.0 Changelog

This release consists of 189 commits from 21 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: [iceberg] Fall back on dynamicpruning expressions for CometIcebergNativeScan #3335 (mbutrovich)

  • fix: [iceberg] Disable native c2r by default #3348 (andygrove)

  • fix: Fix space() with negative input #3347 (hsiang-c)

  • fix: respect scan impl config for v2 scan #3357 (andygrove)

  • fix: fix memory safety issue in native c2r #3367 (andygrove)

  • fix: preserve partitioning in CometNativeScanExec for bucketed scans #3392 (andygrove)

  • fix: unignore row index Spark SQL tests for native_datafusion #3414 (andygrove)

  • fix: fall back to Spark when Parquet field ID matching is enabled in native_datafusion #3415 (andygrove)

  • fix: Expose bucketing information from CometNativeScanExec #3437 (andygrove)

  • fix: support scalar processing for space function #3408 (kazantsev-maksim)

  • fix: Revert “perf: Remove mutable buffers from scan partition/missing columns (#3411)” [iceberg] #3486 (mbutrovich)

  • fix: unignore input_file_name Spark SQL tests for native_datafusion #3458 (andygrove)

  • fix: add scalar support for bit_count expression #3361 (hsiang-c)

  • fix: Support concat_ws with literal NULL separator #3542 (0lai0)

  • fix: handle type mismatches in native c2r conversion #3583 (andygrove)

  • fix: disable native C2R for legacy Iceberg scans [iceberg] #3663 (mbutrovich)

  • fix: resolve Miri UB in null struct field test, re-enable Miri on PRs #3669 (andygrove)

  • fix: Support on all-literal RLIKE expression #3647 (0lai0)

  • fix: Fix scan metrics test to run with both native_datafusion and native_iceberg_compat #3690 (andygrove)

Performance related:

  • perf: refactor sum int with specialized implementations for each eval_mode #3054 (andygrove)

  • perf: Optimize contains expression with SIMD-based scalar pattern sea… #2991 (Shekharrajak)

  • perf: Add batch coalescing in BufBatchWriter to reduce IPC schema overhead #3441 (andygrove)

  • perf: Use native_datafusion scan in benchmark scripts (6% faster for TPC-H) #3460 (andygrove)

  • perf: Remove mutable buffers from scan partition/missing columns #3411 (andygrove)

  • perf: [iceberg] Single-pass FileScanTask validation #3443 (mbutrovich)

  • perf: Improve benchmarks for native row-to-columnar used by JVM shuffle #3290 (andygrove)

  • perf: executePlan uses a channel to park executor task thread instead of yield_now() [iceberg] #3553 (mbutrovich)

  • perf: Initialize tokio runtime worker threads from spark.executor.cores #3555 (andygrove)

  • perf: Add Comet config for native Iceberg reader’s data file concurrency [iceberg] #3584 (mbutrovich)

  • perf: reuse CometConf.COMET_TRACING_ENABLED, Native, NativeUtil in NativeBatchDecoderIterator #3627 (mbutrovich)

  • perf: Improve performance of native row-to-columnar transition used by JVM shuffle #3289 (andygrove)

  • perf: use aligned pointer reads for SparkUnsafeRow field accessors #3670 (andygrove)

  • perf: Optimize some decimal expressions #3619 (andygrove)

Implemented enhancements:

  • feat: Native columnar to row conversion (Phase 2) #3266 (andygrove)

  • feat: Enable native columnar-to-row by default #3299 (andygrove)

  • feat: add support for width_bucket expression #3273 (davidlghellin)

  • feat: Drop native_comet as a valid option for COMET_NATIVE_SCAN_IMPL config #3358 (andygrove)

  • feat: Support date to timestamp cast #3383 (coderfender)

  • feat: CometExecRDD supports per-partition plan data, reduce Iceberg native scan serialization, add DPP [iceberg] #3349 (mbutrovich)

  • feat: Support right expression #3207 (Shekharrajak)

  • feat: support map_contains_key expression #3369 (peterxcli)

  • feat: add support for make_date expression #3147 (andygrove)

  • feat: add support for next_day expression #3148 (andygrove)

  • feat: implement cast from whole numbers to binary format and bool to decimal #3083 (coderfender)

  • feat: Support for StringSplit #2772 (Shekharrajak)

  • feat: CometNativeScan per-partition plan serde #3511 (mbutrovich)

  • feat: Remove mutable buffers from scan partition/missing columns [iceberg] #3514 (andygrove)

  • feat: pass spark.comet.datafusion.* configs through to DataFusion session #3455 (andygrove)

  • feat: pass vended credentials to Iceberg native scan #3523 (tokoko)

  • feat: Cast date to Numeric (No Op) #3544 (coderfender)

  • feat: add support crc32 expression #3498 (rafafrdz)

  • feat: Support int to timestamp casts #3541 (coderfender)

  • feat(benchmarks): add async-profiler support to TPC benchmark scripts #3613 (andygrove)

  • feat: Cast numeric (non int) to timestamp #3559 (coderfender)

  • feat: [ANSI] Ansi sql error messages #3580 (parthchandra)

  • feat: enable debug assertions in CI profile, fix unaligned memory access bug #3652 (andygrove)

  • feat: Enable native c2r by default, add debug asserts #3649 (andygrove)

  • feat: support Spark luhn_check expression #3573 (n0r0shi)

Documentation updates:

  • docs: Add changelog for 0.13.0 #3260 (andygrove)

  • docs: fix bug in placement of prettier-ignore-end in generated docs #3287 (andygrove)

  • docs: Add contributor guide page for SQL file tests #3333 (andygrove)

  • docs: fix inaccurate claim about mutable buffers in parquet scan docs #3378 (andygrove)

  • docs: Improve documentation on maven usage for running tests #3370 (andygrove)

  • docs: move release process docs to contributor guide #3492 (andygrove)

  • docs: improve release process documentation #3508 (andygrove)

  • docs: update roadmap #3543 (mbutrovich)

  • docs: Update Parquet scan documentation #3433 (andygrove)

  • docs: recommend SQL file tests for new expressions #3598 (andygrove)

  • docs: add SAFETY comments to all unsafe blocks in shuffle spark_unsafe module #3603 (andygrove)

  • docs: Fix link to overview page #3625 (manuzhang)

  • doc: Document sql query error propagation #3651 (parthchandra)

  • docs: update Iceberg docs in advance of 0.14.0 #3691 (mbutrovich)

Other:

  • chore(deps): bump actions/download-artifact from 4 to 7 #3281 (dependabot[bot])

  • chore(deps): bump cc from 1.2.53 to 1.2.54 in /native #3284 (dependabot[bot])

  • build: Fix docs workflow dependency resolution failure #3275 (andygrove)

  • chore(deps): bump actions/upload-artifact from 4 to 6 #3280 (dependabot[bot])

  • chore(deps): bump actions/cache from 4 to 5 #3279 (dependabot[bot])

  • chore(deps): bump uuid from 1.19.0 to 1.20.0 in /native #3282 (dependabot[bot])

  • build: reduce overhead of fuzz testing #3257 (andygrove)

  • chore: Start 0.14.0 development #3288 (andygrove)

  • chore: Add Comet released artifacts and links to maven #3291 (comphead)

  • chore: Add take/untake workflow for issue self-assignment #3270 (andygrove)

  • ci: Consolidate Spark SQL test jobs to reduce CI time #3271 (andygrove)

  • chore(deps): bump org.assertj:assertj-core from 3.23.1 to 3.27.7 #3293 (dependabot[bot])

  • chore: Add microbenchmark for IcebergScan operator serde roundtrip #3296 (andygrove)

  • chore: Remove IgnoreCometNativeScan from ParquetEncryptionSuite in 3.5.7 diff #3304 (andygrove)

  • chore: Enable native c2r in plan stability suite #3302 (andygrove)

  • chore: Add support for Spark 3.5.8 #3323 (manuzhang)

  • chore: Invert usingDataSourceExec test helper to usingLegacyNativeCometScan #3310 (andygrove)

  • tests: Add SQL test files covering edge cases for (almost) every Comet-supported expression #3328 (andygrove)

  • chore: Adapt caching from #3251 to [iceberg] workflows #3353 (mbutrovich)

  • bug: Fix string decimal type throw right exception #3248 (coderfender)

  • chore: Migrate concat tests to sql based testing framework #3352 (andygrove)

  • chore(deps): bump actions/setup-java from 4 to 5 #3363 (dependabot[bot])

  • chore: Annotate classes/methods/fields that are used by Apache Iceberg #3237 (andygrove)

  • Feat: map_from_entries #2905 (kazantsev-maksim)

  • chore: Move spark unsafe classes into spark_unsafe #3373 (EmilyMatt)

  • chore: Extract some tied down logic #3374 (EmilyMatt)

  • Fix: array contains null handling #3372 (Shekharrajak)

  • chore: stop uploading code coverage results #3381 (andygrove)

  • chore: update target-cpus in published binaries to x86-64-v3 and neoverse-n1 #3368 (mbutrovich)

  • chore: show line of error sql #3390 (peterxcli)

  • chore: Move writer-related logic to “writers” module #3385 (EmilyMatt)

  • chore(deps): bump bytes from 1.11.0 to 1.11.1 in /native #3380 (dependabot[bot])

  • chore: Clean up and split shuffle module #3395 (EmilyMatt)

  • chore: Make PR workflows match target-cpu flags in published jars #3402 (mbutrovich)

  • chore(deps): bump time from 0.3.45 to 0.3.47 in /native #3412 (dependabot[bot])

  • chore: Run Spark SQL tests with native_datafusion in CI #3393 (andygrove)

  • test: Add ANSI mode SQL test files for expressions that throw on invalid input #3377 (andygrove)

  • refactor: Split read benchmarks and add addParquetScanCases helper #3407 (andygrove)

  • chore: 4.5x reduction in number of golden files #3399 (andygrove)

  • Feat: to_csv #3004 (kazantsev-maksim)

  • minor: map_from_entries sql tests #3394 (kazantsev-maksim)

  • chore: add confirmation before tarball is released #3439 (milenkovicm)

  • chore(deps): bump cc from 1.2.54 to 1.2.55 in /native #3451 (dependabot[bot])

  • chore: Add Iceberg TPC-H benchmarking scripts #3294 (andygrove)

  • chore: Remove dead code paths for deprecated native_comet scan #3396 (andygrove)

  • chore(deps): bump arrow from 57.2.0 to 57.3.0 in /native #3449 (dependabot[bot])

  • chore(deps): bump aws-config from 1.8.12 to 1.8.13 in /native #3450 (dependabot[bot])

  • chore(deps): bump regex from 1.12.2 to 1.12.3 in /native #3453 (dependabot[bot])

  • chore(deps): bump rand from 0.9.2 to 0.10.0 in /native #3465 (manuzhang)

  • test: Add additional contains expression tests #3462 (andygrove)

  • chore: Adjust native artifact caching key in CI #3476 (mbutrovich)

  • chore: Add Comet writer nested types test assertion #3480 (comphead)

  • test: Add SQL file tests for left and right expressions #3463 (andygrove)

  • chore: Add GitHub workflow to close stale PRs #3488 (andygrove)

  • chore: Make push CI to be triggered for main branch only #3474 (comphead)

  • ci: disable Miri safety checks until compatibility is restored #3504 (andygrove)

  • chore: Add memory reservation debug logging #3489 (andygrove)

  • chore: enable GitHub button for updating PR branches with latest from main #3505 (andygrove)

  • chore: remove some dead cast code #3513 (andygrove)

  • chore(deps): bump aws-credential-types from 1.2.11 to 1.2.12 in /native #3525 (dependabot[bot])

  • chore(deps): bump libc from 0.2.180 to 0.2.182 in /native #3527 (dependabot[bot])

  • chore(deps): bump cc from 1.2.55 to 1.2.56 in /native #3528 (dependabot[bot])

  • chore(deps): bump tempfile from 3.24.0 to 3.25.0 in /native #3529 (dependabot[bot])

  • ci: Bump up actions/upload-artifact from v4 to v6 #3533 (manuzhang)

  • chore(deps): bump aws-config from 1.8.13 to 1.8.14 in /native #3526 (dependabot[bot])

  • chore: refactor array_repeat #3516 (kazantsev-maksim)

  • chore: Add envvars to override writer configs and cometConf minor clean up #3540 (comphead)

  • chore: Cast module refactor boolean module #3491 (coderfender)

  • chore: Consolidate TPC benchmark scripts #3538 (andygrove)

  • chore(deps): bump parquet from 57.2.0 to 57.3.0 in /native #3568 (dependabot[bot])

  • chore(deps): bump uuid from 1.20.0 to 1.21.0 in /native #3567 (dependabot[bot])

  • chore: Add TPC-* queries to repo #3562 (andygrove)

  • chore(deps): bump assertables from 9.8.4 to 9.8.6 in /native #3570 (dependabot[bot])

  • chore(deps): bump actions/stale from 10.1.1 to 10.2.0 #3565 (dependabot[bot])

  • chore(deps): bump aws-credential-types from 1.2.12 to 1.2.13 in /native #3566 (dependabot[bot])

  • chore: makes dependabot to group deps into single PR #3578 (comphead)

  • chore: Cast module refactor : String #3577 (coderfender)

  • chore(deps): bump the all-other-cargo-deps group in /native with 3 updates #3581 (dependabot[bot])

  • chore: Add Docker Compose support for TPC benchmarks #3576 (andygrove)

  • build: Runs-on for PR Build (Linux) #3579 (blaginin)

  • chore: Add consistency checks and result hashing to TPC benchmarks #3582 (andygrove)

  • chore: Remove all remaining uses of legacy BatchReader from Comet [iceberg] #3468 (andygrove)

  • build: Skip CI workflows for changes in benchmarks directory #3599 (andygrove)

  • build: fix runs-on tags for consistency #3601 (andygrove)

  • chore: Add Java Flight Recorder profiling to TPC benchmarks #3597 (andygrove)

  • deps: DataFusion 52.0.0 migration (SchemaAdapter changes, etc.) [iceberg] #3536 (comphead)

  • chore(deps): bump actions/download-artifact from 7 to 8 #3609 (dependabot[bot])

  • chore(deps): bump actions/upload-artifact from 6 to 7 #3610 (dependabot[bot])

  • chore: bump iceberg-rust dependency to latest [iceberg] #3606 (mbutrovich)

  • CI: Add CodeQL workflow for GitHub Actions security scanning #3617 (kevinjqliu)

  • CI: update codeql with pinned action versions #3621 (kevinjqliu)

  • chore: replace legacy datetime rebase tests with current scan coverage [iceberg] #3605 (andygrove)

  • build: More runners #3626 (blaginin)

  • deps: bump DataFusion to 52.2 [iceberg] #3622 (mbutrovich)

  • chore: use datafusion impl of space function #3612 (kazantsev-maksim)

  • chore: use datafusion impl of bit_count function #3616 (kazantsev-maksim)

  • chore: refactor cast module numeric data types #3623 (coderfender)

  • chore: Refactor cast module temporal types #3624 (coderfender)

  • chore: Fix clippy complaints #3634 (comphead)

  • chore(deps): bump docker/build-push-action from 6 to 7 #3639 (dependabot[bot])

  • chore(deps): bump github/codeql-action from 4.32.5 to 4.32.6 #3637 (dependabot[bot])

  • chore(deps): bump docker/setup-buildx-action from 3 to 4 #3636 (dependabot[bot])

  • chore(deps): bump docker/login-action from 3 to 4 #3638 (dependabot[bot])

  • deps: update to latest iceberg-rust to pick up get_byte_ranges [iceberg] #3635 (mbutrovich)

  • chore: Array literals tests enable #3633 (comphead)

  • chore: Add debug assertions before unsafe code blocks #3655 (andygrove)

  • chore: fix license header - ansi docs #3662 (coderfender)

  • chore(deps): bump quinn-proto from 0.11.13 to 0.11.14 in /native #3660 (dependabot[bot])

  • ci: add dedicated RAT license check workflow for all PRs #3664 (andygrove)

  • chore: Remove deprecated SCAN_NATIVE_COMET constant and related test code #3671 (andygrove)

  • chore: Upgrade to DF 52.3.0 #3672 (andygrove)

  • deps: update to iceberg-rust 0.9.0 rc1 [iceberg] #3657 (mbutrovich)

  • chore: Mark expressions with known correctness issues as incompatible #3675 (andygrove)

  • chore(deps): bump actions/setup-java from 4 to 5 #3683 (dependabot[bot])

  • chore(deps): bump runs-on/action from 2.0.3 to 2.1.0 #3684 (dependabot[bot])

  • chore(deps): bump actions/checkout from 4 to 6 #3685 (dependabot[bot])

  • ci: remove Java Iceberg integration tests from CI [iceberg] #3673 (andygrove)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

    81	Andy Grove
    34	dependabot[bot]
    19	Matt Butrovich
     9	B Vadlamani
     8	Oleks V
     7	Kazantsev Maksim
     4	Emily Matheys
     4	Manu Zhang
     4	Shekhar Prasad Rajak
     2	Bhargava Vadlamani
     2	ChenChen Lai
     2	Dmitrii Blaginin
     2	Kevin Liu
     2	Parth Chandra
     2	Peter Lee
     2	hsiang-c
     1	David López
     1	Marko Milenković
     1	Rafael Fernández
     1	Tornike Gurgenidze
     1	n0r0shi

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.