DataFusion Comet 0.11.0 Changelog

This release consists of 131 commits from 15 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: temporarily ignore test for hdfs file systems #2359 (parthchandra)

  • fix: Check reused broadcast plan in non-AQE and make setNumPartitions thread safe #2398 (wForget)

  • fix: correct missingInput for CometHashAggregateExec #2409 (comphead)

  • fix: clippy errors rust 1.9.0 update #2419 (coderfender)

  • fix: Avoid spark plan execution cache preventing CometBatchRDD numPartitions change #2420 (wForget)

  • fix: regressions in CometToPrettyStringSuite #2384 (hsiang-c)

  • fix: Byte array Literals failed on cast #2432 (comphead)

  • fix: Do not push down subquery filters on native_datafusion scan #2438 (wForget)

  • fix: Improve error handling when resolving S3 bucket region #2440 (andygrove)

  • fix: [iceberg] additional parquet independent api for iceberg integration #2442 (parthchandra)

  • fix: Specify reqwest crate features #2446 (andygrove)

  • fix: distributed RangePartitioning bounds calculation with native shuffle #2258 (mbutrovich)

  • fix: fix regression in tpcbench.py #2512 (andygrove)

  • fix: [iceberg] Close reader instance in ReadConf #2510 (hsiang-c)

  • fix: Enable plan stability tests for auto scan #2516 (andygrove)

  • fix: Capture unexpected output when retrieving JVM 17 args in Makefile #2566 (zuston)

Performance related:

  • perf: New Configuration from shared conf to avoid high costs #2402 (wForget)

  • perf: Use DataFusion’s count_udaf instead of SUM(IF(expr IS NOT NULL, 1, 0)) #2407 (andygrove)

  • perf: Improve BroadcastExchangeExec conversion #2417 (wForget)

Implemented enhancements:

  • feat: Add dynamic enabled and allowIncompat configs for all supported expressions #2329 (andygrove)

  • feat: feature specific tests #2372 (parthchandra)

  • feat: Support more date part expressions #2316 (wForget)

  • feat: rpad support column for second arg instead of just literal #2099 (coderfender)

  • feat: Support comet native log level conf #2379 (wForget)

  • feat: Enable WeekDay function #2411 (wForget)

  • feat: Add nested Array literal support #2181 (comphead)

  • feat: add_additional_char_support_rpad #2436 (coderfender)

  • feat: do not fallback to Spark for COUNT(distinct) #2429 (comphead)

  • feat: implement_ansi_eval_mode_arithmetic #2136 (coderfender)

  • feat: Add plan conversion statistics to extended explain info #2412 (andygrove)

  • feat: implement_comet_native_lpad_expr #2102 (coderfender)

  • feat: Add backtrace feature to simplify enabling native backtraces in CometNativeException #2515 (andygrove)

  • feat: Support reverse function with ArrayType input #2481 (cfmcgrady)

  • feat: Change default off-heap memory pool from greedy_unified to fair_unified #2526 (andygrove)

  • feat: Make DiskManager max_temp_directory_size configurable #2479 (manuzhang)

  • feat: Parquet Modular Encryption with Spark KMS for native readers #2447 (mbutrovich)

  • feat: Add support for Spark-compatible cast from integral to decimal #2472 (coderfender)

  • feat: Support ANSI mode integral divide #2421 (coderfender)

  • feat: Add config to enable running Comet in onheap mode #2554 (andygrove)

  • feat: support ansi mode rounding function #2542 (coderfender)

  • feat: support ansi mode remainder function #2556 (coderfender)

  • feat: Implement array-to-string cast support #2425 (cfmcgrady)

  • feat: Various improvements to memory pool configuration, logging, and documentation #2538 (andygrove)

  • feat: Enable complex types for columnar shuffle #2573 (mbutrovich)

  • feat: support_decimal_types_bool_cast_native_impl #2490 (coderfender)

  • feat: Use buf write to reduce system call on index write #2579 (zuston)

Documentation updates:

  • doc: Document usage IcebergCometBatchReader.java #2347 (comphead)

  • docs: Add changelog for 0.10.0 release #2361 (andygrove)

  • docs: Fix error in docs #2373 (andygrove)

  • docs: Fix more comet versions in docs #2374 (andygrove)

  • docs: Publish 0.10.0 user guide #2394 (andygrove)

  • doc: macos benches doc clarifications #2418 (comphead)

  • docs: update configs.md after #2422 #2428 (mbutrovich)

  • docs: update docs and tuning guide related to native shuffle #2487 (mbutrovich)

  • docs: Improve EC2 benchmarking guide #2474 (andygrove)

  • docs: docs_update_ansi_support #2496 (coderfender)

  • docs: support lpad expression documentation update #2517 (coderfender)

  • docs: doc changes to support ANSI mode integral divide #2570 (coderfender)

  • docs: Split configuration guide into different sections (scan, exec, shuffle, etc) #2568 (andygrove)

  • docs: doc update to support ANSI mode remainder function #2576 (coderfender)

  • docs: Documentation updates #2581 (andygrove)

Other:

  • chore(deps): bump uuid from 1.18.0 to 1.18.1 in /native #2336 (dependabot[bot])

  • build: Check that all Scala test suites run in PR builds #2304 (andygrove)

  • chore: Start 0.11.0 development #2365 (andygrove)

  • chore: Split expression serde hash map into separate categories #2322 (andygrove)

  • chore: exclude Iceberg diffs from rat checks #2376 (hsiang-c)

  • chore: Refactor UnaryMinus serde #2378 (andygrove)

  • chore: Revert “chore: [1941-Part1]: Introduce map_sort scalar function (#2… #2381 (comphead)

  • chore: Refactor Literal serde #2377 (andygrove)

  • chore: Output BaseAggregateExec accurate unsupported names #2383 (comphead)

  • chore: Improve Initcap test and docs #2387 (andygrove)

  • build: fix build of ‘hdfs-opendal’ feature for MacOS #2392 (parthchandra)

  • chore(deps): bump cc from 1.2.36 to 1.2.37 in /native #2399 (dependabot[bot])

  • chore: [iceberg] support Iceberg 1.9.1 #2386 (hsiang-c)

  • minor: Add deprecation notice to datafusion-comet-spark-expr crate #2405 (andygrove)

  • minor: Update benchmarking scripts to specify scan implementation #2403 (andygrove)

  • refactor: Scala hygiene - remove scala.collection.JavaConverters #2393 (hsiang-c)

  • chore: Improve test coverage for count aggregates #2406 (andygrove)

  • chore: upgrade to DataFusion 50.0.0, Arrow 56.1.0, Parquet 56.0.0 among others #2286 (mbutrovich)

  • chore: Support Spark 4.0.1 instead of 4.0.0 #2414 (andygrove)

  • chore: Respect native features env for cargo commands #2296 (wForget)

  • minor: Update TPC-DS microbenchmarks to remove “scan only” and “exec only” runs #2396 (andygrove)

  • minor: Add RDDScan to default value of sparkToColumnar.supportedOperatorList #2422 (wForget)

  • chore: new TPC-DS golden plans #2426 (mbutrovich)

  • chore: fix pr_build*.yml #2434 (comphead)

  • chore: Remove unused class #2437 (wForget)

  • chore(deps): bump cc from 1.2.37 to 1.2.38 in /native #2439 (dependabot[bot])

  • chore: add validate_workflows.yml #2441 (comphead)

  • test: potential native broadcast failure in scenarios with ReusedExchange #2167 (akupchinskiy)

  • chore: Improvements of fallback info #2450 (wForget)

  • chore: Upgrade Apache Release Audit Tool (RAT) to 0.16.1 #2451 (andygrove)

  • minor: Remove reference to SortExec deadlock issue that is now resolved #2464 (andygrove)

  • chore: Use checked operations when growing or shrinking unified memory pool #2455 (andygrove)

  • minor: Improve the log message of CometTestBase#checkCometOperators #2458 (cfmcgrady)

  • minor: Skip calculating per-task memory limit when in off-heap mode #2462 (andygrove)

  • Chore: Used DataFusion impl of bit_get function #2466 (kazantsev-maksim)

  • chore(deps): bump regex from 1.11.2 to 1.11.3 in /native #2483 (dependabot[bot])

  • chore: update TPC-DS plans after #2429 #2486 (mbutrovich)

  • chore(deps): bump thiserror from 2.0.16 to 2.0.17 in /native #2485 (dependabot[bot])

  • chore(deps): bump cc from 1.2.38 to 1.2.39 in /native #2484 (dependabot[bot])

  • chore: Support running specific benchmark query #2491 (comphead)

  • chore: Make CometColumnarToRowExec extends CometPlan #2460 (wForget)

  • chore: Update artifacts to 0.10.0 #2500 (comphead)

  • build: Stop caching libcomet in CI #2498 (andygrove)

  • chore: Upgrade Maven plugins #2494 (andygrove)

  • Chore: Used DataFusion impl of date_add and date_sub functions #2473 (kazantsev-maksim)

  • minor: include taskAttemptId in log messages #2467 (andygrove)

  • chore: Improve test assertions in plan stability suite #2505 (andygrove)

  • build: Add Spark 4.0 to release build script #2514 (parthchandra)

  • chore: Enable plan stability tests for native_iceberg_compat #2519 (andygrove)

  • chore(deps): bump parking_lot from 0.12.4 to 0.12.5 in /native #2530 (dependabot[bot])

  • chore(deps): bump cc from 1.2.39 to 1.2.40 in /native #2529 (dependabot[bot])

  • chore: Refactor serde for ArrayCompact and ArrayFilter #2536 (andygrove)

  • Chore: Fix Scala code warnings - common module #2527 (andy-hf-kwok)

  • chore: Refactor serde for CheckOverflow #2537 (andygrove)

  • build: Run scala tests against release build of native code #2541 (andygrove)

  • chore: Pass Comet configs to native createPlan #2543 (andygrove)

  • chore: Refactor serde for Length #2547 (andygrove)

  • chore: Include spark shim sources for spotless plugin and reformat #2557 (wForget)

  • chore(deps): bump opendal from 0.54.0 to 0.54.1 in /native #2559 (dependabot[bot])

  • chore: Finish moving Cast serde out of QueryPlanSerde #2550 (andygrove)

  • chore: Use cargo-nextest in CI #2546 (andygrove)

  • chore: Delete unused code #2565 (zuston)

  • chore: Improve plan comet transformation log #2564 (wForget)

  • chore(deps): bump cc from 1.2.40 to 1.2.41 in /native #2560 (dependabot[bot])

  • chore(deps): bump aws-credential-types from 1.2.6 to 1.2.7 in /native #2563 (dependabot[bot])

  • chore: Refactor serde for RegExpReplace #2548 (andygrove)

  • chore: use polymorphic map builders in shuffle. #2571 (ashdnazg)

  • chore: Move ToPrettyString serde into shim layer #2549 (andygrove)

  • chore(deps): bump DataFusion dependencies to 50.2.0, refresh Cargo.lock #2575 (mbutrovich)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

    47	Andy Grove
    15	Zhen Wang
    14	B Vadlamani
    12	Oleks V
    11	dependabot[bot]
    10	Matt Butrovich
     5	Parth Chandra
     5	hsiang-c
     3	Fu Chen
     3	Junfan Zhang
     2	Kazantsev Maksim
     1	Artem Kupchinskiy
     1	Eshed Schacham
     1	Manu Zhang
     1	andy-hf-kwok

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.