DataFusion Comet 0.12.0 Changelog

This release consists of 105 commits from 13 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: Fix None.get in stringDecode when bin child cannot be converted #2606 (cfmcgrady)

  • fix: Update FuzzDataGenerator to produce dictionary-encoded string arrays & fix bugs that this exposes #2635 (andygrove)

  • fix: Fallback to Spark for lpad/rpad for unsupported arguments & fix negative length handling #2630 (andygrove)

  • fix: Mark SortOrder with floating-point as incompatible #2650 (andygrove)

  • fix: Fall back to Spark for trunc / date_trunc functions when format string is unsupported, or is not a literal value #2634 (andygrove)

  • fix: [native_datafusion] only pass single partition of PartitionedFiles into DataSourceExec #2675 (mbutrovich)

  • fix: Fix subcommand options in fuzz-testing #2684 (manuzhang)

  • fix: Do not replace SMJ with HJ for LeftSemi #2687 (comphead)

  • fix: Apply spotless on Iceberg 1.8.1 diff [iceberg] #2700 (hsiang-c)

  • fix: Fix generate-user-guide-reference-docs failure when mvn command is not executed at root #2691 (manuzhang)

  • fix: Fix missing SortOrder fallback reason in range partitioning #2716 (andygrove)

  • fix: CometLiteral class cast exception with arrays #2718 (andygrove)

  • fix: NormalizeNaNAndZero::children() returns child’s child #2732 (mbutrovich)

  • fix: checkSparkMaybeThrows should compare Spark and Comet results in success case #2728 (andygrove)

  • fix: Mark WindowExec as incompatible #2748 (andygrove)

  • fix: Add strict floating point mode and fallback to Spark for min/max/sort on floating point inputs when enabled #2747 (andygrove)

  • fix: Implement producedAttributes for CometWindowExec #2789 (rahulbabarwal89)

  • fix: Pass all Comet configs to native plan #2801 (andygrove)

Implemented enhancements:

  • feat: Add option to write benchmark results to file #2640 (andygrove)

  • feat: Implement metrics for iceberg compat #2615 (EmilyMatt)

  • feat: Define function signatures in CometFuzz #2614 (andygrove)

  • feat: cherry-pick UUID conversion logic from #2528 #2648 (mbutrovich)

  • feat: support concat for strings #2604 (comphead)

  • feat: Add support for abs #2689 (andygrove)

  • feat: Support variadic function in CometFuzz #2682 (manuzhang)

  • feat: CometExecRule refactor: Unify CometNativeExec creation with Serde in CometOperatorSerde trait #2768 (andygrove)

  • feat: support cot #2755 (psvri)

  • feat: Add bash script to build and run fuzz testing #2686 (manuzhang)

  • feat: Add getSupportLevel to CometAggregateExpressionSerde trait #2777 (andygrove)

  • feat: Add CI check to ensure generated docs are in sync with code #2779 (andygrove)

  • feat: Add prettier enforcement #2783 (andygrove)

  • feat: hyperbolic trig functions #2784 (psvri)

  • feat: [iceberg] Native scan by serializing FileScanTasks to iceberg-rust #2528 (mbutrovich)

Documentation updates:

  • docs: Add changelog for 0.11.0 release #2585 (mbutrovich)

  • docs: Improve documentation layout #2587 (andygrove)

  • docs: Publish 0.11.0 user guide #2589 (andygrove)

  • docs: Put Comet logo in top nav bar, respect light/dark mode #2591 (andygrove)

  • docs: Improve main landing page #2593 (andygrove)

  • docs: Improve site navigation #2597 (andygrove)

  • docs: Update benchmark results #2596 (andygrove)

  • docs: Upgrade pydata-sphinx-theme to 0.16.1 #2602 (andygrove)

  • docs: Fix redirect #2603 (andygrove)

  • docs: Fix broken image link #2613 (andygrove)

  • docs: Add FFI docs to contributor guide #2668 (andygrove)

  • docs: Various documentation updates #2674 (andygrove)

  • docs: Add supported SortOrder expressions and fix a typo #2694 (andygrove)

  • docs: Minor docs update for running Spark SQL tests #2712 (andygrove)

  • docs: Update contributor guide for adding a new expression #2704 (andygrove)

  • docs: Documentation updates for LocalTableScan and WindowExec #2742 (andygrove)

  • docs: Typo fix #2752 (wForget)

  • docs: Categorize some configs as testing and add notes about known time zone issues #2740 (andygrove)

  • docs: Run prettier on all markdown files #2782 (andygrove)

  • docs: Ignore prettier formatting for generated tables #2790 (andygrove)

  • docs: Add new section to contributor guide, explaining how to add a new operator #2758 (andygrove)

Other:

  • chore: Start 0.12.0 development #2584 (mbutrovich)

  • chore: Bump Spark from 3.5.6 to 3.5.7 #2574 (cfmcgrady)

  • chore(deps): bump parquet from 56.0.0 to 56.2.0 in /native #2608 (dependabot[bot])

  • chore(deps): bump tikv-jemallocator from 0.6.0 to 0.6.1 in /native #2609 (dependabot[bot])

  • chore(deps): bump tikv-jemalloc-ctl from 0.6.0 to 0.6.1 in /native #2610 (dependabot[bot])

  • tests: FuzzDataGenerator instead of Parquet-specific generator #2616 (mbutrovich)

  • chore: Simplify on-heap memory configuration #2599 (andygrove)

  • Feat: Add sha1 function impl #2471 (kazantsev-maksim)

  • chore: Refactor Parquet/DataFrame fuzz data generators #2629 (andygrove)

  • chore: Remove needless from_raw calls #2638 (EmilyMatt)

  • chore: support DataFusion 50.3.0 #2605 (comphead)

  • chore(deps): bump actions/upload-artifact from 4 to 5 #2654 (dependabot[bot])

  • chore(deps): bump cc from 1.2.42 to 1.2.43 in /native #2653 (dependabot[bot])

  • chore(deps): bump actions/download-artifact from 5 to 6 #2652 (dependabot[bot])

  • chore: extract comparison into separate tool #2632 (comphead)

  • chore: Various improvements to checkSparkAnswer* methods in CometTestBase #2656 (andygrove)

  • chore: Remove code for unpacking dictionaries prior to FilterExec #2659 (andygrove)

  • chore: display schema for datasets being compared #2665 (comphead)

  • chore: Remove CopyExec #2663 (andygrove)

  • chore: Add extended explain plans to stability suite #2669 (andygrove)

  • chore(deps): bump aws-config from 1.8.8 to 1.8.10 in /native #2677 (dependabot[bot])

  • chore(deps): bump cc from 1.2.43 to 1.2.44 in /native #2678 (dependabot[bot])

  • chore: tpcbench output explain just once and formatted #2679 (comphead)

  • chore: Add tolerance for ComparisonTool #2699 (comphead)

  • chore: Expand test coverage for CometWindowExec #2711 (comphead)

  • chore: generate Float/Double NaN #2695 (hsiang-c)

  • minor: Combine two CI workflows for Spark SQL tests #2727 (andygrove)

  • chore: Improve framework for specifying that configs can be set with env vars #2722 (andygrove)

  • chore: Rename COMET_EXPLAIN_VERBOSE_ENABLED to COMET_EXTENDED_EXPLAIN_FORMAT and change default #2644 (andygrove)

  • chore: Fall back to Spark for window functions #2726 (comphead)

  • chore: Refactor operator serde - part 1 #2738 (andygrove)

  • Feat: Add CometLocalTableScanExec operator #2735 (kazantsev-maksim)

  • chore(deps): bump cc from 1.2.44 to 1.2.45 in /native #2750 (dependabot[bot])

  • chore(deps): bump aws-credential-types from 1.2.8 to 1.2.9 in /native #2751 (dependabot[bot])

  • chore: Operator serde refactor part 2 #2741 (andygrove)

  • chore: Fallback to Spark for array_reverse for array<binary> #2759 (comphead)

  • chore: [iceberg] test iceberg 1.10.0 #2709 (manuzhang)

  • chore: Add docs/comet-* to rat exclude list #2762 (manuzhang)

  • Chore: Refactor static invoke exprs #2671 (kazantsev-maksim)

  • minor: Small refactor for consistent serde for hash aggregate #2764 (andygrove)

  • minor: Move operator2Proto to CometExecRule #2767 (andygrove)

  • chore: various refactoring changes for iceberg [iceberg] #2680 (parthchandra)

  • chore: Refactor CometExecRule handling of sink operators #2771 (andygrove)

  • minor: Refactor to move window-specific code from QueryPlanSerde to CometWindowExec #2780 (andygrove)

  • chore: Remove many references to COMET_EXPR_ALLOW_INCOMPATIBLE #2775 (andygrove)

  • chore: Remove COMET_EXPR_ALLOW_INCOMPATIBLE config #2786 (andygrove)

  • chore: check missingInput for Comet plan nodes #2795 (comphead)

  • chore: Finish refactoring expression serde out of QueryPlanSerde #2791 (andygrove)

  • chore: Update docs to fix CI after #2784 #2799 (mbutrovich)

  • chore: Update q79 golden plan for Spark 4.0 after #2795 #2800 (mbutrovich)

  • Fix: Fix null handling in CometVector implementations #2643 (cfmcgrady)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

    54	Andy Grove
    11	Oleks V
    10	dependabot[bot]
     9	Matt Butrovich
     6	Manu Zhang
     3	Fu Chen
     3	Kazantsev Maksim
     2	Emily Matheys
     2	Vrishabh
     2	hsiang-c
     1	Parth Chandra
     1	Zhen Wang
     1	rahulbabarwal89

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.