DataFusion Comet 0.8.0 Changelog
This release consists of 81 commits from 11 contributors. See credits at the end of this changelog for more information.
Fixed bugs:
fix: remove code duplication in native_datafusion and native_iceberg_compat implementations #1443 (parthchandra)
fix: Refactor CometScanRule and fix bugs #1483 (andygrove)
fix: check if handle has been initialized before closing #1554 (wForget)
fix: Taking slicing into account when writing BooleanBuffers as fast-encoding format #1522 (Kontinuation)
fix: isCometEnabled name conflict #1569 (kazuyukitanimura)
fix: make register_object_store use same session_env as file scan #1555 (wForget)
fix: adjust CometNativeScan’s doCanonicalize and hashCode for AQE, use DataSourceScanExec trait #1578 (mbutrovich)
fix: corrected the logic of eliminating CometSparkToColumnarExec #1597 (wForget)
fix: avoid panic caused by close null handle of parquet reader #1604 (wForget)
fix: Make AQE capable of converting Comet shuffled joins to Comet broadcast hash joins #1605 (Kontinuation)
fix: Making shuffle files generated in native shuffle mode reclaimable #1568 (Kontinuation)
fix: Support per-task shuffle write rows and shuffle write time metrics #1617 (Kontinuation)
fix: Modify Spark SQL core 2 tests for native_datafusion reader, change 3.5.5 diff hash length to 11 #1641 (mbutrovich)
fix: fix spark/sql test failures in native_iceberg_compat #1593 (parthchandra)
fix: handle missing field correctly in native_iceberg_compat #1656 (parthchandra)
fix: better int96 support for experimental native scans #1652 (mbutrovich)
fix: respect ignoreNulls flag in first_value and last_value #1626 (andygrove)
fix: update row groups count in internal metrics accumulator #1658 (parthchandra)
fix: Shuffle should maintain insertion order #1660 (EmilyMatt)
Performance related:
perf: Use a global tokio runtime #1614 (andygrove)
perf: Respect Spark’s PARQUET_FILTER_PUSHDOWN_ENABLED config #1619 (andygrove)
perf: Experimental fix to avoid join strategy regression #1674 (andygrove)
Implemented enhancements:
feat: add read array support #1456 (comphead)
feat: introduce hadoop mini cluster to test native scan on hdfs #1556 (wForget)
feat: make parquet native scan schema case insensitive #1575 (wForget)
feat: enable iceberg compat tests, more tests for complex types #1550 (comphead)
feat: pushdown filter for native_iceberg_compat #1566 (wForget)
feat: Fix struct of arrays schema issue #1592 (comphead)
feat: adding more struct/arrays tests #1594 (comphead)
feat: respect batchSize/workerThreads/blockingThreads configurations for native_iceberg_compat scan #1587 (wForget)
feat: add MAP type support for first level #1603 (comphead)
feat: Add more tests for nested types combinations for native_datafusion #1632 (comphead)
feat: Override MapBuilder values field with expected schema #1643 (comphead)
feat: track unified memory pool #1651 (wForget)
feat: Add support for complex types in native shuffle #1655 (andygrove)
Documentation updates:
docs: Update configuration guide to show optional configs #1524 (andygrove)
docs: Add changelog for 0.7.0 release #1527 (andygrove)
docs: Use a shallow clone for Spark SQL test instructions #1547 (mbutrovich)
docs: Update benchmark results for 0.7.0 release #1548 (andygrove)
doc: Renew kubernetes.md #1549 (comphead)
docs: various improvements to tuning guide #1525 (andygrove)
docs: Update supported Spark versions #1580 (andygrove)
docs: change OSX/OS X to macOS #1584 (mbutrovich)
docs: docs for benchmarking in aws ec2 #1601 (andygrove)
docs: Update compatibility docs for new native scans #1657 (andygrove)
doc: Document local HDFS setup #1673 (comphead)
Other:
chore: fix issue in release process #1528 (andygrove)
chore: Remove all subdependencies #1514 (EmilyMatt)
chore: Drop support for Spark 3.3 (EOL) #1529 (andygrove)
chore: Prepare for 0.8.0 development #1530 (andygrove)
chore: Re-enable GitHub discussions #1535 (andygrove)
chore: [FOLLOWUP] Drop support for Spark 3.3 (EOL) #1534 (kazuyukitanimura)
build: Use unique name for surefire artifacts #1544 (andygrove)
chore: Update links for released version #1540 (andygrove)
chore: Enable Comet explicitly in CometTPCDSQueryTestSuite #1559 (andygrove)
chore: Fix some inconsistencies in memory pool configuration #1561 (andygrove)
upgraded spark 3.5.4 to 3.5.5 #1565 (YanivKunda)
minor: fix typo #1570 (wForget)
Chore: simplify array related functions impl #1490 (kazantsev-maksim)
added fallback using reflection for backward-compatibility #1573 (YanivKunda)
chore: Override node name for CometSparkToColumnar #1577 (l0kr)
chore: Reimplement ShuffleWriterExec using interleave_record_batch #1511 (Kontinuation)
chore: Run Comet tests for more Spark versions #1582 (andygrove)
Feat: support array_except function #1343 (kazantsev-maksim)
minor: Fix clippy warnings #1606 (Kontinuation)
chore: Remove some unwraps in hashing code #1600 (andygrove)
chore: Remove redundant shims for getFailOnError #1608 (andygrove)
chore: Making comet native operators write spill files to spark local dir #1581 (Kontinuation)
chore: Refactor QueryPlanSerde to use idiomatic Scala and reduce verbosity #1609 (andygrove)
chore: Create simple fuzz test as part of test suite #1610 (andygrove)
chore: Document testSingleLineQuery test method #1628 (comphead)
chore: Parquet fuzz testing #1623 (andygrove)
chore: Change default Spark version to 3.5 #1620 (andygrove)
chore: Add manually-triggered CI jobs for testing Spark SQL with native scans #1624 (andygrove)
chore: refactor v2 scan conversion #1621 (andygrove)
chore: clean up planner.rs #1650 (comphead)
chore: correct name of pipelines for native_datafusion ci workflow #1653 (parthchandra)
chore: Upgrade to datafusion 47.0.0-rc1 and arrow-rs 55.0.0 #1563 (andygrove)
chore: Upgrade to datafusion 47.0.0 #1663 (YanivKunda)
chore: Enable CometFuzzTestSuite int96 test for experimental native scans (without complex types) #1664 (mbutrovich)
chore: Refactor Memory Pools #1662 (EmilyMatt)
Credits
Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
31 Andy Grove
11 Oleks V
10 Zhen Wang
7 Kristin Cowalcijk
6 Matt Butrovich
5 Parth Chandra
3 Emily Matheys
3 Yaniv Kunda
2 KAZUYUKI TANIMURA
2 Kazantsev Maksim
1 Łukasz
Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.