DataFusion Comet 0.7.0 Changelog#
This release consists of 46 commits from 11 contributors. See credits at the end of this changelog for more information.
Fixed bugs:
fix: Change default value of COMET_SCAN_ALLOW_INCOMPATIBLE and add documentation #1398 (andygrove)
fix: Reduce cast.rs and utils.rs logic from parquet_support.rs for experimental native scans #1387 (mbutrovich)
fix: Remove more cast.rs logic from parquet_support.rs for experimental native scans #1413 (mbutrovich)
fix: fix various unit test failures in native_datafusion and native_iceberg_compat readers #1415 (parthchandra)
fix: metrics tests for native_datafusion experimental native scan #1445 (mbutrovich)
fix: Reduce number of shuffle spill files, fix spilled_bytes metric, add some unit tests #1440 (andygrove)
fix: Executor memory overhead overriding #1462 (LukMRVC)
fix: Stop copying rust-toolchain to docker file #1475 (andygrove)
fix: PartitionBuffers should not have their own MemoryConsumer #1496 (EmilyMatt)
fix: enable full decimal to decimal support #1385 (himadripal)
fix: use common implementation of handling object store and hdfs urls for native_datafusion and native_iceberg_compat #1494 (parthchandra)
fix: Simplify CometShuffleMemoryAllocator logic, rename classes, remove config #1485 (mbutrovich)
fix: check overflow for decimal integral division #1512 (wForget)
Performance related:
perf: Update RewriteJoin logic to choose optimal build side #1424 (andygrove)
perf: Reduce native shuffle memory overhead by 50% #1452 (andygrove)
Implemented enhancements:
feat: CometNativeScan metrics from ParquetFileMetrics and FileStreamMetrics #1172 (mbutrovich)
feat: add experimental remote HDFS support for native DataFusion reader #1359 (comphead)
feat: add Win-amd64 profile #1410 (wForget)
feat: Support IntegralDivide function #1428 (wForget)
feat: Add div operator for fuzz testing and update expression doc #1464 (wForget)
feat: Upgrade to DataFusion 46.0.0-rc2 #1423 (andygrove)
feat: Add support for rpad #1470 (andygrove)
feat: Use official DataFusion 46.0.0 release #1484 (andygrove)
Documentation updates:
docs: Add changelog for 0.6.0 release #1402 (andygrove)
docs: Improve documentation for running stability plan tests #1469 (andygrove)
Other:
test: Add experimental native scans to CometReadBenchmark #1150 (mbutrovich)
chore: Prepare for 0.7.0 development #1404 (andygrove)
chore: Update released version in documentation #1418 (andygrove)
chore: Update protobuf to 3.25.5 #1434 (kazuyukitanimura)
chore: Update guava to 33.2.1-jre #1435 (kazuyukitanimura)
test: Register Spark-compatible expressions with a DataFusion context #1432 (viczsaurav)
chore: fixes for kube build #1421 (comphead)
build: pin machete to version 0.7.0 #1444 (andygrove)
chore: Re-organize shuffle writer code #1439 (andygrove)
chore: faster maven mirror #1447 (comphead)
build: Use stable channel in rust-toolchain #1465 (andygrove)
Feat: support array_compact function #1321 (kazantsev-maksim)
chore: Upgrade to Spark 3.5.4 #1471 (andygrove)
chore: Enable CI checks for
native_datafusionscan #1479 (andygrove)chore: Add
native_iceberg_compatCI checks #1487 (andygrove)chore: Stop disabling readside padding in TPC stability suite #1491 (andygrove)
chore: Remove num partitions from repartitioner #1498 (EmilyMatt)
test: fix Spark 3.5 tests #1482 (kazuyukitanimura)
minor: Remove hard-coded config default #1503 (andygrove)
chore: Use Datafusion’s existing empty stream #1517 (EmilyMatt)
Credits#
Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
20 Andy Grove
6 Matt Butrovich
4 Zhen Wang
3 Emily Matheys
3 KAZUYUKI TANIMURA
3 Oleks V
2 Himadri Pal
2 Parth Chandra
1 Kazantsev Maksim
1 Lukas Moravec
1 Saurav Verma
Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.