DataFusion Comet 0.10.0 Changelog
This release consists of 183 commits from 26 contributors. See credits at the end of this changelog for more information.
Fixed bugs:
fix: [Iceberg] Fix decimal corruption #1985 (andygrove)
fix: broken link in development.md #2024 (petern48)
fix: [iceberg] Add LogicalTypeAnnotation in ParquetColumnSpec #2000 (huaxingao)
fix: hdfs read into buffer fully #2031 (parthchandra)
fix: Refactor arithmetic serde and fix correctness issues with EvalMode::TRY #2018 (andygrove)
fix: clean up [iceberg] integration APIs #2032 (huaxingao)
fix: zero Arrow Array offset before sending across FFI #2052 (mbutrovich)
fix: [iceberg] more fixes for Iceberg integration APIs. #2078 (parthchandra)
fix: Add support for StringDecode in Spark 4.0.0 #2075 (peter-toth)
fix: Avoid double free in CometUnifiedShuffleMemoryAllocator #2122 (andygrove)
fix: Remove duplicate serde code #2098 (andygrove)
fix: Improve logic for determining when an UnpackOrDeepCopy is needed #2142 (andygrove)
fix: Add CopyExec to inputs to SortMergeJoinExec #2155 (andygrove)
fix: Fix repeatedly url-decode path when reading parquet from s3 using native parquet reader #2138 (Kontinuation)
fix: [iceberg] Switch to OSS Spark and run Iceberg Spark tests in parallel #1987 (hsiang-c)
fix: [iceberg] Fall back to spark for schemas with empty structs #2204 (andygrove)
fix: Fix failing TPC-DS workflow in PR CI runs #2207 (andygrove)
fix: [iceberg] order query result deterministically #2208 (hsiang-c)
fix: use spark.comet.batchSize instead of conf.arrowMaxRecordsPerBatch for data that is coming from Java #2196 (rluvaton)
fix: if expr nullable #2217 (Asura7969)
fix: Support auto scan mode with Spark 4.0.0 #1975 (andygrove)
fix: Make Sha2 fallback message more user-friendly #2213 (rishvin)
fix: separate type checking for CometExchange and CometColumnarExchange #2241 (mbutrovich)
fix: Fix potential resource leak in native shuffle block reader #2247 (andygrove)
fix: Remove unreachable code in CometScanRule #2252 (andygrove)
fix: Fall back to native_comet for encrypted Parquet scans #2250 (andygrove)
fix: Fall back to native_comet when object store not supported by native_iceberg_compat #2251 (andygrove)
fix: split expr.proto file (new) #2267 (kination)
fix: handle cast to dictionary vector introduced by case when #2044 (parthchandra)
fix: Remove check for custom S3 endpoints #2288 (andygrove)
fix: implement lazy evaluation in Coalesce function #2270 (coderfender)
fix: Update benchmarking scripts #2293 (andygrove)
fix: Fix regression in NativeConfigSuite #2299 (andygrove)
fix: Validating object store configs should not throw exception #2308 (andygrove)
fix: TakeOrderedAndProjectExec is not reporting all fallback reasons #2323 (kazuyukitanimura)
fix: Fallback length function with binary input #2349 (wForget)
Performance related:
perf: Optimize AvgDecimalGroupsAccumulator #1893 (leung-ming)
perf: Optimize SumDecimalGroupsAccumulator::update_single #2069 (leung-ming)
perf: Avoid FFI copy in ScanExec when reading data from exchanges #2268 (andygrove)
Implemented enhancements:
feat: Add from_unixtime support #1943 (kazuyukitanimura)
feat: randn expression support #2010 (akupchinskiy)
feat: monotonically_increasing_id and spark_partition_id implementation #2037 (akupchinskiy)
feat: support map_entries #2059 (comphead)
feat: Support Array Literal #2057 (comphead)
feat: Add new trait for operator serde #2115 (andygrove)
feat: limit with offset support #2070 (akupchinskiy)
feat: Include scan implementation name in CometScan nodeName #2141 (andygrove)
feat: Add config option to log fallback reasons #2154 (andygrove)
feat: [iceberg] Enable Comet shuffle in Iceberg diff #2205 (andygrove)
feat: Improve shuffle fallback reporting #2194 (andygrove)
feat: Reset data buf of NativeBatchDecoderIterator on close #2235 (wForget)
feat: Improve fallback mechanism for ANSI mode #2211 (andygrove)
feat: Support hdfs with OpenDAL #2244 (wForget)
feat: Ignore fallback info for command execs #2297 (wForget)
feat: Improve some confusing fallback reasons #2301 (wForget)
feat: Make supported hadoop filesystem schemes configurable #2272 (wForget)
feat: [1941-Part1]: Introduce map-sort scalar function #2262 (rishvin)
feat: [iceberg] delete rows support using selection vectors #2346 (parthchandra)
Documentation updates:
docs: Update benchmark results for 0.9.0 #1959 (andygrove)
doc: Add comment about local clippy run before submitting a pull request #1961 (akupchinskiy)
docs: Minor improvements to Spark SQL test docs #1980 (andygrove)
docs: Update Maven links for 0.9.0 release #1988 (andygrove)
docs: Documentation updates for 0.9.0 release #1981 (andygrove)
docs: Add guide showing comparison between Comet and Gluten #2012 (andygrove)
docs: Remove legacy comment in docs #2022 (andygrove)
docs: Update Gluten comparison to clarify that Velox is open-source #2043 (andygrove)
docs: Improve Gluten comparison based on feedback from the community #2048 (andygrove)
docs: added a missing export into the plan stability section #2071 (akupchinskiy)
doc: Added documentation for supported map functions #2074 (codetyri0n)
doc: Alternative way to start Spark Master to run benchmarks #2072 (comphead)
docs: Update to support try arithmetic functions #2143 (coderfender)
doc: update macos standalone spark start instructions #2103 (comphead)
docs: Update confs to bypass Iceberg Spark issues #2166 (hsiang-c)
docs: Add Roadmap #2191 (andygrove)
docs: Update installation guide for 0.9.1 #2230 (andygrove)
docs: Publish version-specific user guides #2269 (andygrove)
docs: Fix issues with publishing user guide for older Comet versions #2284 (andygrove)
docs: Move user guide docs into /user-guide/latest #2318 (andygrove)
docs: Add manual redirects from old pages that no longer exist #2317 (andygrove)
docs: Fix broken links and other Sphinx warnings #2320 (andygrove)
docs: Use sphinx-reredirects for redirects #2324 (andygrove)
docs: Add note about Root CA Certificate location with native scans #2325 (andygrove)
docs: Stop hard-coding Comet version in docs #2326 (andygrove)
docs: Update supported expressions and operators in user guide #2327 (andygrove)
docs: Update Iceberg docs for 0.10.0 release #2355 (hsiang-c)
Other:
chore: Start 0.10.0 development #1958 (andygrove)
build: Fix release dockerfile #1960 (andygrove)
test: Run Iceberg Spark tests only when PR title contains [iceberg] #1976 (hsiang-c)
chore: Reuse comet allocator #1973 (EmilyMatt)
chore: update CopyExec with maintains_input_order, supports_limit_pushdown and cardinality_effect #1979 (rluvaton)
chore: extract CreateArray from QueryPlanSerde #1991 (tglanz)
chore: use DF scalar functions for StartsWith, EndsWith, Contains, DF LikeExpr #1887 (mbutrovich)
refactor: standardize div_ceil #1999 (tglanz)
Feat: support map_from_arrays #1932 (kazantsev-maksim)
chore: Implement BloomFilterMightContain as a ScalarUDFImpl #1954 (tglanz)
chore: Drop support for RightSemi and RightAnti join types #1935 (dharanad)
minor: Refactor to reduce duplicate serde code #2011 (andygrove)
chore: Introduce ANSI support for remainder operation #1971 (rishvin)
chore: Improve process for generating dynamic content into documentation #2017 (andygrove)
minor: Refactor to move some shuffle-related logic from QueryPlanSerde to CometExecRule #2015 (andygrove)
chore: Add benchmarking scripts #2025 (andygrove)
chore: Add scripts for running benchmark based on TPC-DS #2042 (andygrove)
Chore: Improve array contains test coverage #2030 (kazantsev-maksim)
fix : cast_operands_to_decimal_type_to_fix_arithmetic_overflow #1996 (coderfender)
chore: Add scripts for running benchmarks with Blaze #2050 (andygrove)
chore: migrate to DF 49.0.0 #2040 (comphead)
chore: Refactor aggregate serde to be consistent with other expression serde #2055 (andygrove)
Chore: implement string_space as ScalarUDFImpl #2041 (kazantsev-maksim)
docs : Change notes for IntegralDivide #2054 (coderfender)
Chore: refactor Comparison out of QueryPlanSerde #2028 (CuteChuanChuan)
chore: Use Datafusion’s Sha2 and remove Comet’s implementation. #2063 (rishvin)
chore: Adding dependabot #2076 (comphead)
chore: Fix clippy issues for Rust 1.89.0 #2082 (andygrove)
chore: Refactor string expression serde, part 1 #2068 (andygrove)
chore: Use chr function from datafusion-spark #2080 (andygrove)
minor: CometBuffer code cleanup #2090 (andygrove)
chore: Refactor string expression serde, part 2 #2097 (andygrove)
chore: create copy of fs-hdfs #2062 (parthchandra)
Chore: refactor datetime related expressions out of QueryPlanSerde #2085 (CuteChuanChuan)
chore(deps): bump actions/checkout from 3 to 4 #2104 (dependabot[bot])
chore(deps): bump libc from 0.2.174 to 0.2.175 in /native #2107 (dependabot[bot])
chore(deps): bump assertables from 9.8.1 to 9.8.2 in /native #2108 (dependabot[bot])
chore: Update dependabot label #2110 (mbutrovich)
chore: Move stringDecode() to CommonStringExprs trait #2111 (peter-toth)
chore(deps): bump uuid from 0.8.2 to 1.17.0 in /native #2106 (dependabot[bot])
chore(deps): bump actions/download-artifact from 4 to 5 #2109 (dependabot[bot])
chore(deps): bump tokio from 1.47.0 to 1.47.1 in /native #2112 (dependabot[bot])
chore(deps): bump actions/setup-java from 3 to 4 #2105 (dependabot[bot])
chore(deps): bump the proto group in /native with 2 updates #2113 (dependabot[bot])
chore: Add type parameter to CometExpressionSerde #2114 (peter-toth)
chore(deps): bump cc from 1.2.30 to 1.2.32 in /native #2123 (dependabot[bot])
chore(deps): bump bindgen from 0.64.0 to 0.69.5 in /native #2124 (dependabot[bot])
chore(deps): bump aws-credential-types from 1.2.4 to 1.2.5 in /native #2125 (dependabot[bot])
chore(deps): bump actions/checkout from 4 to 5 #2126 (dependabot[bot])
chore: fix QueryPlanSerde merge error #2127 (comphead)
chore(deps): bump slab from 0.4.10 to 0.4.11 in /native #2128 (dependabot[bot])
fix : implement_try_eval_mode_arithmetic #2073 (coderfender)
chore: Simplify approach to avoiding memory corruption due to buffer reuse #2156 (andygrove)
chore: upgrade to DataFusion 49.0.1 #2077 (mbutrovich)
chore: CometExecRule code cleanup #2159 (andygrove)
chore: Update CometTestBase to stop setting the scan implementation to native_comet #2176 (andygrove)
trivial: remove unnecessary clone() #2066 (isimluk)
chore: Pass Spark configs to native createPlan #2180 (andygrove)
(feat) add support for ArrayMin scalar function #1944 (dharanad)
chore: Upgrade to 49.0.2 #2223 (comphead)
chore(deps): bump bindgen from 0.69.5 to 0.72.0 in /native #2222 (dependabot[bot])
chore: move Round serde into object #2237 (andygrove)
chore: Improve expression fallback reporting #2240 (andygrove)
chore: Update stability suite to use auto scan instead of native_comet #2178 (andygrove)
chore: Improve documentation for CometBatchIterator and fix a potential issue #2168 (andygrove)
chore: Fix array_intersect test #2246 (comphead)
chore(deps): bump actions/checkout from 4 to 5 #2229 (dependabot[bot])
chore(deps): bump actions/setup-java from 4 to 5 #2225 (dependabot[bot])
chore: Introduce strict-warning profile for Scala #2254 (comphead)
chore: fix struct to string test for native_iceberg_compat #2253 (comphead)
chore: Add type parameter to CometAggregateExpressionSerde #2249 (andygrove)
Feat: Impl array flatten func #2039 (kazantsev-maksim)
Chore: Refactor serde for math expressions #2259 (kazantsev-maksim)
chore: Refactor serde for more array and struct expressions #2257 (andygrove)
chore: Refactor remaining predicate expression serde #2265 (andygrove)
chore(deps): bump procfs from 0.17.0 to 0.18.0 in /native #2278 (dependabot[bot])
chore(deps): bump cc from 1.2.34 to 1.2.35 in /native #2277 (dependabot[bot])
chore(deps): bump bindgen from 0.72.0 to 0.72.1 in /native #2274 (dependabot[bot])
chore(deps): bump aws-credential-types from 1.2.5 to 1.2.6 in /native #2275 (dependabot[bot])
minor: Remove useless ENABLE_COMET_SHUFFLE env #2280 (wForget)
chore: Refactor serde for conditional expressions #2266 (andygrove)
chore(deps): bump mimalloc from 0.1.47 to 0.1.48 in /native #2276 (dependabot[bot])
chore: docker publish and docs build only for apache repo #2289 (wForget)
minor: Reduce misleading fallback warnings #2283 (andygrove)
chore: Refactor Cast serde to avoid code duplication #2242 (andygrove)
chore: Refactor hex/unhex SerDe to avoid code duplication #2287 (hsiang-c)
minor: Improve exception message for unimplemented CometVector methods #2291 (andygrove)
chore: Align sort constraints w/ arrow-rs #2279 (hsiang-c)
chore: Collect fallback reasons for spark sql tests #2313 (wForget)
chore: Refactor serde for named expressions alias and attributeReference #2290 (andygrove)
chore(deps): bump log4rs from 1.3.0 to 1.4.0 in /native #2334 (dependabot[bot])
chore(deps): bump twox-hash from 2.1.1 to 2.1.2 in /native #2335 (dependabot[bot])
chore(deps): bump actions/setup-python from 5 to 6 #2331 (dependabot[bot])
chore(deps): bump actions/download-artifact from 4 to 5 #2332 (dependabot[bot])
chore(deps): bump cc from 1.2.35 to 1.2.36 in /native #2337 (dependabot[bot])
chore(deps): bump log from 0.4.27 to 0.4.28 in /native #2333 (dependabot[bot])
build: Specify SPARK_LOCAL_HOSTNAME to fix CI failures #2353 (andygrove)
chore: [branch-0.10] Bump version to 0.10.0 #2356 (andygrove)
Credits
Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
75 Andy Grove
27 dependabot[bot]
11 Oleks V
9 Zhen Wang
7 hsiang-c
5 Artem Kupchinskiy
5 B Vadlamani
5 Kazantsev Maksim
5 Matt Butrovich
5 Parth Chandra
4 Rishab Joshi
3 Peter Toth
3 Tal Glanzman
2 Dharan Aditya
2 Huaxin Gao
2 KAZUYUKI TANIMURA
2 Leung Ming
2 Raz Luvaton
2 Yu-Chuan Hung
1 Asura7969
1 Emily Matheys
1 K.I. (Dennis) Jung
1 Kristin Cowalcijk
1 Peter Nguyen
1 codetyri0n
1 Šimon Lukašík
Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.