DataFusion Comet 0.10.0 Changelog

This release consists of 183 commits from 26 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: [Iceberg] Fix decimal corruption #1985 (andygrove)

  • fix: broken link in development.md #2024 (petern48)

  • fix: [iceberg] Add LogicalTypeAnnotation in ParquetColumnSpec #2000 (huaxingao)

  • fix: hdfs read into buffer fully #2031 (parthchandra)

  • fix: Refactor arithmetic serde and fix correctness issues with EvalMode::TRY #2018 (andygrove)

  • fix: clean up [iceberg] integration APIs #2032 (huaxingao)

  • fix: zero Arrow Array offset before sending across FFI #2052 (mbutrovich)

  • fix: [iceberg] more fixes for Iceberg integration APIs. #2078 (parthchandra)

  • fix: Add support for StringDecode in Spark 4.0.0 #2075 (peter-toth)

  • fix: Avoid double free in CometUnifiedShuffleMemoryAllocator #2122 (andygrove)

  • fix: Remove duplicate serde code #2098 (andygrove)

  • fix: Improve logic for determining when an UnpackOrDeepCopy is needed #2142 (andygrove)

  • fix: Add CopyExec to inputs to SortMergeJoinExec #2155 (andygrove)

  • fix: Fix repeatedly url-decode path when reading parquet from s3 using native parquet reader #2138 (Kontinuation)

  • fix: [iceberg] Switch to OSS Spark and run Iceberg Spark tests in parallel #1987 (hsiang-c)

  • fix: [iceberg] Fall back to spark for schemas with empty structs #2204 (andygrove)

  • fix: Fix failing TPC-DS workflow in PR CI runs #2207 (andygrove)

  • fix: [iceberg] order query result deterministically #2208 (hsiang-c)

  • fix: use spark.comet.batchSize instead of conf.arrowMaxRecordsPerBatch for data that is coming from Java #2196 (rluvaton)

  • fix: if expr nullable #2217 (Asura7969)

  • fix: Support auto scan mode with Spark 4.0.0 #1975 (andygrove)

  • fix: Make Sha2 fallback message more user-friendly #2213 (rishvin)

  • fix: separate type checking for CometExchange and CometColumnarExchange #2241 (mbutrovich)

  • fix: Fix potential resource leak in native shuffle block reader #2247 (andygrove)

  • fix: Remove unreachable code in CometScanRule #2252 (andygrove)

  • fix: Fall back to native_comet for encrypted Parquet scans #2250 (andygrove)

  • fix: Fall back to native_comet when object store not supported by native_iceberg_compat #2251 (andygrove)

  • fix: split expr.proto file (new) #2267 (kination)

  • fix: handle cast to dictionary vector introduced by case when #2044 (parthchandra)

  • fix: Remove check for custom S3 endpoints #2288 (andygrove)

  • fix: implement lazy evaluation in Coalesce function #2270 (coderfender)

  • fix: Update benchmarking scripts #2293 (andygrove)

  • fix: Fix regression in NativeConfigSuite #2299 (andygrove)

  • fix: Validating object store configs should not throw exception #2308 (andygrove)

  • fix: TakeOrderedAndProjectExec is not reporting all fallback reasons #2323 (kazuyukitanimura)

  • fix: Fallback length function with binary input #2349 (wForget)

Performance related:

  • perf: Optimize AvgDecimalGroupsAccumulator #1893 (leung-ming)

  • perf: Optimize SumDecimalGroupsAccumulator::update_single #2069 (leung-ming)

  • perf: Avoid FFI copy in ScanExec when reading data from exchanges #2268 (andygrove)

Implemented enhancements:

  • feat: Add from_unixtime support #1943 (kazuyukitanimura)

  • feat: randn expression support #2010 (akupchinskiy)

  • feat: monotonically_increasing_id and spark_partition_id implementation #2037 (akupchinskiy)

  • feat: support map_entries #2059 (comphead)

  • feat: Support Array Literal #2057 (comphead)

  • feat: Add new trait for operator serde #2115 (andygrove)

  • feat: limit with offset support #2070 (akupchinskiy)

  • feat: Include scan implementation name in CometScan nodeName #2141 (andygrove)

  • feat: Add config option to log fallback reasons #2154 (andygrove)

  • feat: [iceberg] Enable Comet shuffle in Iceberg diff #2205 (andygrove)

  • feat: Improve shuffle fallback reporting #2194 (andygrove)

  • feat: Reset data buf of NativeBatchDecoderIterator on close #2235 (wForget)

  • feat: Improve fallback mechanism for ANSI mode #2211 (andygrove)

  • feat: Support hdfs with OpenDAL #2244 (wForget)

  • feat: Ignore fallback info for command execs #2297 (wForget)

  • feat: Improve some confusing fallback reasons #2301 (wForget)

  • feat: Make supported hadoop filesystem schemes configurable #2272 (wForget)

  • feat: [1941-Part1]: Introduce map-sort scalar function #2262 (rishvin)

  • feat: [iceberg] delete rows support using selection vectors #2346 (parthchandra)

Documentation updates:

  • docs: Update benchmark results for 0.9.0 #1959 (andygrove)

  • doc: Add comment about local clippy run before submitting a pull request #1961 (akupchinskiy)

  • docs: Minor improvements to Spark SQL test docs #1980 (andygrove)

  • docs: Update Maven links for 0.9.0 release #1988 (andygrove)

  • docs: Documentation updates for 0.9.0 release #1981 (andygrove)

  • docs: Add guide showing comparison between Comet and Gluten #2012 (andygrove)

  • docs: Remove legacy comment in docs #2022 (andygrove)

  • docs: Update Gluten comparison to clarify that Velox is open-source #2043 (andygrove)

  • docs: Improve Gluten comparison based on feedback from the community #2048 (andygrove)

  • docs: added a missing export into the plan stability section #2071 (akupchinskiy)

  • doc: Added documentation for supported map functions #2074 (codetyri0n)

  • doc: Alternative way to start Spark Master to run benchmarks #2072 (comphead)

  • docs: Update to support try arithmetic functions #2143 (coderfender)

  • doc: update macos standalone spark start instructions #2103 (comphead)

  • docs: Update confs to bypass Iceberg Spark issues #2166 (hsiang-c)

  • docs: Add Roadmap #2191 (andygrove)

  • docs: Update installation guide for 0.9.1 #2230 (andygrove)

  • docs: Publish version-specific user guides #2269 (andygrove)

  • docs: Fix issues with publishing user guide for older Comet versions #2284 (andygrove)

  • docs: Move user guide docs into /user-guide/latest #2318 (andygrove)

  • docs: Add manual redirects from old pages that no longer exist #2317 (andygrove)

  • docs: Fix broken links and other Sphinx warnings #2320 (andygrove)

  • docs: Use sphinx-reredirects for redirects #2324 (andygrove)

  • docs: Add note about Root CA Certificate location with native scans #2325 (andygrove)

  • docs: Stop hard-coding Comet version in docs #2326 (andygrove)

  • docs: Update supported expressions and operators in user guide #2327 (andygrove)

  • docs: Update Iceberg docs for 0.10.0 release #2355 (hsiang-c)

Other:

  • chore: Start 0.10.0 development #1958 (andygrove)

  • build: Fix release dockerfile #1960 (andygrove)

  • test: Run Iceberg Spark tests only when PR title contains [iceberg] #1976 (hsiang-c)

  • chore: Reuse comet allocator #1973 (EmilyMatt)

  • chore: update CopyExec with maintains_input_order, supports_limit_pushdown and cardinality_effect #1979 (rluvaton)

  • chore: extract CreateArray from QueryPlanSerde #1991 (tglanz)

  • chore: use DF scalar functions for StartsWith, EndsWith, Contains, DF LikeExpr #1887 (mbutrovich)

  • refactor: standardize div_ceil #1999 (tglanz)

  • Feat: support map_from_arrays #1932 (kazantsev-maksim)

  • chore: Implement BloomFilterMightContain as a ScalarUDFImpl #1954 (tglanz)

  • chore: Drop support for RightSemi and RightAnti join types #1935 (dharanad)

  • minor: Refactor to reduce duplicate serde code #2011 (andygrove)

  • chore: Introduce ANSI support for remainder operation #1971 (rishvin)

  • chore: Improve process for generating dynamic content into documentation #2017 (andygrove)

  • minor: Refactor to move some shuffle-related logic from QueryPlanSerde to CometExecRule #2015 (andygrove)

  • chore: Add benchmarking scripts #2025 (andygrove)

  • chore: Add scripts for running benchmark based on TPC-DS #2042 (andygrove)

  • Chore: Improve array contains test coverage #2030 (kazantsev-maksim)

  • fix : cast_operands_to_decimal_type_to_fix_arithmetic_overflow #1996 (coderfender)

  • chore: Add scripts for running benchmarks with Blaze #2050 (andygrove)

  • chore: migrate to DF 49.0.0 #2040 (comphead)

  • chore: Refactor aggregate serde to be consistent with other expression serde #2055 (andygrove)

  • Chore: implement string_space as ScalarUDFImpl #2041 (kazantsev-maksim)

  • docs : Change notes for IntegralDivide #2054 (coderfender)

  • Chore: refactor Comparison out of QueryPlanSerde #2028 (CuteChuanChuan)

  • chore: Use Datafusion’s Sha2 and remove Comet’s implementation. #2063 (rishvin)

  • chore: Adding dependabot #2076 (comphead)

  • chore: Fix clippy issues for Rust 1.89.0 #2082 (andygrove)

  • chore: Refactor string expression serde, part 1 #2068 (andygrove)

  • chore: Use chr function from datafusion-spark #2080 (andygrove)

  • minor: CometBuffer code cleanup #2090 (andygrove)

  • chore: Refactor string expression serde, part 2 #2097 (andygrove)

  • chore: create copy of fs-hdfs #2062 (parthchandra)

  • Chore: refactor datetime related expressions out of QueryPlanSerde #2085 (CuteChuanChuan)

  • chore(deps): bump actions/checkout from 3 to 4 #2104 (dependabot[bot])

  • chore(deps): bump libc from 0.2.174 to 0.2.175 in /native #2107 (dependabot[bot])

  • chore(deps): bump assertables from 9.8.1 to 9.8.2 in /native #2108 (dependabot[bot])

  • chore: Update dependabot label #2110 (mbutrovich)

  • chore: Move stringDecode() to CommonStringExprs trait #2111 (peter-toth)

  • chore(deps): bump uuid from 0.8.2 to 1.17.0 in /native #2106 (dependabot[bot])

  • chore(deps): bump actions/download-artifact from 4 to 5 #2109 (dependabot[bot])

  • chore(deps): bump tokio from 1.47.0 to 1.47.1 in /native #2112 (dependabot[bot])

  • chore(deps): bump actions/setup-java from 3 to 4 #2105 (dependabot[bot])

  • chore(deps): bump the proto group in /native with 2 updates #2113 (dependabot[bot])

  • chore: Add type parameter to CometExpressionSerde #2114 (peter-toth)

  • chore(deps): bump cc from 1.2.30 to 1.2.32 in /native #2123 (dependabot[bot])

  • chore(deps): bump bindgen from 0.64.0 to 0.69.5 in /native #2124 (dependabot[bot])

  • chore(deps): bump aws-credential-types from 1.2.4 to 1.2.5 in /native #2125 (dependabot[bot])

  • chore(deps): bump actions/checkout from 4 to 5 #2126 (dependabot[bot])

  • chore: fix QueryPlanSerde merge error #2127 (comphead)

  • chore(deps): bump slab from 0.4.10 to 0.4.11 in /native #2128 (dependabot[bot])

  • fix : implement_try_eval_mode_arithmetic #2073 (coderfender)

  • chore: Simplify approach to avoiding memory corruption due to buffer reuse #2156 (andygrove)

  • chore: upgrade to DataFusion 49.0.1 #2077 (mbutrovich)

  • chore: CometExecRule code cleanup #2159 (andygrove)

  • chore: Update CometTestBase to stop setting the scan implementation to native_comet #2176 (andygrove)

  • trivial: remove unnecessary clone() #2066 (isimluk)

  • chore: Pass Spark configs to native createPlan #2180 (andygrove)

  • (feat) add support for ArrayMin scalar function #1944 (dharanad)

  • chore: Upgrade to 49.0.2 #2223 (comphead)

  • chore(deps): bump bindgen from 0.69.5 to 0.72.0 in /native #2222 (dependabot[bot])

  • chore: move Round serde into object #2237 (andygrove)

  • chore: Improve expression fallback reporting #2240 (andygrove)

  • chore: Update stability suite to use auto scan instead of native_comet #2178 (andygrove)

  • chore: Improve documentation for CometBatchIterator and fix a potential issue #2168 (andygrove)

  • chore: Fix array_intersect test #2246 (comphead)

  • chore(deps): bump actions/checkout from 4 to 5 #2229 (dependabot[bot])

  • chore(deps): bump actions/setup-java from 4 to 5 #2225 (dependabot[bot])

  • chore: Introduce strict-warning profile for Scala #2254 (comphead)

  • chore: fix struct to string test for native_iceberg_compat #2253 (comphead)

  • chore: Add type parameter to CometAggregateExpressionSerde #2249 (andygrove)

  • Feat: Impl array flatten func #2039 (kazantsev-maksim)

  • Chore: Refactor serde for math expressions #2259 (kazantsev-maksim)

  • chore: Refactor serde for more array and struct expressions #2257 (andygrove)

  • chore: Refactor remaining predicate expression serde #2265 (andygrove)

  • chore(deps): bump procfs from 0.17.0 to 0.18.0 in /native #2278 (dependabot[bot])

  • chore(deps): bump cc from 1.2.34 to 1.2.35 in /native #2277 (dependabot[bot])

  • chore(deps): bump bindgen from 0.72.0 to 0.72.1 in /native #2274 (dependabot[bot])

  • chore(deps): bump aws-credential-types from 1.2.5 to 1.2.6 in /native #2275 (dependabot[bot])

  • minor: Remove useless ENABLE_COMET_SHUFFLE env #2280 (wForget)

  • chore: Refactor serde for conditional expressions #2266 (andygrove)

  • chore(deps): bump mimalloc from 0.1.47 to 0.1.48 in /native #2276 (dependabot[bot])

  • chore: docker publish and docs build only for apache repo #2289 (wForget)

  • minor: Reduce misleading fallback warnings #2283 (andygrove)

  • chore: Refactor Cast serde to avoid code duplication #2242 (andygrove)

  • chore: Refactor hex/unhex SerDe to avoid code duplication #2287 (hsiang-c)

  • minor: Improve exception message for unimplemented CometVector methods #2291 (andygrove)

  • chore: Align sort constraints w/ arrow-rs #2279 (hsiang-c)

  • chore: Collect fallback reasons for spark sql tests #2313 (wForget)

  • chore: Refactor serde for named expressions alias and attributeReference #2290 (andygrove)

  • chore(deps): bump log4rs from 1.3.0 to 1.4.0 in /native #2334 (dependabot[bot])

  • chore(deps): bump twox-hash from 2.1.1 to 2.1.2 in /native #2335 (dependabot[bot])

  • chore(deps): bump actions/setup-python from 5 to 6 #2331 (dependabot[bot])

  • chore(deps): bump actions/download-artifact from 4 to 5 #2332 (dependabot[bot])

  • chore(deps): bump cc from 1.2.35 to 1.2.36 in /native #2337 (dependabot[bot])

  • chore(deps): bump log from 0.4.27 to 0.4.28 in /native #2333 (dependabot[bot])

  • build: Specify SPARK_LOCAL_HOSTNAME to fix CI failures #2353 (andygrove)

  • chore: [branch-0.10] Bump version to 0.10.0 #2356 (andygrove)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

    75	Andy Grove
    27	dependabot[bot]
    11	Oleks V
     9	Zhen Wang
     7	hsiang-c
     5	Artem Kupchinskiy
     5	B Vadlamani
     5	Kazantsev Maksim
     5	Matt Butrovich
     5	Parth Chandra
     4	Rishab Joshi
     3	Peter Toth
     3	Tal Glanzman
     2	Dharan Aditya
     2	Huaxin Gao
     2	KAZUYUKI TANIMURA
     2	Leung Ming
     2	Raz Luvaton
     2	Yu-Chuan Hung
     1	Asura7969
     1	Emily Matheys
     1	K.I. (Dennis) Jung
     1	Kristin Cowalcijk
     1	Peter Nguyen
     1	codetyri0n
     1	Šimon Lukašík

Thank you also to everyone who contributed in other ways, such as filing issues, reviewing PRs, and providing feedback on this release.