Apache DataFusion Comet
A high-performance accelerator for Apache Spark
Runs your existing Spark queries on the Apache DataFusion native engine with no code changes required. Also accelerates Parquet scans for Apache Iceberg.
    # Download the Comet plugin for your Spark / Scala version
    $ export COMET_JAR=comet-spark-spark4.1_2.13-0.16.0.jar

    # Launch Spark with Comet enabled (drop-in, no code changes)
    $ $SPARK_HOME/bin/spark-shell \
        --jars $COMET_JAR \
        --conf spark.driver.extraClassPath=$COMET_JAR \
        --conf spark.executor.extraClassPath=$COMET_JAR \
        --conf spark.plugins=org.apache.spark.CometPlugin \
        --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
        --conf spark.memory.offHeap.enabled=true \
        --conf spark.memory.offHeap.size=4g

    // Your existing Spark queries, now executed natively via DataFusion
    scala> spark.sql("SELECT category, COUNT(*) FROM events GROUP BY category").show()
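The jar file name in the launch example appears to encode three versions: Spark (4.1), Scala (2.13), and Comet (0.16.0). The sketch below assembles the name from those parts so it can be changed in one place. It assumes the same naming pattern holds for other version combinations, which this page does not confirm, and `comet_jar_name` is a hypothetical helper, not something Comet ships.

```shell
# Hypothetical helper: assemble the Comet jar file name from three versions.
# Assumption: the name follows comet-spark-spark<SPARK>_<SCALA>-<COMET>.jar,
# the pattern read off the single file name used in the launch example.
comet_jar_name() {
  spark_version="$1"   # e.g. 4.1
  scala_version="$2"   # e.g. 2.13
  comet_version="$3"   # e.g. 0.16.0
  printf 'comet-spark-spark%s_%s-%s.jar\n' \
    "$spark_version" "$scala_version" "$comet_version"
}

# Reproduce the jar name from the launch example and export it.
COMET_JAR="$(comet_jar_name 4.1 2.13 0.16.0)"
export COMET_JAR
echo "$COMET_JAR"
```

Check the installation instructions for the jar that matches your Spark and Scala versions before relying on a generated name.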
Run Spark Queries at DataFusion Speeds
Comet speeds up many queries, enabling faster data processing and shorter time to insight.
The chart below shows Comet accelerating TPC-DS @ 1 TB. See the Comet Benchmarking Guide for the full per-query breakdown and reproduction methodology.
Spark Compatibility
Aims for 100% compatibility with supported Spark versions.
Comet aims for 100% compatibility with all supported versions of Apache Spark, so you can add it to existing Spark deployments and workflows without code changes and without disrupting your applications. When a query uses a feature that Comet does not support, the Comet extension detects this automatically and falls back to the Spark engine.
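Because fallback is automatic, one rough way to see how much of a query runs natively is to inspect its physical plan. The sketch below counts the plan lines that mention a Comet operator. It assumes native operators appear with a `Comet` prefix in the plan text, and `count_comet_plan_lines` is a hypothetical helper name, not part of Comet or Spark.

```shell
# Hypothetical helper: read Spark plan text on stdin and print how many
# lines mention a Comet operator.
# Assumption: natively executed operators carry a "Comet" prefix in the plan.
count_comet_plan_lines() {
  # grep -c prints 0 and exits non-zero when nothing matches;
  # "|| true" keeps the function from failing in that case.
  grep -c 'Comet' || true
}

# Example usage (not executed here; needs a Spark installation launched
# with the same Comet --jars/--conf flags as the spark-shell example):
#   "$SPARK_HOME/bin/spark-sql" \
#     -e "EXPLAIN SELECT category, COUNT(*) FROM events GROUP BY category" \
#     | count_comet_plan_lines
```

A count of 0 would suggest the whole query fell back to the Spark engine. A non-zero count only shows that some operators ran natively, not that all of them did.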
Architecture
Tight integration with Apache DataFusion.
The diagram below shows an overview of Comet's architecture: how the Comet plugin intercepts Spark physical plans, translates supported operators into a protocol-buffer representation, and hands them to the Apache DataFusion native engine for execution.
Getting Started
To get started with Apache DataFusion Comet, follow the installation instructions. Join the DataFusion Slack and Discord channels to connect with other users, ask questions, and share your experiences with Comet.
Contributing
We welcome community contributions to Apache DataFusion Comet, whether that means fixing bugs, adding features, writing documentation, or optimizing performance. Check out our contributor guide to get started.