Apache Spark
- tags
- Apache Foundation, Tool
Apache Spark™ is built on an advanced distributed SQL engine for large-scale data
The most widely used engine for scalable computing
Thousands of companies, including 80% of the Fortune 500, use Apache Spark™, and the open-source project has over 2,000 contributors from industry and academia.
Key features
Batch/streaming data
Unify the processing of your data in batches and real-time streaming, using your preferred language: Python, SQL, Scala, Java or R.
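To make the batch/streaming idea concrete, here is a plain-Python sketch (no Spark involved) of one transformation reused in both modes. It assumes hypothetical `(user, amount)` event tuples and models a stream as a sequence of micro-batches that update running state. This is roughly how Structured Streaming treats a stream as a table that keeps growing. In PySpark itself, moving from batch to streaming typically means reading with `spark.readStream` and writing with `writeStream` instead of `spark.read` and `write`, while the query logic stays the same.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical event record: (user, amount). Plain Python, not a Spark API.
Event = Tuple[str, int]


def total_per_user(events: Iterable[Event], state: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """The single piece of business logic: sum amounts per user, continuing from prior state."""
    totals = dict(state or {})  # copy so earlier states are not mutated
    for user, amount in events:
        totals[user] = totals.get(user, 0) + amount
    return totals


def run_batch(events: List[Event]) -> Dict[str, int]:
    # Batch mode: the whole dataset is available at once.
    return total_per_user(events)


def run_streaming(micro_batches: Iterable[List[Event]]) -> Iterator[Dict[str, int]]:
    # Streaming mode: data arrives in micro-batches; the same logic updates running state.
    state: Dict[str, int] = {}
    for batch in micro_batches:
        state = total_per_user(batch, state)
        yield state


events = [("ann", 5), ("bob", 3), ("ann", 2), ("bob", 4)]
print(run_batch(events))  # {'ann': 7, 'bob': 7}

*_, final = run_streaming([events[:2], events[2:]])
print(final == run_batch(events))  # True
```

Both modes produce the same per-user totals. The streaming version also yields each intermediate state as new micro-batches arrive.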
SQL analytics
Execute fast, distributed ANSI SQL queries for dashboarding and ad-hoc reporting. The project states that Spark runs faster than most data warehouses.
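A distributed SQL engine can answer a query such as `SELECT region, SUM(amount) FROM sales GROUP BY region` without first moving every row to one machine. One common strategy, and the shape Spark's physical plans usually show for aggregations, has three steps: a partial aggregate on each partition, a shuffle by the grouping key, and a final aggregate. The sketch below imitates those phases in plain Python and checks the result against SQLite from the standard library. The `sales` table, its rows, and the three-way split are made up for illustration.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]  # (region, amount); hypothetical sales data
rows: List[Row] = [("eu", 10.0), ("us", 4.0), ("eu", 6.0), ("apac", 1.0), ("us", 2.0), ("eu", 3.0)]

# Reference answer from a single-node SQL engine (SQLite, in the standard library).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
expected = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall())
conn.close()


def partial_agg(partition: Iterable[Row]) -> Dict[str, float]:
    # Phase 1: each worker pre-aggregates only the rows in its own partition.
    acc: Dict[str, float] = defaultdict(float)
    for region, amount in partition:
        acc[region] += amount
    return acc


def final_agg(partials: Iterable[Dict[str, float]]) -> Dict[str, float]:
    # Phase 2: merge subtotals per key. A real engine first shuffles partials so that
    # each key lands on one worker; here a single loop plays that role.
    out: Dict[str, float] = defaultdict(float)
    for part in partials:
        for region, subtotal in part.items():
            out[region] += subtotal
    return dict(out)


partitions = [rows[0:2], rows[2:4], rows[4:6]]  # hypothetical split across three workers
distributed = final_agg(partial_agg(p) for p in partitions)
print(distributed == expected)  # True
```

Pre-aggregating inside each partition means only one small subtotal per key and partition crosses the network, rather than every raw row.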
Data science at scale
Perform exploratory data analysis (EDA) on petabyte-scale data without having to resort to downsampling.
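Exploring the full dataset instead of a sample depends on statistics that can be computed per partition and then combined exactly. Below is a minimal sketch of one standard technique for count, mean and variance. Within a partition it uses Welford's single-pass update, and across partitions it uses the Chan et al. merge formula. The `Stats` class and the sample partitions are hypothetical and are not part of Spark's API.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Stats:
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def add(self, x: float) -> "Stats":
        # Welford's single-pass update for one value.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return self

    def merge(self, other: "Stats") -> "Stats":
        # Chan et al. formula: combine two partial summaries without revisiting raw data.
        if other.n == 0:
            return Stats(self.n, self.mean, self.m2)
        if self.n == 0:
            return Stats(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Stats(n, mean, m2)

    @property
    def variance(self) -> float:
        # Sample variance (n - 1 in the denominator); undefined for fewer than two values.
        return self.m2 / (self.n - 1) if self.n > 1 else math.nan


def summarize(partition: Iterable[float]) -> Stats:
    # Runs independently on each partition (one "worker" per partition).
    s = Stats()
    for x in partition:
        s.add(x)
    return s


partitions = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]
total = Stats()
for p in partitions:
    total = total.merge(summarize(p))
print(total.n, total.mean, total.variance)
```

For these eight values, the merged result matches a single pass over all of them up to floating-point rounding: n = 8, mean 5, and sample variance 32/7 (about 4.571).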
Machine learning
Train machine learning algorithms on a laptop and use the same code to scale to fault-tolerant clusters of thousands of machines.
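Using the same training code on a laptop and on a cluster is possible because many training objectives are sums over rows. Gradients can therefore be computed per partition and added together. The plain-Python sketch below works under that assumption: least-squares linear regression trained by batch gradient descent, where `train` does not care how many partitions it receives. The function names, learning rate, and step count are hypothetical choices for this toy data, not MLlib defaults. In PySpark you would instead call an estimator such as `pyspark.ml.regression.LinearRegression` and its `fit` method.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y)


def partition_gradient(points: Sequence[Point], w: float, b: float) -> Tuple[float, float, int]:
    # Sum of squared-error gradients over one partition, plus its row count.
    gw = gb = 0.0
    for x, y in points:
        err = (w * x + b) - y
        gw += err * x
        gb += err
    return gw, gb, len(points)


def train(partitions: List[List[Point]], lr: float = 0.05, steps: int = 2000) -> Tuple[float, float]:
    w = b = 0.0
    for _ in range(steps):
        # Each "worker" computes a partial gradient; the "driver" sums and averages them.
        parts = [partition_gradient(p, w, b) for p in partitions]
        n = sum(c for _, _, c in parts)
        w -= lr * sum(g for g, _, _ in parts) / n
        b -= lr * sum(g for _, g, _ in parts) / n
    return w, b


data = [(float(x), 2.0 * x + 1.0) for x in range(10)]  # noise-free y = 2x + 1
laptop = train([data])                                # one partition
cluster = train([data[0:3], data[3:7], data[7:]])    # three partitions
print(laptop, cluster)
```

Both runs recover w of approximately 2 and b of approximately 1. Only the summation order differs between them, so their results agree up to floating-point rounding.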
Ecosystem
Apache Spark™ integrates with your favorite frameworks, helping to scale them to thousands of machines.

Spark SQL engine: under the hood
Apache Spark™ is built on an advanced distributed SQL engine for large-scale data
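Spark SQL's optimizer, Catalyst, represents a query as a tree and rewrites it with rules. It re-runs batches of rules until the tree stops changing, and constant folding is one such rule. The sketch below imitates that style in Python with a toy expression tree. `Lit`, `Col`, `Add`, `Mul`, `transform_up`, and the two rules are hypothetical stand-ins, not Catalyst's Scala classes, and real Catalyst plans also cover filters, joins and projections.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, Add, Mul]
Rule = Callable[[Expr], Expr]


def transform_up(e: Expr, rule: Rule) -> Expr:
    # Rewrite children first (bottom-up), then offer the rebuilt node to the rule.
    if isinstance(e, (Add, Mul)):
        e = type(e)(transform_up(e.left, rule), transform_up(e.right, rule))
    return rule(e)


def constant_folding(e: Expr) -> Expr:
    # Replace an operator whose inputs are both literals by its computed value.
    if isinstance(e, Add) and isinstance(e.left, Lit) and isinstance(e.right, Lit):
        return Lit(e.left.value + e.right.value)
    if isinstance(e, Mul) and isinstance(e.left, Lit) and isinstance(e.right, Lit):
        return Lit(e.left.value * e.right.value)
    return e


def simplify_identities(e: Expr) -> Expr:
    # x * 1 -> x and x + 0 -> x
    if isinstance(e, Mul) and e.right == Lit(1):
        return e.left
    if isinstance(e, Add) and e.right == Lit(0):
        return e.left
    return e


def optimize(e: Expr, rules: List[Rule]) -> Expr:
    # Apply the whole batch of rules repeatedly until a fixed point is reached.
    while True:
        new = e
        for rule in rules:
            new = transform_up(new, rule)
        if new == e:
            return new
        e = new


# price * (1 + 0) + 3 * 0
expr = Add(Mul(Col("price"), Add(Lit(1), Lit(0))), Mul(Lit(3), Lit(0)))
optimized = optimize(expr, [constant_folding, simplify_identities])
print(optimized)  # Col(name='price')
```

Constant folding first turns `1 + 0` into `1` and `3 * 0` into `0`. The identity rule then removes `* 1` and `+ 0`, leaving the bare column `price`.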