Katherine Sanders

Camp Counselor

Apache Spark Core Concepts: Introduction to Distributed Data Processing

Monday, January 16, 2023 - 7:00 PM UTC, for 1 hour.

Regular, 60-minute presentation

Room: Campsite 4

spark
pyspark
big data
distributed computing
optimization

Big data is only getting bigger, and being able to make quick, data-driven decisions at scale is more important than ever. That’s why thousands of organizations in both industry and academia use Apache Spark for scalable computing. This talk introduces Spark concepts in an approachable, visual manner that will leave you with a strong foundation for using this powerful data processing and analytics engine.

Prerequisites

None. This talk is designed to be approachable for everyone.

Takeaways

  • Learn strategies for optimizing Spark jobs
  • Visualize data partitions, data shuffling, drivers & executors, and the layers of Spark computation
  • Gain a firm understanding of parallel processing frameworks like Apache Spark