Spark Associate Developer Certification Guide
Labinot Vila
Posted on March 19, 2024
This post covers what you need to pass the Databricks Spark Associate Developer exam: the books, lectures, and practice exams I used, plus the topics that came up.
Books
Spark: The Definitive Guide
Learning Spark: 2nd Edition
The Data Engineering's Guide to Apache Spark
Lectures
YouTube
Advanced Apache Spark Training - Sameer Farooqui (Databricks)
Apache Spark Core—Deep Dive
Udemy
Apache Spark 3 - Beyond Basics
Apache Spark 3 - Databricks Certified
Exams
Databricks Apache Spark 3.0 Dev Certification - Tests(Scala)
Databricks Certified Apache Spark 3.0 TESTS (Scala & Python)
Databricks Certified Developer for Spark 3.0 Practice Exams
PDF Exams
More Demo Dumps
Topics covered on the exam
- When does a Spark application fail? (when an executor fails, when the driver fails, when data is not fully cached, etc.)
- What is the most granular unit in the Spark hierarchy? (jobs, stages, tasks, etc.)
- What does NOT help in optimizing a Spark application? (related to partitions, column merging, etc.)
- What happens if there are more slots than tasks to process on a worker node? (resources are not fully utilized, etc.)
- What is a task? (a unit of work that can fit into an executor, a unit of work that can fit into a machine, etc.)
- What is a job?
- What is the difference between actions and transformations?
- Which of the Dataset API methods is most likely to invoke a shuffle? (union, groupBy, filter, etc.)
- What percentage of the DataFrame will the following code cache? (a .show() is called on a Scala range)
- How many jobs will the following code create? (reading a DataFrame with schema inference)
- A wide transformation exchanges data between which units? (partitions, executors, clusters, etc.)
- We want to generate 25 partitions after a join. What is the right configuration to use?
- What are valid Spark deployment modes? (YARN, Local, Standalone, etc.)
- Which of the options helps with garbage collection? (increasing Java heap space, serialization or deserialization, etc.)
- Dataset API questions
  - Split function
  - Explode function
  - Joins (inner, left, crossJoin and anti)
  - Renaming a column
  - Overwriting a column
  - Filtering with multiple conditions
  - Difference between where and filter
  - Date and time manipulation (to and from Unix time, formatting, etc.)
  - Sorting ascending and descending, with and without nulls
  - Literals
  - Repartition and coalesce (more than 2 questions)
  - UDFs
  - Aggregate and window functions (rank and dense rank)
  - Printing the schema
  - Identifying transformations and actions
  - Collecting a dataset, extracting values and casting
  - Casting columns of a dataset
- Dataset reading and writing
  - Reading a raw CSV file
  - Reading a CSV file with a schema and with separators
  - Read and write modes
  - Writing and overwriting a Parquet file
  - Partitioning by a column and writing
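Of the topics listed above, rank versus dense rank is an easy one to mix up under time pressure. The sketch below is plain Python, not Spark code. The function names `rank` and `dense_rank` are hypothetical helpers that only mimic how the Spark functions of the same names number rows in an already sorted column, assuming a single partition and a single ordering column.

```python
def rank(sorted_values):
    """Tied values share a rank; the next distinct value jumps to its row position."""
    ranks = []
    for position, value in enumerate(sorted_values, start=1):
        if position > 1 and value == sorted_values[position - 2]:
            ranks.append(ranks[-1])      # tie: reuse the previous rank
        else:
            ranks.append(position)       # new value: rank equals the row position
    return ranks


def dense_rank(sorted_values):
    """Tied values share a rank; the next distinct value gets the next integer."""
    ranks = []
    for index, value in enumerate(sorted_values):
        if index == 0:
            ranks.append(1)              # first row always starts at 1
        elif value == sorted_values[index - 1]:
            ranks.append(ranks[-1])      # tie: reuse the previous rank
        else:
            ranks.append(ranks[-1] + 1)  # new value: no gap after a tie
    return ranks


scores = [100, 90, 90, 80]               # already sorted in descending order
print(rank(scores))                      # [1, 2, 2, 4]
print(dense_rank(scores))                # [1, 2, 2, 3]
```

The only difference shows up after a tie: rank leaves a gap (the row after the two 90s gets 4), while dense rank continues with the next integer (3).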
Do not rely on documentation online!